STUDENT COMMENTS 11/17/2008
Paper A: Designing Games with a Purpose
Paper B: A Game-Theoretic Analysis of Games with a Purpose
Subhash Arja
Paper A
This paper explores and evaluates one type of online game, called Games With a Purpose or GWAPs. The authors first describe the various types of games that can be considered part of the GWAP category. These include output-agreement, inversion-problem, and input-agreement games. I thought the concept of the GWAP was very interesting
because it is a win-win situation for the game developer as well as the
players. The developer can collect information and, in essence, use the
participants as free labor. Participants can have fun while playing and are not
required to divulge private information. The authors also provide an evaluation
framework to test how successful a GWAP will be. I
found this section to be unsubstantiated since there are no real experiments or
results presented. Also, it is very difficult to quantify how much a particular
user enjoys playing the game and how long he will continue to come back to the site to play. Online games' popularity tends to change quite rapidly as some games
become more popular, and this sometimes happens for reasons as simple as the
GUI or marketing. Following one model of success does not guarantee future
success for developers. These characteristics make GWAP a very open field for
research.
Paper B
I found this paper very interesting because it
was a contrast to the types of papers we have been reading in the Peer Production
and Human Computation areas. Mainly, it models one specific type of GWAP, the
ESP game, in a theoretical fashion and applies game theoretical concepts. One
assumption that the paper makes is that the players will act in a certain order
and only "consistent" strategies are considered, where the player
will not change his relative ordering based on his realized dictionary. This is
a somewhat limiting assumption but is necessary to model the ESP game in game
theoretic concepts. I had hoped that such theoretical modeling would have been
done in the Peer Production systems because it would provide a certain
foundation for developers who want to build successful peer collaboration websites. In class we discussed how the vast majority of such sites fail, but there hasn't been a deep study of what principles predict whether a system will succeed or fail.
Michael Aubourg
First of all, it is a good idea to try to use the time everybody has spent and will spend on video games to perform AI improvements in the background. Obviously the goal of GWAP is to replace computer decision-making and analysis with human participation. I signed up for GWAP and played the game. I had to identify pieces of music with a short description. At the same time, my rival was playing. We had limited time to identify each piece of music. We could read our rival's description, and based on that, we could say whether we were listening to the same music or not.
- First drawback: Provided people are honest and there are incentives to report accurately, the idea is not too bad, but it is not a long-term solution. I think it is a temporary way to improve search engines or Internet algorithms. But I think it would have been more effective to teach the computer HOW a human being identifies a piece of music (or a picture) instead of just the result itself. For instance, instead of saying "this is rock", we could say "I identify a characteristic rhythm inherent to rock music; the voices, guitar, and sound level are those of rock." I guess it would be more effective so that computers could learn to focus on identifying a particular kind of music (rhythm, timbre, number of different frequencies, frequency spread in the frequency spectrum...).
- Second drawback: An important point here is that these approaches rely on the fact that people want to be entertained and nothing else (no financial incentive...). So the goal of people here is to have fun. But honestly, how cool are these games compared to all the other free games available online? I don't think it is a good idea to pin all hopes on the willingness of people to help. There are so many other fun games we can play (just go to the first website I found: http://www.gamefun.gr). You don't need to sign up, it is free, very fast, no download required, and much more fun. So I think they should turn the game into something different. Maybe more exciting. A shoot-them-all game with pictures, where players have to identify animals and shoot dogs, for instance, would be more fun!
Another idea could be to organize tournaments. Indeed, think about this point: let's imagine I am a good player and I want to win a lot. The more I play, the better I am. But it becomes frustrating to play with people who provide very unclear descriptions or reports. So the better I am, the better the people I am going to play with. Then, people are going to be knocked out of the tournament. And the algorithm could take the answers and results of the tournament's best players more seriously. The limit with ranking, though, is that the best player is the one who best guesses what an unclear statement refers to given the rival's way of playing. However, this is maybe not very useful for the algorithm.
- Third drawback: In order to make the games more fun, the time is often counted. But I think one of the side effects is that people think less deeply about their outputs, hence neglecting the quality of their answers. We could also think about a setup where 3 or more people play at the same time, instead of 2, to analyse the impact on answer accuracy.
Brett Harisson
Both of these papers pertain to "Games with
a Purpose" (GWAP), i.e. games designed to take advantage of human
computational power by soliciting humans to do work in exchange for
entertainment.
The first paper attempts to analyze, from a high-level perspective, the design elements of GWAPs. In addition, it tries to quantify the effectiveness of such games in terms of how "efficient" the games are at producing computational results. The second paper gives a theoretical framework with which to understand the ESP Game, a game in which two players try to label images in such a way that one of their labels will match one of the other player's labels, resulting in a game that provides human-computed labels for images. This framework allows for discussion of relationships between the incentive models of the game and the resulting equilibrium strategies (regarding which labels each player chooses to submit in which order).
I like the second paper in that it provides a nice, natural model with which to understand these games. My criticism of the first paper is exactly this lack of formalism, with only intuitive notions of things such as "fun" and "entertainment". There are also
several claims made about the accuracy of the results and the motivations not
to cheat, which are not well explained in this paper. Several of the papers
referenced by both of these papers do manage to treat the topics of cheating
and incentives more formally.
The majority of the human-computation examples
referenced in these papers have to do with language and image processing. What
other sorts of problems can be done easily by a human but not so easily by a
computer, and do these problems lend themselves to a game-like environment?
Haoqi Zhang
Paper A
The main contribution of the paper is in
providing a general framework and specific examples of designs for games with a
purpose: where as a by-product of humans playing a game they solve a useful
problem that is hard for computers but easy (easier) for humans. The players
play the game for enjoyment, and this in itself is designed/constructed, for
example via challenging problems (timed responses, etc), skill levels and
leaderboards, and randomness. The output of the game is easy to verify for
correctness based on probabilistic arguments of accuracy when two users have
agreed on an output and/or via the lack of incentives for deviation and other
mechanisms to thwart cheating / manipulation.
I have a couple of comments/questions about the
work:
- As described, it seems that the game systems (e.g. ESP) only use within-game data to solve problems, e.g. when two users in a game match on words. However, it seems useful to use word matches across games
as well, e.g., if one player says cute in game x and another says cute in game
y, this should still be a match. Furthermore, this may be something you want to
give points for, e.g. some notion of a 'sixth sense'.
- It seems that the highly skilled / frequently playing players are quite important (is there a power law here?). What are some ways to maintain their interest and increase their participation? One possibility is
to give bonus rounds / replay opportunities to players who have done very well
together and assign higher probability to them being matched (or to suggest to
them that they have great chemistry and play again). Of course, extra care
needs to be taken to prevent cheating.
- While the formal design framework generalizes
to any computational problem, I am not convinced that all games will be
enjoyable. In a sense, many of the games here resemble games that people have
traditionally played, e.g. charades, pictionary, etc. Would players play a game
of sorting numbers?
- Much of the success seems to be due to small chunks of problems. Does this suggest that certain programming models, e.g. recursion, may be especially amenable to this kind of design (e.g. implementing merge sort over large player populations)? A rough sketch of this idea follows below.
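To make this concrete, here is a minimal sketch in Python (my own illustration, not from either paper; the ask_humans oracle is hypothetical, standing in for a pair of players agreeing on a single comparison) of merge sort where each comparison is a tiny human task:

# Sketch: merge sort where every comparison is delegated to a
# hypothetical human oracle, illustrating how recursive algorithms
# decompose into small, independently solvable chunks.

def ask_humans(a, b):
    # In a real GWAP, two players would be shown a and b, and their
    # agreed answer would decide the order; stubbed here with an
    # ordinary comparison.
    return a <= b

def human_merge_sort(items):
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = human_merge_sort(items[:mid])
    right = human_merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if ask_humans(left[i], right[j]):   # one human micro-task
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(human_merge_sort([5, 2, 9, 1]))   # [1, 2, 5, 9]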
Paper B
The main contribution of the paper is in
providing a game theoretic model of the ESP Game. The work is the first of its
kind in formalizing the game theoretic aspects of game playing in a peer
production setting. In particular, the work is interesting because the
strategies that players employ have a strong effect on the usefulness of the
outputs, e.g., the choice of high-effort, detailed labels versus lower-effort, common words to match on. The authors show that it is a strict Bayes-Nash equilibrium for both players to play with low effort when players wish to match on words as early as they can.
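The intuition behind this result can be seen in a toy simulation (a sketch only; the Zipf-like word frequencies, dictionary size, and all other parameters are my own assumptions, not the paper's model): when both players draw frequency-biased dictionaries and play them in decreasing order of frequency, matches on common words come early.

# Toy simulation: two players sample words with a Zipf-like bias and
# play them most-frequent-first; we measure when they first match.
import random

random.seed(0)
VOCAB = list(range(100))                  # word ids, 0 = most frequent
WEIGHTS = [1.0 / (r + 1) for r in VOCAB]  # assumed Zipf-like frequencies

def sample_dictionary(k=10):
    words = set()
    while len(words) < k:
        words.add(random.choices(VOCAB, weights=WEIGHTS)[0])
    return sorted(words)                  # decreasing-frequency order

def first_match_round(p1, p2):
    for t in range(1, len(p1) + 1):
        if set(p1[:t]) & set(p2[:t]):     # some word played by both
            return t
    return None

rounds = [first_match_round(sample_dictionary(), sample_dictionary())
          for _ in range(1000)]
hits = [r for r in rounds if r is not None]
print("match rate:", len(hits) / len(rounds))
print("average round of first match:", sum(hits) / len(hits))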
The interesting point this work makes (perhaps
implicitly) is that the preferences of agents correspond strongly to the points
structure of a game, in that by giving players a purpose one specifies the
utilities and induced equilibria in the game. This general concept seems useful for more general game design and seems to be generalizable to other games as well.
I had a minor point about the consistency assumption in strategies: it would seem that a player who plays dog, cat, puppy in that order may wish to play puppy before cat when dog isn't sampled, because the non-match on dog implies some information about the relative likelihoods of other words to match.
Peter Blair
Paper A
The authors of this article introduce the concept of GWAPs, or Games with a Purpose, as training grounds for AI machines. Humans have highly evolved abilities when it comes to recognizing patterns, for example images. To do a similar computation using a computer would be computationally expensive, so it makes sense to create a framework where AI programmers can utilize the human ability of recognition to train AI machines. Given that 200 million + hours are spent playing games in the US on a daily basis, it is natural to use games as a platform for image recognition. This sentiment is summarized in the comments: "We know from the literature on motivation in psychology and organization behavior that goals that are both well-specified and challenging lead to higher levels of effort and task performance than goals that are too easy or vague." The two comments that I have on this article center on the issue of accuracy in the game. The first comment: since GWAPs are online games, they favor players with internet access (i.e. wealthy people and people in developed countries); does this limit the number and types of identifications that are made with a particular image? The analogous problem in standardized testing is that of cultural bias, in which reading comprehension passages are taken from a given literature that is familiar to slivers of the population. A second comment has to do with the issue of collusion and how it affects accuracy. Are there games where a player's partner is switched periodically, disrupting attempts between partners to develop a collusion system?
Paper B
In this paper the authors model Games with a Purpose, in particular the image-word matching game ESP. The goal is first to understand the equilibria of the game under the match-early incentive structure, and secondly to consider other incentive schemes, namely rare-words preferences, and study how the equilibrium changes. Under the first incentive structure, the authors find that low effort is a Bayesian Nash equilibrium for the game. Without making additional assumptions/conditions on the players' valuations, it cannot be shown that low effort is a Bayesian Nash equilibrium for the rare-words preference incentive structure. This result is positive in that it yields a game structure that will produce more words describing a given
image. I am interested in understanding the connection between choosing to play the game and choosing one's effort level, i.e. low, medium, or high. It seems reasonable to assume that if a player chooses to play the game, she will choose to play with a high level of effort, precisely because playing the game is fun and competitive (one of the underlying motors of GWAPs as computational AI devices). What might be examples of someone choosing to play the game and yet playing with a low level of effort? One possible application of this model would be in understanding the incentives of standardized tests, for example the GRE, where questions are computerized and become successively harder with time, assuming that previous questions were answered correctly. Could we create a GWAP where the pictures being identified get harder with time, giving the player the impression that he/she needs to think more about the answer, and then throw in a simple picture every once in a while to exploit the player's high level of concentration following a hard image identification? This model could analogously be used to help students derive test-taking strategies for exams like the GRE, given that the incentive structure / point structure of the GRE is known.
Nick Wells
Paper A
This article describes games with a purpose,
GWAPS, which the authors helped to build. I spent some time playing the games
at www.gwap.com, and they are quite fun, addicting even. The idea is that two strangers are randomly paired together for a game and given a set of inputs to which they respond. In some cases they respond to the inputs of the other player and try to play a match vs. no-match game, or else their outputs may be compared, as in the Squigl game. The idea is that the data generated by
players can be used to help a computer learn about the different inputs the
users see such as music, pictures or words.
The paper explores general information the
game-builders gained while building the website, though not from a theoretical
point-of-view. In my opinion, the GWAP games are quite thoughtful solutions to
computationally difficult problems.
Paper B
This paper examines games with a purpose, which
are games designed to gain useful input from players for computationally
difficult problems. This paper shows that a change in the incentives of a game
can change the equilibrium outcomes and thus motivate consideration of
incentives in game design. In the games, the object is to label as many images
as possible in the specified time. The two strategies the paper mentions are
the rare-words-first and match-early strategies, both aimed at quickening the
game. The paper analyzes the different strategies from a game-theoretic
approach.
It would be interesting to explore different
strategies as well as to see what happens when alternative strategies are
matched with each other. One interesting exploration may be to see how one can
evolve the optimal strategies presented in the paper from different strategies
using a learning algorithm.
Angela Ying
Paper A
This paper discusses several games that have
been created for the purpose of providing training examples for machine
learning, particularly for image recognition. These games, known as GWAPs, are shown to be both highly entertaining and useful, since machine learning requires many training examples that may not be feasible to get from simply a small group of people. Some GWAPs include the ESP game, Peekaboom, Verbosity, etc., which fall into three general categories. The first, output-agreement games, are games where the users must agree on an output in order to win points.
Inversion-problem games are games where one user gives an input and the other
must give the correct output. Finally, input-agreement games are games where
both users must give clues to each other to check if their input agrees. All of
these games have incentive structures built in, such
as point rankings and high score lists, and are able to get many users to
contribute to machine learning. The main contribution of the paper was to
discuss these games, why they work, and how effective they are.
I think this is a very interesting topic. I went
onto the website and played a game of ESP. Although I played as a guest rather
than a registered user, I can see how this type of game, given its small time
requirement per game, can become addictive. It can be compared to charity events, where people are more likely to contribute to a charity because they can enjoy themselves at an event rather than getting nothing out of it. Thus, a strong aspect of
this paper is the entertainment value that people get out of these games. For
future work, I would be interested in seeing the end result from this machine
learning. It would be nice to see how effective the image identifier is before
and after a certain number of game plays, and the value that the creators get
out of these games.
Paper B
This paper analyzed a simplified version of the
ESP game to calculate the Bayesian Nash Equilibrium. It discusses two
preferences - match-early preferences where users try to speed through the
game, and rare-words preferences where users would rather look for uncommon
words than common words. For the first type, the paper demonstrated that the
BNE for the game is when both players make a low effort (presumably this means
they go for more frequent words). For the second type of game, the paper
demonstrated that there is no consistent strategy that stochastically dominates
the other strategies, or basically there is no one strategy that will always be
better to use than another strategy.
I thought this paper had an interesting analysis
of the ESP game, but I wasn't sure what the purpose of it was in relation to
the actual purpose of the ESP game, machine learning. This paper seems to
suggest that perhaps the ESP game as it is now is not very effective for
machine learning because people only use simple, frequent words to describe the
images. In addition, I was confused as to how the incentive structure of the
ESP game would change to make people look for rarer
words - would they get more points? There must be some
kind of balance that people would make between using uncommon words and
speeding through to get more images. What kind of result would the creators of
the ESP game want to have?
Xiaolu Yu
Paper A
The importance of this paper is in introducing the idea of constructively channeling human brainpower through computer games. In other words, networked individuals accomplishing work.
The Open Mind Initiative is a worldwide research
endeavor developing "intelligent" software by leveraging human skills
to train computers. Volunteers participate by providing answers to questions
computers cannot answer, aiming to teach computer programs commonsense facts.
However, the Open Mind approach involves two drawbacks: reliance on the
willingness of unpaid volunteers to donate their time and no guarantee that the
information they enter is correct.
Because of spam, many site owners don't even allow comments anymore, but without interactivity the internet might just be a newspaper. A system like these games might be able to provide the human oversight, at high latency, to out-compete the CAPTCHA-solving networks. Players would be recruited either from the general population, or from the site owners and the site participants (sites as the source of comments). As suggested in the paper, to discourage players from random matching, scoring in input-agreement games strongly penalizes incorrect guesses. An accuracy feedback loop would be useful to rate players, so that less accurate players could be dropped from the game.
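A minimal sketch of such a feedback loop (my own scheme, not from either paper): keep an exponential moving average of how often a player's agreed labels are later validated, and drop players whose estimate falls too low.

# Sketch of an accuracy feedback loop for rating players.
ALPHA = 0.1        # moving-average learning rate (assumed)
DROP_BELOW = 0.4   # eviction threshold (assumed)

ratings = {}       # player id -> estimated accuracy in [0, 1]

def record_outcome(player, label_was_valid):
    # Update the player's accuracy estimate after one validated label.
    old = ratings.get(player, 0.5)          # neutral prior
    ratings[player] = (1 - ALPHA) * old + ALPHA * float(label_was_valid)

def should_drop(player):
    return ratings.get(player, 0.5) < DROP_BELOW

# A player whose labels keep failing validation drifts downward.
for _ in range(20):
    record_outcome("spam_bot", label_was_valid=False)
print(round(ratings["spam_bot"], 3), should_drop("spam_bot"))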
What if spammers begin participating in this game? Two bots from spammers who want to game the system may be pitted against each other. But the problem is that spammers actually are using humans to bypass these schemes. It definitely does slow them down. However, I wonder if it is economically worth it for some lucrative targets like Google. On the other hand, although systems based on money would be more reliable than those based on human interest, it is very crucial to realize how much farther we should go in gathering human intelligence in order to compete with spammers, and how expensive such a task is, given that this "intelligence" library is somehow infinitely large.
Paper B
The main contribution of this paper is showing
that low effort is a Bayesian-Nash equilibrium for all
distributions on word frequencies, with players focusing attention on
high-frequency words.
It is very important to realize that changes in
the incentive structure can lead to different equilibrium structures. Therefore,
in order to extend the set of labels to the words with high effort, it is
critical to understand the incentive structure that results in playing words in
order of increasing frequency in conjunction with high effort for both players.
Although the authors suggest identifying specific score functions that provide desirable equilibria and induce large-scale desirable behaviors, it would be interesting to think about the goals of players – a certain fraction of them are probably just one-time players and do not care about the scores more than the entertainment itself (or how many rounds they match). If this is the case to some extent, I want to ask whether an appropriate score function can eventually lead to an extension of the label set (to less frequently used words) for an image.
Nikhil Srivastava
The Von Ahn and Dabbish article provides a good
overview of GWAPs and classifies them based on their structure and the method
by which they extract information from participants. It also attempts to
formalize the "fun" aspect of playing and ties it to game design. The
Jain and Parkes paper gives a game-theoretic analysis of a particular GWAP - the ESP
game, and proves an important result about low effort in the current setup of
the game.
Both are interesting analyses, and it is
probably very useful to formalize some of the aspects
of GWAPs that are "obvious" to people who've played before. The ESP
game is certainly an intricate one, with several possible cheating mechanisms.
Perhaps more effort could be made in identifying and ruling out these
strategies, such as a global strategy of identifying colors in the ESP game and
the use of rhyming words and spelling-out in Verbosity.
It also seems important (and largely missing in
the theory) to encourage the social aspects of gameplay. The *feeling* of
playing against a real human, apart from incentive or reward considerations, is
valuable in its own right. Perhaps designers should leverage social networks to
allow friends to play with or against each other, or pair up strong players to
make their experience more enjoyable and to maximize information output.
Finally, the *type* of information gathered is
probably relevant to the design of these games, despite the fact it is
difficult to model. A game like ESP relies on accurate descriptive skills and
an ability to identify visually presented objects, whereas games like Verbosity
reward creativity, generally strong vocabulary, and an ability to relate words
and concepts at a higher level. User strategies are
probably dictated as much by these considerations as by traditional utility
ones.
Andrew Berry
Paper A
This paper describes three types of games which involve tasks that humans can perform with
relative ease, but with which computers struggle. These GWAPs can be
constructed for data retrieval to train machine learning
algorithms. GWAPs are divided into three general categories: output-agreement
games, input-agreement games and inversion-problem games. Since the most common
purpose for GWAPs appears to be labeling, I wonder if the collection of individuals who play GWAPs accurately reflects the target population. I think it
would have been good to include demographic statistics for GWAP users. For
instance, I would imagine the majority of users are from a younger age
demographic. The paper also states that making the games fun and challenging
are the best ways to ensure user participation and data collection. Timing
responses is the first idea presented to achieve these goals. However, couldn't timed responses create undesirable effects, such as allowing less data to be collected or sacrificing the quality of user submissions for rapid quantity? I
consider these more minor concerns within the paper, but I am unconvinced of
the effectiveness of automated players. I think having an automated player play
a set of prerecorded game responses almost defeats the purpose of creating a
GWAP in the first place. It doesn't allow for new labels/moves to enter the system. I do think the evaluation metrics for GWAP are excellent. Throughput captures how much data the system is collecting, and ALP is a sufficient rough estimate of how fun the game is.
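If I recall the paper's definitions correctly, these metrics compose in a simple way; here is a toy calculation (all figures invented for illustration) of how throughput and ALP combine into a player's expected contribution:

# Throughput = labels produced per human-hour of play; ALP = average
# lifetime play per player; their product estimates the expected
# contribution of one player over their "lifetime" with the game.
labels_produced = 2400        # labels collected in some window (assumed)
human_hours_played = 10.0     # total player-hours in that window (assumed)
throughput = labels_produced / human_hours_played

avg_lifetime_play_hours = 3.5                       # ALP (assumed)
expected_contribution = throughput * avg_lifetime_play_hours

print(f"throughput = {throughput:.0f} labels/hour")
print(f"expected contribution = {expected_contribution:.0f} labels/player")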
Ziyad Aljarboua
Paper A
I find the idea of turning games into problem solvers very interesting. Given the massive number of hours spent on playing games, it is a very appealing idea to put those hours to use. The main limitation here is the line between games with a purpose (GWAPs) and games that are fun to play. It seems to me that once there is more interest directed to designing such games, the first and most challenging obstacle is finding GWAPs that people are interested in playing, or making such tasks fun. From my readings, it seems to me that the scope of such games is very limited. Most of the games that I read about revolve around image tagging. While this approach has proved effective in tasks like image tagging, I find it hard to apply it to other types of tasks.
One might consider other incentives besides enjoyment. As we discussed with Taskcn, GWAPs could also be designed to have monetary incentives.
Paper B
This paper presents a game-theoretic model of the ESP game and discusses implications for the equilibrium of the game's structure under incentives. Two payoff schemes are presented here: match-early and rare-words-first preferences. In the match-early preference model, players wish to complete as many rounds as possible and receive the same score regardless of the number of words they match. In the rare-words-first preference model, the normal scheme of assigning scores to players is reflected.
It was shown in this paper that playing words in decreasing order of frequency with a low effort level is a Bayesian Nash equilibrium for the ESP game. It was also shown that in the rare-words-first preference model, the decreasing-frequency strategy is no longer stable. In this model, playing words in order of increasing frequency with a high effort level is a Bayesian Nash equilibrium.
Alice Gao
Both papers are concerned with games with a
purpose. The contribution of the
first paper is to discuss important design issues in making games with a
purpose a successful approach for solving computational problems using human
game play. I think the idea of
this approach is simple but in a sense groundbreaking. However, there are still many problems
with these approaches for accomplishing particular goals. When I tried to play these games, I
discovered many simple manipulation strategies commonly used by players. For the ESP game, players usually try
to match on words of a certain category such as color, or obvious objects, or
very common words. For the
Verbosity game, even though the describer can only enter two words for each
clue, players have thought of all kinds of ways to enter clues that don't
make sense in the sentence but nonetheless are useful for guessing the
word. These are all important
factors that we need to consider when we are thinking of modifying the designs
of these games.
I am also interested in reading about how the
data obtained from these game plays are currently processed and
interpreted. I think this is also
an important step in obtaining good data.
Perhaps, what we can really do is to use some very intelligent way to
process these data to filter out the useless ones and keep the good ones. This might be an interesting research
direction in addition to the research in improving the game designs.
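One very simple version of such a filter (my own heuristic, purely illustrative) would drop labels that appear on so many different images that they carry little information about any particular one:

# Sketch: drop labels whose document frequency across images is high.
from collections import Counter

labels_by_image = {
    "img1": ["dog", "cute", "brown", "animal"],
    "img2": ["car", "red", "cute"],
    "img3": ["tree", "green", "cute"],
}
n = len(labels_by_image)
df = Counter(l for labels in labels_by_image.values() for l in set(labels))
filtered = {img: [l for l in labels if df[l] / n < 0.67]
            for img, labels in labels_by_image.items()}
print(filtered)   # "cute" appears on every image, so it is dropped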
The main contribution of the second paper is to
give a formal game-theoretic analysis of one equilibrium
behaviour of the ESP game.
This paper seems to be a starting point of many papers to come. I think this analysis is useful because
it proves that our intuition on playing frequent words being an
equilibrium is correct. We
should certainly take advantage of these kinds of analyses and try to come up
with design modifications that will promote other player behaviours.
Victor Chan
The two papers presented games with a purpose, which leverage human computational power to solve problems that are hard for computers. The paper Designing Games with a Purpose, by von Ahn and Dabbish, talks about the types of GWAP's that they have created so far on the gwap.com website.
This article mainly deals with three types of game structure that have been
used and briefly touches on the various aspects of the results that are
generated. The second paper A Game Theoretic Analysis of Games with a Purpose
by Jain (our TF) and Parkes looks specifically at the ESP game, which is one of
the games on gwap.com and follows the first form of GWAP's presented in von
Ahn's paper. This second paper discusses in detail the Bayesian Nash
Equilibrium that is achieved based on the different effort levels of the
players.
The main contribution of the first paper was to give an overview of GWAP's. The authors show the three types of GWAPs that they have implemented, which include output-agreement games, inversion-problem games, and input-agreement games. It is interesting to see these three games, since they can all be played on gwap.com. What I found interesting was whether these games were created based on the three predefined templates, or whether the templates were derived from the games. Other interesting points in the paper include discussing how to evaluate the efficiency of the algorithms. I found this interesting, since it deviates from the big-O style of understanding efficiency; however, it should be noted that the authors did not seem to take into account the correctness of the labels in defining throughput. Since it appears that users will choose low-effort words to increase matching throughput, this type of cheating means that throughput does not really reflect the algorithm's efficiency at solving the task at hand.
The main contribution of the second paper is providing a model for the ESP game, in which it was determined that playing the strategy of decreasing frequency is the Bayesian-Nash equilibrium of the second stage of the ESP game, and that low effort with the decreasing-frequency strategy is the Bayesian-Nash equilibrium for the overall game. These results seem to suggest that the labelling resulting from the game will not be very useful, since players will tend to choose the easiest labels, such as color, or shape, etc. This will even be true if Taboo words are factored in, since the new low effort will consist of variations of the Taboo words. The paper also presents the idea of using the rare-words-first preference. Under this preference scheme, players will likely use increasing frequency in the words they choose to label the image.
The one thing that was unclear was how the data
generated by the ESP game or other GWAP's are used. When playing the game
myself, I often found that the other players did not care about the content of the game, and did use the decreasing-frequency and low-effort strategy; as a result, after a few rounds, I was also using this strategy to maximize my points. Playing in this way tends to generate words such as colors, sizes, shapes or other generic nouns, which seem useless for labelling the actual image. Interestingly, Google's Image Labeler does award more points for using less frequent/harder words. However, the same problem still occurs. Another problem that I encountered was during the play of Verbosity. The describer would often ignore the preformed sentences given to them, and use each field in the inputs to generate a sentence for the guesser. The guesser would also enter questions into the guessing fields, to ask for more information from the describer. This type of cheating seems to defeat the purpose of the game, since the common sense being generated is from the semantics, rather than key words. It would be interesting to see how the incentive structure can be changed so that players will enter more useful data.
Rory Kulz
Paper A
GWAPs are fun, they do at least Google a service, and there are some basic forms that they tend to fit and that seem to work. Okay, got it, thanks paper. Maybe I'm being a little dismissive because I saw the Google Image Labeler a long, long time ago, and I'm not really interested in game design, but what can I do? It's another soft Communications of the ACM paper, and I'm a math guy. I'm truly thankful for Shaili's paper.
Anyway, I think throughput is the obvious metric to consider, although in the domain of web games, it would probably be useful to include in the measure an average player count or the probability that a given visitor to the website chooses to play the game at all, since an important part of game design seems to be luring the player in in the first place. That being said, I would have liked to have seen statistics for the games in question. They toss out a couple of numbers, but there's no overall picture forming of which game forms seem so far to work better than others. This would have been more interesting for me to read, at least, I think.
Paper B
A really interesting paper; this is a very nice way to formalize the output-agreement games with a purpose we've been looking at. (Intuitively, the "inversion-problem" and "input-agreement" games seem harder to model due to the sort of "updating" the agents need to do due to the communication mechanism.) Although I don't think the paper says so, it does seem to generalize from just the ESP Game to all such games, at least ones where a notion of frequency on the outputs and agents' awareness of that frequency is sensible. I think this can maybe even be stretched -- some modifications would definitely be necessary, though -- to cover some games like Squigl, replacing frequency with an idea of degrees of coarseness -- do I just circle the general area or do I try to make a very detailed outline? Coarseness seems to encourage an earlier match but also to generate less useful matches.
I was curious about the Taboo Words also. I'm not sure if I'm right about this, but it seems like on a single-game basis Taboo Words don't make much of a difference? It's just removing a few words from the dictionary. It would, I suppose, really be a problem when you consider the game being played over and over and over by distinct pairs of agents who also have the common knowledge that the game has been played by many other pairs of agents (or actually better, the same pair of agents with a memory wipe each play-through), because then to consistently play high-frequency words may have the unintentional consequence of drying up the utility of strategies.
As for applications -- arguably the least interesting thing to mention -- it would be neat to test some point scheme that encourages infrequent words, i.e. incentivizes towards the rare-words-first model, and compare with the current ESP Game setup. It's arguable that less data would be extracted from the system because players would lean towards obscurity, shrinking overlap with their partners.
Travis May
These articles address the construction of games with a purpose, computer games that "trick" people into becoming computational agents that perform necessary tasks for their system. For example, the ESP game is a fun-to-play game that requires individuals to agree on a label for a photo with an individual with whom they've had no communication besides their guesses, ultimately yielding appropriate tags for the photo.
While creating these games seems like a clever way to utilize human computational power, my largest concern is with incentives. While the game is fun to play once, it is unlikely that I will return to play it with frequency, and there is nothing besides my curiosity encouraging me to do so. A better system might somehow incentivize me to participate, ideally with something other than fake points. One way to do this might be to utilize human computational power in CAPTCHAs. For example, instead of giving a randomly generated set of letters, there could be a task that requires human computation (such as photo tagging) that decides whether I am approved based on whether I match other results. Thus, instead of relying on a sense of curiosity, the program would incentivize/require my participation.
Avner May
These two papers discuss "Games with a purpose" – games designed to accomplish some computational task while entertaining their players. Typically, these tasks are ones which are easy for humans, but rather difficult for computers. An example of such a task is the labeling of images; given that it is very hard for computers to recognize images, it might make more sense to have humans dedicate time to accomplishing this task than it does to design a probably less effective, and much more complicated, algorithm to do it. I think that this is quite an interesting approach to problem solving, and potentially very effective. In general I am interested in how to harness people's knowledge and skills over the internet, thus using the enormous computational power of people online. It makes sense to do this in a game setting, in which people participate voluntarily, in order to get more users, and maybe even more reliable data.
With regard to the first paper (Designing Games
with a purpose), I thought it did some interesting work with regard to
outlining the GWAP structures which have been seen to be quite effective, as
well as discussing the different metrics for analyzing the efficiency of such a
game. Nonetheless, I did not find
the article too insightful; it seemed more like an advice column to someone
hoping to create a GWAP.
With regard to the second paper (A Game-Theoretic Analysis...), I thought that it attacked an interesting topic: approaching GWAPs from a game-theoretic angle. I think it is quite important, when designing a GWAP, to make sure that it is in the players' best interest, as well as an equilibrium state, for the player to give "correct" output. This paper does a good job analyzing the ESP game in this manner.
Sagar Mehta
I felt the most interesting part of these papers
was the issue of how to design a game such that it is both entertaining to play
and provides useful results. While playing the ESP game, at times I felt that
the "best" label for an image was often missed in favor of the
"easiest" to think of. This can be remedied by awarding more points
to "better" answers, but could also make the game less entertaining as it is not as fast paced. I'd like to see
work in this field go a step further and actually use the data that is gathered
from the human computation as training data and then measure the success of the
program. I also would want to know more information on the users who play GWAPs.
Are there a few predominant players that dominate the game? Are they motivated
by "fake points" or by the utility they gain from playing the game in
general?
I can think of a few potential applications of
GWAPs. For example, one particularly difficult AI problem is getting computers
to talk like humans. Is there a good game to train computers on how to speak? I
think it could also be interesting to apply human computation to the problem of
search. Everyone has varying degrees of success in finding information on the
web. An interesting game could be one where competing users search for
information for a third person. These results could potentially be used to
optimize search algorithms, though somewhat paradoxically, the two searchers
would probably use another search engine to find the actual data...
Hao-Yuh Su
Paper A
The main goal of GWAP games is to complete tasks that are difficult for machines but easy for human beings. Therefore, a good design that can ensure a sufficient population of players to achieve this purpose is essential. This paper investigates all the design factors of GWAP games, including motivating factors, structures, guidelines and evaluations. I agree with the viewpoints of this paper. I think it is great to have such a systematic analysis of GWAP games, which definitely is beneficial for the designers to make further improvements to their works. However, I have some other opinions on GWAP games based on my personal experience with them. The first suggestion is that
games based on my personal experience on it. The first suggestion is that
perhaps ESP games may consider adopting auto-correction function (like Google
does) in it. This way, the inputs of each player may increase during a fixed
time, and it may enhance the whole throughput, which is the main factor of the
evaluation on GWAP games' performance. My second suggestion is about the
"taboo words" of the ESP game. I believe such feature broadens the
wideness of input vocabulary, but I think it is unnecessary to show a warning
sign every time when a player entering those expected words, and it's also a
little bit disturbing for players, who should be completely concentrating on
the give picture. In my opinion, a
better way to implement this feature is just to show the taboo words by the
side of the picture and spare the time popping up warning sign, while screening
out these vocabularies in the background process. My last suggestion is about
the design of "Randomness," randomness in difficulties in particular.
In this page, it is claimed that because difficulty varies, the game is able to
keep being interesting and engaging for expert and novice players alike. I
cannot agree on this point. Everyone that has some experience on video games or
PC games knows that most games have several different levels of difficulties
for different levels of players. I believe such design is based the psychology
of human beings. If I were an expert of, say, Super Mario, I definitely
wouldn't want to waste my time on repeating trivial "easy mode";
instead, I would like to enter the hard mode directly once I start playing.
Perhaps in the future, the ESP game may be able to give different labels of
difficulty levels for each picture; it may give pictures of proper difficulties
to the players of the corresponding level and keep the existing population of
players.
Paper B
This paper adopts the game-theory point of view to analyze the ESP game. In this paper, the authors propose two incentive structures - match-early preferences and rare-words-first preferences - and further derive results showing that different incentive structures lead to different equilibrium structures. This result creates the possibility of formal incentive design that brings desirable system-wide outcomes, while preventing cheating and other human factors at the same time. Such a finding is unquestionably a piece of encouraging news for game designers. However, I have some questions about it. In the beginning, the authors make an assumption about consistent strategies in both models. That is, in the two subject models, a player doesn't change her relative ordering of elements according to the realized dictionary. However, according to my personal experience, when I was playing the ESP game, I would guess the possible outputs of the other player from previously matched words. I even guessed which level she might be, and I did adjust my strategy based on these observations and guesses. I wonder if it is oversimplified to assume a fixed strategy for each player. On the other hand, if such consistent strategies do not exist, are we still able to build up a proper structure for the ESP game and achieve those desirable goals?
Malvika Rao
A game-theoretic analysis of games with a purpose: Very interesting model formulation. This paper takes the phenomenon of the ESP game and establishes a theoretical formulation. There seem to be plenty of directions for future research as listed in the conclusion section.
One thing that comes to mind is what outcomes we can expect if mixed strategies are taken into account. Is it reasonable to model players as initially playing low effort and then switching to high effort when they realize that low effort is not succeeding as they would like? Or perhaps vice-versa: players play high effort and then tone down as they get tired.
It would also be interesting to look at other preference models.
The paper models the game as a 2-stage process where the 1st stage involves a decision on effort level and the 2nd stage involves a choice of dictionary. Would it be possible to model this game differently? The paper does mention that future work could consider taking into account a cost associated with effort. Could we treat this as a repeated game with discounting? An early win is highly valuable (in the early-match preference model) but later wins are less valuable. So utility obtained later is discounted - which means that once players realize that an early win will not happen, they put in less effort.
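As a rough sketch (my own formulation, not the paper's), discounting could enter the per-game utility as

U_i = \sum_{t=1}^{T} \delta^{t-1} u_i(t), \qquad 0 < \delta < 1,

where u_i(t) is player i's payoff from a match in round t; a small \delta makes late matches nearly worthless, so effort should fall once an early match fails to materialize.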
Might players punish each other for not getting them a win? Suppose that one player felt that somehow the other player was not putting in enough effort?
Brian Young
Jain and Parkes leave it an open question
whether there exists some method of causing players to choose rare words first.
This seems unlikely to me under the current kinds of games.
As correctness of answers is verified only by
confirmation from other players, getting people to input rare words is a
difficult task, since the rarer a word is, the less likely it is to be matched
by some other player. As such, people with more words in their dictionary than
the average person would have decreasing incentive to play words that the
average person does not know -- in other words, they would "play
down" to the average. Any incentive structure that could in fact
incentivize rare words would have to deal with the possibility of allowing
unique but irrelevant information to slip through.
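The difficulty is easy to quantify with a back-of-the-envelope sketch (numbers assumed, under an independence assumption that is only approximate): if both players independently think of word w with probability p_w, they agree on w with probability roughly p_w squared, so rare words almost never produce matches.

# Toy numbers for the match probability of a word under independence.
for p_w in (0.5, 0.1, 0.01):
    print(f"p_w = {p_w:<5} -> P(both play w) ~ {p_w ** 2}")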
My idle thoughts on generating more detailed
results, though, led me to imagine a set of images that had all been tagged
with the same or similar words. We might consider setting up a game in which
one player, the "describer", was given one of those images to
describe, knowing that the "guesser" would have to deduce which of
those similarly-tagged images was being described. It's conceivable that this
might result in more and more detailed information, to more accurately
distinguish between similar images.
I also have to wonder: to what extent does
knowing that the games they are playing have "a purpose" influence
people's enjoyment? I remember how the site FreeRice, which combines a
vocabulary quiz with donations to charity for each correct answer, was widely
popularized, while a game that was nothing but a vocabulary quiz would most
likely not have achieved such popularity on its own.