STUDENT COMMENTS 11/17/2008
Paper A: Designing Games with a Purpose
Paper B: A Game-Theoretic Analysis of Games with a Purpose
Subhash Arja
Paper A
This paper explores and evaluates one type of online game, called Games With a Purpose or GWAPs. The authors first describe the various types of games that can be considered part of the GWAP category. These include output-agreement, inversion-problem, and input-agreement games. I thought the concept of the GWAP was very interesting
because it is a win-win situation for the game developer as well as the
players. The developer can collect information and, in essence, use the
participants as free labor. Participants can have fun while playing and are not
required to divulge private information. The authors also provide an evaluation
framework to test how successful a GWAP will be. I
found this section to be unsubstantiated since there are no real experiments or
results presented. Also, it is very difficult to quantify how much a particular
user enjoys playing the game and how long he will continue to come back to the site to play. Online games' popularity tends to change quite rapidly as some games
become more popular, and this sometimes happens for reasons as simple as the
GUI or marketing. Following one model of success does not guarantee future
success for developers. These characteristics make GWAP a very open field for
research.
Paper B
I found this paper very interesting because it
was a contrast to the types of papers we have been reading in the Peer Production
and Human Computation areas. Mainly, it models one specific type of GWAP, the
ESP game, in a theoretical fashion and applies game theoretical concepts. One
assumption that the paper makes is that the players will act in a certain order
and only "consistent" strategies are considered, where the player
will not change his relative ordering based on his realized dictionary. This is
a somewhat limiting assumption but is necessary to model the ESP game in game
theoretic concepts. I had hoped that such theoretical modeling would have been
done in the Peer Production systems because it would provide a certain
foundation for developers who want to build successful peer collaboration websites. In class we discussed how the vast majority of such sites fail, but there hasn't been a deep study of what principles predict whether a system will succeed or fail.
Michael Aubourg
First of all, it is a good idea to try to use the time everybody has spent and will spend on video games to perform AI improvements in the background. Obviously the goal of GWAP is to replace computer decision-making and analysis with human participation. I signed up for GWAP and played the game. I had to identify pieces of music with a short description. At the same time, my rival was playing. We had limited time to identify each piece of music. We could read our rival's description, and based on that, we could say whether we were listening to the same music or not.
- First drawback: Provided people are honest and there are incentives to report accurately, the idea is not too bad, but it is not a long-term solution. I think it is a temporary way to improve search engines or Internet algorithms. But I think it would have been more effective to teach the computer HOW a human being identifies a piece of music (or a picture) instead of just the result itself. For instance, instead of saying "this is rock", we could say "I identify a characteristic rhythm inherent to rock music; the voices, guitar, and sound level are those of rock." I guess it would be more effective so that computers could learn to focus on identifying a particular kind of music (rhythm, timbre, number of different frequencies, frequency spread in the frequency spectrum...).
- Second drawback: An important point here is that these approaches rely on the fact that people want to be entertained and nothing else (no financial incentive...). So the goal of people here is to have fun. But honestly, how cool are these games compared to all the other free games available online? I don't think it is a good idea to pin all hopes on the willingness of people to help. There are so many other fun games we can play (just go to the first website I found: http://www.gamefun.gr). You don't need to sign up, it is free, very fast, no download required, and much more fun. So I think they should turn the game into something different. Maybe more exciting. A shoot-them-all game with pictures, where players have to identify animals and shoot dogs, for instance, would be more fun!
Another idea could be to organize tournaments. Indeed, think about this point: let's imagine I am a good player and I want to win a lot. The more I play, the better I am. But it becomes frustrating to play with people who provide very unclear descriptions or reports. So the better I am, the better the people I am going to play with. Then, people are going to be knocked out of the tournament. And the algorithm could take the answers and results of the tournament's best players more seriously. The limit with ranking, though, is that the best player is the one who best guesses what an unclear statement refers to given the rival's way of playing. However, this is maybe not very useful for the algorithm.
- Third drawback: In order to make the games more fun, the time is often counted. But I think one of the side effects is that people think less deeply about their outputs, hence neglecting the quality of their answers. We could also think about a setup where 3 or more people play at the same time, instead of 2, to analyse the impact on answer accuracy.
Brett Harisson
Both of these papers pertain to "Games with
a Purpose" (GWAP), i.e. games designed to take advantage of human
computational power by soliciting humans to do work in exchange for
entertainment.
The first paper attempts to analyze, from a high-level perspective, the design elements of GWAPs. In addition, it tries to quantify the effectiveness of such games in terms of how "efficient" the games are at producing computational results. The second paper gives a theoretical framework with which to understand the ESP Game, a game in which two players try to label images in such a way that one of their labels will match one of the other player's labels, resulting in a game that provides human-computed labels for images. This framework allows for discussion of relationships between the incentive models of the game and the resulting equilibrium strategies (regarding which labels each player chooses to submit in which order).
I like the second paper in that it provides a nice, natural model with which to understand these games. My criticism of the first paper is exactly this lack of formalism, with only intuitive notions of things such as "fun" and "entertainment". There are also
several claims made about the accuracy of the results and the motivations not
to cheat, which are not well explained in this paper. Several of the papers
referenced by both of these papers do manage to treat the topics of cheating
and incentives more formally.
The majority of the human-computation examples
referenced in these papers have to do with language and image processing. What
other sorts of problems can be done easily by a human but not so easily by a
computer, and do these problems lend themselves to a game-like environment?
Haoqi Zhang
Paper A
The main contribution of the paper is in
providing a general framework and specific examples of designs for games with a
purpose: where as a by-product of humans playing a game they solve a useful
problem that is hard for computers but easy (easier) for humans. The players
play the game for enjoyment, and this in itself is designed/constructed, for
example via challenging problems (timed responses, etc), skill levels and
leaderboards, and randomness. The output of the game is easy to verify for
correctness based on probabilistic arguments of accuracy when two users have
agreed on an output and/or via the lack of incentives for deviation and other
mechanisms to thwart cheating / manipulation.
I have a couple of comments/questions about the
work:
- As described, it seems that the game systems (e.g. ESP) only use within-game data to solve problems, e.g. when two users in a game match on words. However, it seems useful to use word matches across games
as well, e.g., if one player says cute in game x and another says cute in game
y, this should still be a match. Furthermore, this may be something you want to
give points for, e.g. some notion of a 'sixth sense'.
- It seems that the highly skilled / frequently playing players are quite important (is there a power law here?). What are some ways to maintain their interest and increase their participation? One possibility is
to give bonus rounds / replay opportunities to players who have done very well
together and assign higher probability to them being matched (or to suggest to
them that they have great chemistry and play again). Of course, extra care
needs to be taken to prevent cheating.
- While the formal design framework generalizes
to any computational problem, I am not convinced that all games will be
enjoyable. In a sense, many of the games here resemble games that people have
traditionally played, e.g. charades, pictionary, etc. Would players play a game
of sorting numbers?
- Much of the success seems to be due to small chunks of problems. Does this suggest that certain programming models, e.g. recursion, may be especially amenable to this kind of design (e.g. implementing merge sort over large player populations)? A rough sketch of this idea follows below.
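To make this concrete, here is a minimal sketch in Python (my own illustration, not from either paper; the ask_humans oracle is hypothetical, standing in for a pair of players agreeing on a single comparison) of merge sort where each comparison is a tiny human task:

# Sketch: merge sort where every comparison is delegated to a
# hypothetical human oracle, illustrating how recursive algorithms
# decompose into small, independently solvable chunks.

def ask_humans(a, b):
    # In a real GWAP, two players would be shown a and b, and their
    # agreed answer would decide the order; stubbed here with an
    # ordinary comparison.
    return a <= b

def human_merge_sort(items):
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = human_merge_sort(items[:mid])
    right = human_merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if ask_humans(left[i], right[j]):   # one human micro-task
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(human_merge_sort([5, 2, 9, 1]))   # [1, 2, 5, 9]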
Paper B
The main contribution of the paper is in
providing a game theoretic model of the ESP Game. The work is the first of its
kind in formalizing the game theoretic aspects of game playing in a peer
production setting. In particular, the work is interesting because the
strategies that players employ have a strong effect on the usefulness of the
outputs, e.g., the choice of high-effort, detailed labels versus lower-effort, common words to match on. The authors show that it is a strict Bayes-Nash equilibrium for both players to play with low effort when players wish to match on words as early as they can.
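The intuition behind this result can be seen in a toy simulation (a sketch only; the Zipf-like word frequencies, dictionary size, and all other parameters are my own assumptions, not the paper's model): when both players draw frequency-biased dictionaries and play them in decreasing order of frequency, matches on common words come early.

# Toy simulation: two players sample words with a Zipf-like bias and
# play them most-frequent-first; we measure when they first match.
import random

random.seed(0)
VOCAB = list(range(100))                  # word ids, 0 = most frequent
WEIGHTS = [1.0 / (r + 1) for r in VOCAB]  # assumed Zipf-like frequencies

def sample_dictionary(k=10):
    words = set()
    while len(words) < k:
        words.add(random.choices(VOCAB, weights=WEIGHTS)[0])
    return sorted(words)                  # decreasing-frequency order

def first_match_round(p1, p2):
    for t in range(1, len(p1) + 1):
        if set(p1[:t]) & set(p2[:t]):     # some word played by both
            return t
    return None

rounds = [first_match_round(sample_dictionary(), sample_dictionary())
          for _ in range(1000)]
hits = [r for r in rounds if r is not None]
print("match rate:", len(hits) / len(rounds))
print("average round of first match:", sum(hits) / len(hits))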
The interesting point this work makes (perhaps
implicitly) is that the preferences of agents correspond strongly to the points
structure of a game, in that by giving players a purpose one specifies the
utilities and induced equilibria in the game. This general concept seems useful for more general game design and seems to be generalizable to other games as well.
I had a minor point about the consistency assumption in strategies: it would seem that a player who plays dog, cat, puppy in that order may wish to play puppy before cat when dog isn't sampled, because the non-match on dog implies some information about the relative likelihoods of other words to match.
Peter Blair
Paper A
The authors of this article introduce the concept of GWAPs, or Games with a Purpose, as training grounds for AI machines. Humans have highly evolved abilities when it comes to recognizing patterns, for example images. To do a similar computation using a computer would be computationally expensive, so it makes sense to create a framework where AI programmers can utilize the human ability of recognition to train AI machines. Given that 200 million + hours are spent playing games in the US on a daily basis, it is natural to use games as a platform for image recognition. This sentiment is summarized in the comments: "We know from the literature on motivation in psychology and organization behavior that goals that are both well-specified and challenging lead to higher levels of effort and task performance than goals that are too easy or vague." The two comments that I have on this article center on the issue of accuracy in the game. The first comment: since GWAPs are online games, they favor players with internet access (i.e. wealthy people and people in developed countries); does this limit the number and types of identifications that are made with a particular image? The analogous problem in standardized testing is that of cultural bias, in which reading comprehension passages are taken from a given literature that is familiar to slivers of the population. A second comment has to do with the issue of collusion and how it affects accuracy. Are there games where a player's partner is switched periodically, disrupting attempts between partners to develop a collusion system?
Paper B
In this paper the authors model Games with a Purpose, in particular the image-word matching game ESP. The goal is first to understand the equilibria of the game under the match-early incentive structure, and secondly to consider other incentive schemes, namely rare-words preferences, and study how the equilibrium changes. Under the first incentive structure, the authors find that low effort is a Bayesian Nash equilibrium for the game. Without making additional assumptions/conditions on the players' valuations, it cannot be shown that low effort is a Bayesian Nash equilibrium for the rare-words preference incentive structure. This result is positive in that it yields a game structure that will produce more words describing a given
image. I am interested in understanding the connection between choosing to play the game and choosing one's effort level, i.e. low, medium, or high. It seems reasonable to assume that if a player chooses to play the game, she will choose to play with a high level of effort, precisely because playing the game is fun and competitive (one of the underlying motors of GWAPs as computational AI devices). What might be examples of someone choosing to play the game and yet playing with a low level of effort? One possible application of this model would be in understanding the incentives of standardized tests, for example the GRE, where questions are computerized and become successively harder with time, assuming that previous questions were answered correctly. Could we create a GWAP where the pictures being identified get harder with time, giving the player the impression that he/she needs to think more about the answer, and then throw in a simple picture every once in a while to exploit the player's high level of concentration following a hard image identification? This model could analogously be used to help students derive test-taking strategies for exams like the GRE, given that the incentive structure / point structure of the GRE is known.
Nick Wells
Paper A
This article describes games with a purpose,
GWAPS, which the authors helped to build. I spent some time playing the games
at www.gwap.com, and they are quite fun, addicting even. The idea is that two strangers are randomly paired together for a game and given a set of inputs to which they respond. In some cases they respond to the inputs of the other player and try to play a match vs. no-match game, or else their outputs may be compared, as in the Squigl game. The idea is that the data generated by
players can be used to help a computer learn about the different inputs the
users see such as music, pictures or words.
The paper explores general information the
game-builders gained while building the website, though not from a theoretical
point-of-view. In my opinion, the GWAP games are quite thoughtful solutions to
computationally difficult problems.
Paper B
This paper examines games with a purpose, which
are games designed to gain useful input from players for computationally
difficult problems. This paper shows that a change in the incentives of a game
can change the equilibrium outcomes and thus motivate consideration of
incentives in game design. In the games, the object is to label as many images
as possible in the specified time. The two strategies the paper mentions are
the rare-words-first and match-early strategies, both aimed at quickening the
game. The paper analyzes the different strategies from a game-theoretic
approach.
It would be interesting to explore different
strategies as well as to see what happens when alternative strategies are
matched with each other. One interesting exploration may be to see how one can
evolve the optimal strategies presented in the paper from different strategies
using a learning algorithm.
Angela Ying
Paper A
This paper discusses several games that have
been created for the purpose of providing training examples for machine
learning, particularly for image recognition. These games, known as GWAPs, are shown to be both highly entertaining and useful, since machine learning requires many training examples that may not be feasible to get from simply a small group of people. Some GWAPs include the ESP game, Peekaboom, Verbosity, etc., which fall into three general categories. The first, output-agreement games, are games where the users must agree on an output in order to win points.
Inversion-problem games are games where one user gives an input and the other
must give the correct output. Finally, input-agreement games are games where
both users must give clues to each other to check if their input agrees. All of
these games have incentive structures built in, such
as point rankings and high score lists, and are able to get many users to
contribute to machine learning. The main contribution of the paper was to
discuss these games, why they work, and how effective they are.
I think this is a very interesting topic. I went
onto the website and played a game of ESP. Although I played as a guest rather
than a registered user, I can see how this type of game, given its small time
requirement per game, can become addictive. It can be compared to charity events, where people are more likely to contribute to a charity because they can enjoy themselves at an event rather than getting nothing out of it. Thus, a strong aspect of
this paper is the entertainment value that people get out of these games. For
future work, I would be interested in seeing the end result from this machine
learning. It would be nice to see how effective the image identifier is before
and after a certain number of game plays, and the value that the creators get
out of these games.
Paper B
This paper analyzed a simplified version of the
ESP game to calculate the Bayesian Nash Equilibrium. It discusses two
preferences - match-early preferences where users try to speed through the
game, and rare-words preferences where users would rather look for uncommon
words than common words. For the first type, the paper demonstrated that the
BNE for the game is when both players make a low effort (presumably this means
they go for more frequent words). For the second type of game, the paper
demonstrated that there is no consistent strategy that stochastically dominates
the other strategies, or basically there is no one strategy that will always be
better to use than another strategy.
I thought this paper had an interesting analysis
of the ESP game, but I wasn't sure what the purpose of it was in relation to
the actual purpose of the ESP game, machine learning. This paper seems to
suggest that perhaps the ESP game as it is now is not very effective for
machine learning because people only use simple, frequent words to describe the
images. In addition, I was confused as to how the incentive structure of the
ESP game would change to make people look for rarer
words - would they get more points? There must be some
kind of balance that people would make between using uncommon words and
speeding through to get more images. What kind of result would the creators of
the ESP game want to have?
Xiaolu Yu
Paper A
The importance of this paper is in introducing the idea of constructively channeling human brainpower through computer games. In other words, networked individuals accomplishing work.
The Open Mind Initiative is a worldwide research
endeavor developing "intelligent" software by leveraging human skills
to train computers. Volunteers participate by providing answers to questions
computers cannot answer, aiming to teach computer programs commonsense facts.
However, the Open Mind approach involves two drawbacks: reliance on the
willingness of unpaid volunteers to donate their time and no guarantee that the
information they enter is correct.
Because of spam, many site owners don't even allow comments anymore, but without interactivity the internet might just be a newspaper. A system like these games might be able to provide the human oversight, at high latency, to out-compete the CAPTCHA-solving networks. Players would be recruited either from the general population, or from the site owners and the site participants (sites as the source of comments). As suggested in the paper, to discourage players from random matching, scoring in input-agreement games strongly penalizes incorrect guesses. An accuracy feedback loop would be useful to rate players, so that less accurate players could be dropped from the game.
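A minimal sketch of such a feedback loop (my own scheme, not from either paper): keep an exponential moving average of how often a player's agreed labels are later validated, and drop players whose estimate falls too low.

# Sketch of an accuracy feedback loop for rating players.
ALPHA = 0.1        # moving-average learning rate (assumed)
DROP_BELOW = 0.4   # eviction threshold (assumed)

ratings = {}       # player id -> estimated accuracy in [0, 1]

def record_outcome(player, label_was_valid):
    # Update the player's accuracy estimate after one validated label.
    old = ratings.get(player, 0.5)          # neutral prior
    ratings[player] = (1 - ALPHA) * old + ALPHA * float(label_was_valid)

def should_drop(player):
    return ratings.get(player, 0.5) < DROP_BELOW

# A player whose labels keep failing validation drifts downward.
for _ in range(20):
    record_outcome("spam_bot", label_was_valid=False)
print(round(ratings["spam_bot"], 3), should_drop("spam_bot"))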
What if spammers begin participating in this game? Two bots from spammers who want to game the system may be pitted against each other. But the problem is that spammers actually are using humans to bypass these schemes. It definitely does slow them down. However, I wonder if it is economically worth it for some lucrative targets like Google. On the other hand, although systems based on money would be more reliable than those based on human interest, it is very crucial to realize how much farther we should go in gathering human intelligence in order to compete with spammers, and how expensive such a task is, given that this "intelligence" library is somehow infinitely large.
Paper B
The main contribution of this paper is showing
that low effort is a Bayesian-Nash equilibrium for all
distributions on word frequencies, with players focusing attention on
high-frequency words.
It is very important to realize that changes in
the incentive structure can lead to different equilibrium structures. Therefore,
in order to extend the set of labels to the words with high effort, it is
critical to understand the incentive structure that results in playing words in
order of increasing frequency in conjunction with high effort for both players.
Although the authors suggest identifying specific score functions that provide desirable equilibria and induce large-scale desirable behaviors, it would be interesting to think about the goals of players – a certain fraction of them are probably just one-time players and do not care about the scores more than the entertainment itself (or how many rounds they match). If this is the case to some extent, I want to ask whether an appropriate score function can eventually lead to an extension of the label set (to less frequently used words) for an image.
Nikhil Srivastava
The Von Ahn and Dabbish article provides a good
overview of GWAPs and classifies them based on their structure and the method
by which they extract information from participants. It also attempts to
formalize the "fun" aspect of playing and ties it to game design. The
Jain and Parkes paper gives a game-theoretic analysis of a particular GWAP - the ESP
game, and proves an important result about low effort in the current setup of
the game.
Both are interesting analyses, and it is
probably very useful to formalize some of the aspects
of GWAPs that are "obvious" to people who've played before. The ESP
game is certainly an intricate one, with several possible cheating mechanisms.
Perhaps more effort could be made in identifying and ruling out these
strategies, such as a global strategy of identifying colors in the ESP game and
the use of rhyming words and spelling-out in Verbosity.
It also seems important (and largely missing in
the theory) to encourage the social aspects of gameplay. The *feeling* of
playing against a real human, apart from incentive or reward considerations, is
valuable in its own right. Perhaps designers should leverage social networks to
allow friends to play with or against each other, or pair up strong players to
make their experience more enjoyable and to maximize information output.
Finally, the *type* of information gathered is
probably relevant to the design of these games, despite the fact it is
difficult to model. A game like ESP relies on accurate descriptive skills and
an ability to identify visually presented objects, whereas games like Verbosity
reward creativity, generally strong vocabulary, and an ability to relate words
and concepts at a higher level. User strategies are
probably dictated as much by these considerations as by traditional utility
ones.
Andrew Berry
Paper A
This paper describes three types of games which involve tasks that humans can perform with
relative ease, but with which computers struggle. These GWAPs can be
constructed for data retrieval to train machine learning
algorithms. GWAPs are divided into three general categories: output-agreement
games, input-agreement games and inversion-problem games. Since the most common
purpose for GWAPs appears to be labeling, I wonder if the collection of individuals who play GWAPs accurately reflects the target population. I think it
would have been good to include demographic statistics for GWAP users. For
instance, I would imagine the majority of users are from a younger age
demographic. The paper also states that making the games fun and challenging
are the best ways to ensure user participation and data collection. Timing
responses is the first idea presented to achieve these goals. However, couldn't timed responses create undesirable effects, such as allowing less data to be collected or sacrificing the quality of user submissions for rapid quantity? I
consider these more minor concerns within the paper, but I am unconvinced of
the effectiveness of automated players. I think having an automated player play
a set of prerecorded game responses almost defeats the purpose of creating a
GWAP in the first place. It doesn't allow for new labels/moves to enter the system. I do think the evaluation metrics for GWAP are excellent. Throughput captures how much data the system is collecting, and ALP is a sufficient rough estimate of how fun the game is.
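If I recall the paper's definitions correctly, these metrics compose in a simple way; here is a toy calculation (all figures invented for illustration) of how throughput and ALP combine into a player's expected contribution:

# Throughput = labels produced per human-hour of play; ALP = average
# lifetime play per player; their product estimates the expected
# contribution of one player over their "lifetime" with the game.
labels_produced = 2400        # labels collected in some window (assumed)
human_hours_played = 10.0     # total player-hours in that window (assumed)
throughput = labels_produced / human_hours_played

avg_lifetime_play_hours = 3.5                       # ALP (assumed)
expected_contribution = throughput * avg_lifetime_play_hours

print(f"throughput = {throughput:.0f} labels/hour")
print(f"expected contribution = {expected_contribution:.0f} labels/player")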
Ziyad Aljarboua
Paper A
I find the idea of turning games into problem solvers very interesting. Given the massive number of hours spent on playing games, it is a very appealing idea to put those hours to use. The main limitation here is the line between games with a purpose (GWAPs) and games that are fun to play. It seems to me that once there is more interest directed to designing such games, the first and most challenging obstacle is finding GWAPs that people are interested in playing, or making such tasks fun. From my readings, it seems to me that the scope of such games is very limited. Most of the games that I read about revolve around image tagging. While this approach has proved effective in tasks like image tagging, I find it hard to apply it to other types of tasks.
One might consider other incentives besides enjoyment. As we discussed with Taskcn, GWAPs could also be designed to have monetary incentives.
Paper B
This paper presents a game-theoretic model of the ESP game and discusses implications for the equilibrium of the game's structure under incentives. Two payoff schemes are presented here: match-early and rare-words-first preferences. In the match-early preference model, players wish to complete as many rounds as possible and receive the same score regardless of the number of words they match. In the rare-words-first preference model, the normal scheme of assigning scores to players is reflected.
It was shown in this paper that playing words in decreasing order of frequency with a low effort level is a Bayesian Nash equilibrium for the ESP game. It was also shown that in the rare-words-first preference model, the decreasing-frequency strategy is no longer stable. In this model, playing words in order of increasing frequency with a high effort level is a Bayesian Nash equilibrium.
Alice Gao
Both papers are concerned with games with a
purpose. The contribution of the
first paper is to discuss important design issues in making games with a
purpose a successful approach for solving computational problems using human
game play. I think the idea of
this approach is simple but in a sense groundbreaking. However, there are still many problems
with these approaches for accomplishing particular goals. When I tried to play these games, I
discovered many simple manipulation strategies commonly used by players. For the ESP game, players usually try
to match on words of a certain category such as color, or obvious objects, or
very common words. For the
Verbosity game, even though the describer can only enter two words for each
clue, players have thought of all kinds of ways to enter clues that don't
make sense in the sentence but nonetheless are useful for guessing the
word. These are all important
factors that we need to consider when we are thinking of modifying the designs
of these games.
I am also interested in reading about how the
data obtained from these game plays are currently processed and
interpreted. I think this is also
an important step in obtaining good data.
Perhaps, what we can really do is to use some very intelligent way to
process these data to filter out the useless ones and keep the good ones. This might be an interesting research
direction in addition to the research in improving the game designs.
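One very simple version of such a filter (my own heuristic, purely illustrative) would drop labels that appear on so many different images that they carry little information about any particular one:

# Sketch: drop labels whose document frequency across images is high.
from collections import Counter

labels_by_image = {
    "img1": ["dog", "cute", "brown", "animal"],
    "img2": ["car", "red", "cute"],
    "img3": ["tree", "green", "cute"],
}
n = len(labels_by_image)
df = Counter(l for labels in labels_by_image.values() for l in set(labels))
filtered = {img: [l for l in labels if df[l] / n < 0.67]
            for img, labels in labels_by_image.items()}
print(filtered)   # "cute" appears on every image, so it is dropped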
The main contribution of the second paper is to
give a formal game-theoretic analysis of one equilibrium
behaviour of the ESP game.
This paper seems to be a starting point of many papers to come. I think this analysis is useful because
it proves that our intuition on playing frequent words being an
equilibrium is correct. We
should certainly take advantage of these kinds of analyses and try to come up
with design modifications that will promote other player behaviours.
Victor Chan
The two papers presented games with a purpose, which leverage human computational power to solve problems that are hard for computers. The paper Designing Games with a Purpose, by von Ahn and Dabbish, talks about the types of GWAP's that they have created so far on the gwap.com website.
This article mainly deals with three types of game structure that have been
used and briefly touches on the various aspects of the results that are
generated. The second paper A Game Theoretic Analysis of Games with a Purpose
by Jain (our TF) and Parkes looks specifically at the ESP game, which is one of
the games on gwap.com and follows the first form of GWAP's presented in von
Ahn's paper. This second paper discusses in detail the Bayesian Nash
Equilibrium that is achieved based on the different effort levels of the
players.
The main contribution of the first paper was to give an overview of GWAP's. The authors show the three types of GWAPs that they have implemented, which include output-agreement games, inversion-problem games, and input-agreement games. It is interesting to see these three games, since they can all be played on gwap.com. What I found interesting was whether these games were created based on the three predefined templates, or whether the templates were derived from the games. Other interesting points in the paper include discussing how to evaluate the efficiency of the algorithms. I found this interesting, since it deviates from the big-O style of understanding efficiency; however, it should be noted that the authors did not seem to take into account the correctness of the labels in defining throughput. Since it appears that users will choose low-effort words to increase matching throughput, this type of cheating means that throughput does not really reflect the algorithm's efficiency at solving the task at hand.
The main contribution of the second paper is providing a model for the ESP game, in which it was determined that playing the strategy of decreasing frequency is the Bayesian-Nash equilibrium of the second stage of the ESP game, and that low effort with the decreasing-frequency strategy is the Bayesian-Nash equilibrium for the overall game. These results seem to suggest that the labelling resulting from the game will not be very useful, since players will tend to choose the easiest labels, such as color, or shape, etc. This will even be true if Taboo words are factored in, since the new low effort will consist of variations of the Taboo words. The paper also presents the idea of using the rare-words-first preference. Under this preference scheme, players will likely use increasing frequency in the words they choose to label the image.
The one thing that was unclear was how the data
generated by the ESP game or other GWAP's are used. When playing the game
myself, I often found that the other players did not care about the content of the game, and did use the decreasing-frequency and low-effort strategy; as a result, after a few rounds, I was also using this strategy to maximize my points. Playing in this way tends to generate words such as colors, sizes, shapes or other generic nouns, which seem useless for labelling the actual image. Interestingly, Google's Image Labeler does award more points for using less frequent/harder words. However, the same problem still occurs. Another problem that I encountered was during the play of Verbosity. The describer would often ignore the preformed sentences given to them, and use each field in the inputs to generate a sentence for the guesser. The guesser would also enter questions into the guessing fields, to ask for more information from the describer. This type of cheating seems to defeat the purpose of the game, since the common sense being generated is from the semantics, rather than key words. It would be interesting to see how the incentive structure can be changed so that players will enter more useful data.
Rory Kulz
Paper A
GWAPs are fun, they do at least Google a service, and there are some basic forms that they tend to fit and that seem to work. Okay, got it, thanks paper. Maybe I'm being a little dismissive because I saw the Google Image Labeler a long, long time ago, and I'm not really interested in game design, but what can I do? It's another soft Communications of the ACM paper, and I'm a math guy. I'm truly thankful for Shaili's paper.
Anyway, I think throughput is the obvious metric to consider, although in the domain of web games, it would probably be useful to include in the measure an average player count or the probability that a given visitor to the website chooses to play the game at all, since an important part of game design seems to be luring the player in in the first place. That being said, I would have liked to have seen statistics for the games in question. They toss out a couple of numbers, but there's no overall picture forming of which game forms seem so far to work better than others. This would have been more interesting for me to read, at least, I think.
Paper B
A really interesting paper; this is a very nice way to formalize the output-agreement games with a purpose we've been looking at. (Intuitively, the "inversion-problem" and "input-agreement" games seem harder to model due to the sort of "updating" the agents need to do due to the communication mechanism.) Although I don't think the paper says so, it does seem to generalize from just the ESP Game to all such games, at least ones where a notion of frequency on the outputs and agents' awareness of that frequency is sensible. I think this can maybe even be stretched -- some modifications would definitely be necessary, though -- to cover some games like Squigl, replacing frequency with an idea of degrees of coarseness -- do I just circle the general area or do I try to make a very detailed outline? Coarseness seems to encourage an earlier match but also to generate less useful matches.
I was curious about the Taboo Words also. I'm not sure if I'm right about this, but it seems like on a single-game basis Taboo Words don't make much of a difference? It's just removing a few words from the dictionary. It would, I suppose, really be a problem when you consider the game being played over and over and over by distinct pairs of agents who also have the common knowledge that the game has been played by many other pairs of agents (or actually better, the same pair of agents with a memory wipe each play-through), because then to consistently play high-frequency words may have the unintentional consequence of drying up the utility of strategies.
As for applications -- arguably the least interesting thing to mention -- it would be neat to test some point scheme that encourages infrequent words, i.e. incentivizes towards the rare-words-first model, and compare with the current ESP Game setup. It's arguable that less data would be extracted from the system because players would lean towards obscurity, shrinking overlap with their partners.
Travis May
These articles address the construction of games with a purpose, computer games that "trick" people into becoming computational agents that perform necessary tasks for their system. For example, the ESP game is a fun-to-play game that requires individuals to agree on a label for a photo with an individual with whom they've had no communication besides their guesses, ultimately yielding appropriate tags for the photo.
While creating these games seems like a clever way to utilize human computational power, my largest concern is with incentives. While the game is fun to play once, it is unlikely that I will return to play it with frequency, and there is nothing besides my curiosity encouraging me to do so. A better system might somehow incentivize me to participate, ideally with something other than fake points. One way to do this might be to utilize human computational power in CAPTCHAs. For example, instead of giving a randomly generated set of letters, there could be a task that requires human computation (such as photo tagging) that decides whether I am approved based on whether I match other results. Thus, instead of relying on a sense of curiosity, the program would incentivize/require my participation.
Avner May
These two papers discuss "Games with a purpose" – games designed to accomplish some computational task while entertaining their players. Typically, these tasks are ones which are easy for humans, but rather difficult for computers. An example of such a task is the labeling of images; given that it is very hard for computers to recognize images, it might make more sense to have humans dedicate time to accomplishing this task than it does to design a probably less effective, and much more complicated, algorithm to do it. I think that this is quite an interesting approach to problem solving, and potentially very effective. In general I am interested in how to harness people's knowledge and skills over the internet, thus using the enormous computational power of people online. It makes sense to do this in a game setting, in which people participate voluntarily, in order to get more users, and maybe even more reliable data.
With regard to the first paper (Designing Games
with a purpose), I thought it did some interesting work with regard to
outlining the GWAP structures which have been seen to be quite effective, as
well as discussing the different metrics for analyzing the efficiency of such a
game. Nonetheless, I did not find
the article too insightful; it seemed more like an advice column to someone
hoping to create a GWAP.
With regard to the second paper (A Game-Theoretic Analysis...), I thought that it attacked an interesting topic: approaching GWAPs from a game-theoretic angle. I think it is quite important, when designing a GWAP, to make sure that it is in the players' best interest, as well as an equilibrium state, for the player to give "correct" output. This paper does a good job analyzing the ESP game in this manner.
Sagar Mehta
I felt the most interesting part of these papers
was the issue of how to design a game such that it is both entertaining to play
and provides useful results. While playing the ESP game, at times I felt that
the "best" label for an image was often missed in favor of the
"easiest" to think of. This can be remedied by awarding more points
to "better" answers, but could also make the game less entertaining as it is not as fast paced. I'd like to see
work in this field go a step further and actually use the data that is gathered
from the human computation as training data and then measure the success of the
program. I also would want to know more information on the users who play GWAPs.
Are there a few predominant players that dominate the game? Are they motivated
by "fake points" or by the utility they gain from playing the game in
general?
I can think of a few potential applications of
GWAPs. For example, one particularly difficult AI problem is getting computers
to talk like humans. Is there a good game to train computers on how to speak? I
think it could also be interesting to apply human computation to the problem of
search. Everyone has varying degrees of success in finding information on the
web. An interesting game could be one where competing users search for
information for a third person. These results could potentially be used to
optimize search algorithms, though somewhat paradoxically, the two searchers
would probably use another search engine to find the actual data...
Hao-Yuh Su
Paper A
The main goal of GWAP games is to complete tasks that are difficult for machines but easy for human beings. Therefore, a good design that can ensure a sufficient population of players to achieve this purpose is essential. This paper investigates all the design factors of GWAP games, including motivating factors, structures, guidelines and evaluations. I agree with the viewpoints of this paper. I think it is great to have such a systematic analysis of GWAP games, which definitely is beneficial for the designers to make further improvements to their works. However, I have some other opinions on GWAP games based on my personal experience with them. The first suggestion is that
games based on my personal experience on it. The first suggestion is that
perhaps ESP games may consider adopting auto-correction function (like Google
does) in it. This way, the inputs of each player may increase during a fixed
time, and it may enhance the whole throughput, which is the main factor of the
evaluation on GWAP games' performance. My second suggestion is about the
"taboo words" of the ESP game. I believe such feature broadens the
wideness of input vocabulary, but I think it is unnecessary to show a warning
sign every time when a player entering those expected words, and it's also a
little bit disturbing for players, who should be completely concentrating on
the give picture. In my opinion, a
better way to implement this feature is just to show the taboo words by the
side of the picture and spare the time popping up warning sign, while screening
out these vocabularies in the background process. My last suggestion is about
the design of "Randomness," randomness in difficulties in particular.
In this page, it is claimed that because difficulty varies, the game is able to
keep being interesting and engaging for expert and novice players alike. I
cannot agree on this point. Everyone that has some experience on video games or
PC games knows that most games have several different levels of difficulties
for different levels of players. I believe such design is based the psychology
of human beings. If I were an expert of, say, Super Mario, I definitely
wouldn't want to waste my time on repeating trivial "easy mode";
instead, I would like to enter the hard mode directly once I start playing.
Perhaps in the future, the ESP game may be able to give different labels of
difficulty levels for each picture; it may give pictures of proper difficulties
to the players of the corresponding level and keep the existing population of
players.
Paper B
This paper adopts the game-theory point of view to analyze the ESP game. In this paper, the authors propose two incentive structures - match-early preferences and rare-words-first preferences - and further derive results showing that different incentive structures lead to different equilibrium structures. This result creates the possibility of formal incentive design that brings desirable system-wide outcomes, while preventing cheating and other human factors at the same time. Such a finding is unquestionably a piece of encouraging news for game designers. However, I have some questions about it. In the beginning, the authors make an assumption about consistent strategies in both models. That is, in the two subject models, a player doesn't change her relative ordering of elements according to the realized dictionary. However, according to my personal experience, when I was playing the ESP game, I would guess the possible outputs of the other player from previously matched words. I even guessed which level she might be, and I did adjust my strategy based on these observations and guesses. I wonder if it is oversimplified to assume a fixed strategy for each player. On the other hand, if such consistent strategies do not exist, are we still able to build up a proper structure for the ESP game and achieve those desirable goals?
Malvika Rao
A game-theoretic analysis of games with a purpose: Very interesting model formulation. This paper takes the phenomenon of the ESP game and establishes a theoretical formulation. There seem to be plenty of directions for future research as listed in the conclusion section.
One thing that comes to mind is what outcomes we can expect if mixed strategies are taken into account. Is it reasonable to model players as initially playing low effort and then switching to high effort when they realize that low effort is not succeeding as they would like? Or perhaps vice-versa: players play high effort and then tone down as they get tired.
It would also be interesting to look at other preference models.
The paper models the game as a 2-stage process where the 1st stage involves a decision on effort level and the 2nd stage involves a choice of dictionary. Would it be possible to model this game differently? The paper does mention that future work could consider taking into account a cost associated with effort. Could we treat this as a repeated game with discounting? An early win is highly valuable (in the early-match preference model) but later wins are less valuable. So utility obtained later is discounted - which means that once players realize that an early win will not happen, they put in less effort.
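As a rough sketch (my own formulation, not the paper's), discounting could enter the per-game utility as

U_i = \sum_{t=1}^{T} \delta^{t-1} u_i(t), \qquad 0 < \delta < 1,

where u_i(t) is player i's payoff from a match in round t; a small \delta makes late matches nearly worthless, so effort should fall once an early match fails to materialize.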
Might players punish each other for not getting them a win? Suppose that one player felt that somehow the other player was not putting in enough effort?
Brian Young
Jain and Parkes leave it an open question
whether there exists some method of causing players to choose rare words first.
This seems unlikely to me under the current kinds of games.
As correctness of answers is verified only by
confirmation from other players, getting people to input rare words is a
difficult task, since the rarer a word is, the less likely it is to be matched
by some other player. As such, people with more words in their dictionary than
the average person would have decreasing incentive to play words that the
average person does not know -- in other words, they would "play
down" to the average. Any incentive structure that could in fact
incentivize rare words would have to deal with the possibility of allowing
unique but irrelevant information to slip through.
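The difficulty is easy to quantify with a back-of-the-envelope sketch (numbers assumed, under an independence assumption that is only approximate): if both players independently think of word w with probability p_w, they agree on w with probability roughly p_w squared, so rare words almost never produce matches.

# Toy numbers for the match probability of a word under independence.
for p_w in (0.5, 0.1, 0.01):
    print(f"p_w = {p_w:<5} -> P(both play w) ~ {p_w ** 2}")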
My idle thoughts on generating more detailed
results, though, led me to imagine a set of images that had all been tagged
with the same or similar words. We might consider setting up a game in which
one player, the "describer", was given one of those images to
describe, knowing that the "guesser" would have to deduce which of
those similarly-tagged images was being described. It's conceivable that this
might result in more and more detailed information, to more accurately
distinguish between similar images.
I also have to wonder: to what extent does
knowing that the games they are playing have "a purpose" influence
people's enjoyment? I remember how the site FreeRice, which combines a
vocabulary quiz with donations to charity for each correct answer, was widely
popularized, while a game that was nothing but a vocabulary quiz would most
likely not have achieved such popularity on its own.