NICK WELLS
Paper A
This study finds that most
new articles are created after a reference to them is first entered. This type
of growth is interesting because Wikipedia is expanding its coverage in a
"breadth-first traversal." When a new article is created, Wikipedia
includes links to nonexistant articles which help spur their creation by
suggesting them to future authors.
I think that this is an
interesting study of Wikipedia. It would be interesting
to see if a similar
conclusion could be reached regarding other data-driven
websites.
Paper B
This paper observes data from Wikipedia, Essembly, Bugzilla and Digg in order to form theories about their dynamic growth. The authors examine the power law as a fit for user participation and find it to be a good fit. They examine the exponent (alpha) and relate it to the effort required for participation, finding that these alphas are good indicators of the effort required to contribute. They also examine the distribution of participation per topic (i.e., edits, diggs) and find it to be lognormal.
This is an interesting
article with substantive results. I would be interested
in seeing if the results hold
in more than just these four cases examined.
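As a concrete reference point, here is a minimal Python sketch of the kind of power-law fit described above; the data are synthetic and the cutoff k_min is an arbitrary choice, so this only illustrates the standard maximum-likelihood estimator rather than the papers' actual procedure.

```python
# Sketch: estimating a power-law exponent (alpha) from per-user
# contribution counts. Synthetic data stands in for the real logs;
# the estimator is the standard continuous MLE,
# alpha = 1 + n / sum(ln(k_i / k_min)).
import math
import random

def sample_power_law(alpha, k_min, n):
    """Draw n samples from a continuous power law via inverse-transform sampling."""
    return [k_min * (1.0 - random.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]

def estimate_alpha(counts, k_min):
    """Maximum-likelihood estimate of the power-law exponent above k_min."""
    tail = [k for k in counts if k >= k_min]
    return 1.0 + len(tail) / sum(math.log(k / k_min) for k in tail)

if __name__ == "__main__":
    # synthetic "contributions per user" drawn with a known exponent
    counts = sample_power_law(alpha=2.3, k_min=1.0, n=50_000)
    print("estimated alpha:", round(estimate_alpha(counts, k_min=1.0), 2))
```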
ANDREW BERRY
Paper A
Even though the paper states
that the growth of Wikipedia lies comfortably between the inflationary and deflationary hypotheses, I was most surprised that Wikipedia is built from the
inside out. I always thought that users made particular pages based on their
own interests. However, it does make sense that references must lead to
definitions for sustainable and connected growth. Yet, because of the loose
framework regarding contributions to Wikipedia, it is an interesting phenomenon
that the ratio between complete and incomplete articles remains constant. The
Barabási model is a well-thought-out model of Wikipedia, but how does one quantify a vertex's connectivity? Also, the model assumes that at each timestep
the maximum number of vertices that can be added to the network is equal to the
number of vertices already in the network. However, in reality it is quite
feasible for this to be more. The authors do adjust the model, which provides a better fit to the Wikipedia data, but I wonder whether some of the assumptions are flawed.
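For readers unfamiliar with the model, here is a minimal sketch of generic Barabási-style preferential attachment, in which a vertex's "connectivity" is simply its current degree; this is the textbook construction with invented parameters, not the authors' adjusted variant for Wikipedia.

```python
# Sketch of Barabasi-Albert-style preferential attachment: new vertices
# attach to existing ones with probability proportional to current degree.
# Parameters are illustrative only.
import random

def preferential_attachment(n_vertices, m_links=2):
    """Grow a graph by preferential attachment; return each vertex's degree."""
    degree = [m_links] * (m_links + 1)            # small fully connected seed
    # endpoint pool: each vertex appears once per unit of degree, so uniform
    # sampling from the pool is degree-proportional sampling
    pool = [v for v in range(m_links + 1) for _ in range(m_links)]
    for new_v in range(m_links + 1, n_vertices):
        targets = set()
        while len(targets) < m_links:
            targets.add(random.choice(pool))
        degree.append(0)
        for t in targets:
            degree[t] += 1
            degree[new_v] += 1
            pool.extend([t, new_v])
    return degree

if __name__ == "__main__":
    degs = preferential_attachment(10_000)
    print("max degree:", max(degs), " mean degree:", sum(degs) / len(degs))
```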
Paper B
The Wilkinson paper
demonstrates that the probability a person stops contributing varies inversely
with the number of entries the user has already made. This paper also shows
that a small number of popular topics account for the majority of contributions.
These conclusions are supported rather robustly as they hold for Wikipedia,
Digg, Bugzilla and Essembly which are different peer production systems for
different purposes. Wilkinson uses a power law to describe the user
participation levels in all systems where the probability of quitting is
inversely proportional to previous contributions. The conclusions of the paper
also suggest that participation is dependent mostly on this probability and the
difficulty of contributing to the system. Even though this model fits the data
for all four sites, I do not think it can be generalized to all online peer
production systems. Suppose there is a peer production system similar to
Wikipedia where you could contribute to a particular topic easily and incrementally.
In this case there would be some sweet spot where, as the number of contributors grows, a user has a lower probability of quitting. A user may have an incentive not to contribute if he has to start the entry or do much of the groundwork, but once enough others have contributed, his probability of quitting may decrease, up to a certain point.
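To make the quit-probability mechanism concrete, here is a toy simulation; the constant c and the form min(1, c/k) are my assumptions for illustration, not Wilkinson's fitted values.

```python
# Sketch: if a user's chance of quitting after their k-th contribution
# falls off like c/k, the distribution of total contributions per user
# comes out heavy-tailed. The constant c below is purely illustrative.
import random
from collections import Counter

def contributions_until_quit(c=0.6, cap=10_000):
    """Simulate one user; return how many contributions they make before quitting."""
    k = 1
    while k < cap and random.random() > min(1.0, c / k):
        k += 1
    return k

if __name__ == "__main__":
    totals = [contributions_until_quit() for _ in range(100_000)]
    tally = Counter(totals)
    for k in (1, 2, 5, 10, 50, 100):
        frac = sum(v for kk, v in tally.items() if kk >= k) / len(totals)
        print(f"fraction of users with >= {k:>3} contributions: {frac:.4f}")
```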
PETER BLAIR
Paper A
In this article, the authors endeavor to understand and model the growth of Wikipedia. The point of the study is to determine whether that growth is stable or unstable. The work was motivated by the idea of an inflationary process of growth, whereby, as more articles are contributed to Wikipedia, the number of unwritten articles referenced in completed articles would increase without bound, eventually undermining the credibility and trustworthiness of Wikipedia as a reliable source of information. The converse hypothesis, a deflationary scenario, would arise if the proportion of articles written outstripped the number of new articles that existing articles point to; because of a decreasing rate of referenced articles, the accumulation of new knowledge in Wikipedia would stagnate and eventually the site would not evolve quickly enough to contain relevant or sufficiently new information.

I wonder whether the following assumption is reasonable: that the number of pages with stubs that are useful and the number of non-useful pages that are not marked with stubs cancel identically. It would be more satisfying for the authors to motivate this result some more. In reading the article I am reminded of the attempt of facebook.com to translate its pages into other languages using just its users as translators. From all reports the effort was not successful; by comparison, it seems that Wikipedia has been successful because of its ability to compartmentalize its articles into workable and independent chunks that do not depend on each other, in the way that an integrated website like Facebook depends on all of its functions and pages having a consistent translation. In this light, Facebook may be an interesting case study for the sustainable growth of a website.
Paper B
In this paper the authors study the contributions of users to peer production systems. In particular, they focus on both the volume of contribution and the time evolution of content contribution by topic. The central results of the paper are: (i) for inactive users, the probability that a given user quit after having contributed k times is given by a power law distribution ~ 1/k; (ii) contribution to a particular topic is distributed lognormally due to what is termed a "multiplicative reinforcement mechanism," whereby contributions to a topic increase its popularity, which in turn increases the number of contributions. The authors use as case studies the peer production sites Wikipedia, Digg, Essembly and Bugzilla.
The results of this paper are derived under reasonable assumptions in a clean manner, which lends the model readability and believability. Particularly noteworthy is that the power law distribution is derived for inactive users, who it turns out make up more than 50% of the contributors for all of the sites. This modeling simplification avoids highly nonlinear and complex effects from superusers, who as a group can drive the content and direction of a site in a way that is not representative of the larger population of its users. I also appreciated the explanation of the various sites involved in the study. In fact, I learned an interesting fact about Digg.com: one can only add positive diggs. It seems that this type of voting rule should be less susceptible to manipulation, if we think of a desirable outcome as one reflected in the aggregate utility of all the voters, i.e., a person who really, really likes a given article gets rewarded for digging it in a way that is consistent with the goals of the voting rule and the expected outcomes (this is an interesting aside!).

The grouping of the plots in Figure 1 (a)-(c) is quite suggestive; in particular, processes with similar exponents, such as Essembly and Digg votes, are grouped on the same plot. Is this meant to suggest some universality in human behavior when it comes to participating in similar peer interactions, such as voting on different sites? This point is made very subtly and suggestively, and I would have appreciated more of a hypothesis as to why we see such a striking similarity; otherwise it leaves the reader thinking that something fishy is going on. The authors deliberately state that they avoid drawing sociological extrapolations from their results. I am personally dissatisfied with this reluctance to speculate in light of the highly suggestive presentation of the team's data.

While on this point, I found it bizarre that the alpha for Wikipedia edits was lower than that for Digg submissions. This empirical finding seems to be at odds with the suggestion that alpha measures how hard it is to contribute to a peer production site: clearly it is easier to submit a story for digging than to update a Wikipedia article. This is an area where the authors could have been a bit more imaginative in their conclusion, offering a reason for the inconsistency. One potential explanation is that a contributor's willingness to participate is positively correlated with the social good that his or her participation produces; contributing to a Wikipedia article arguably produces more of a social good than listing an article that one diggs, since the Wikipedia article has a longer shelf life and potentially has research implications.

The lognormal result for topics was also particularly satisfying and makes intuitive sense: initially many articles get edited; as time progresses, the number of edits per article increases but the number of articles being edited decreases, i.e., the community homes in on the important articles that are most beneficial, and hence more effort is expended to make those articles more accurate. This reasoning feeds into the social-desirability explanation I offer above for why the alpha for Wikipedia edits is lower than that for Digg submissions. A potential future area of research would be to develop some metric of social benefit for a given peer production site and ask how this factors into, or incentivizes, edits. Would the distribution again be lognormal as a function of perceived social benefit? YouTube may also be an interesting case study for the "multiplicative reinforcement mechanism," given that popular videos are highlighted on the front page of the website.
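To see concretely why multiplicative reinforcement yields a lognormal, here is a minimal simulation sketch; the drift and noise parameters (mu, sigma) are invented for illustration and are not the paper's fitted values.

```python
# Sketch of multiplicative reinforcement: each step, a topic's attention
# grows by a random proportional factor, so log(total) is a sum of random
# terms and ends up roughly normal, i.e. the totals are roughly lognormal.
# All parameters are illustrative.
import math
import random
import statistics

def simulate_topic(steps=200, mu=0.01, sigma=0.1):
    n = 1.0                                      # initial contributions to the topic
    for _ in range(steps):
        n *= math.exp(random.gauss(mu, sigma))   # proportional random growth
    return n

if __name__ == "__main__":
    totals = [simulate_topic() for _ in range(20_000)]
    logs = [math.log(t) for t in totals]
    print("mean of log(total):", round(statistics.mean(logs), 2))
    print("std  of log(total):", round(statistics.stdev(logs), 2))
    # if the mechanism holds, the log-totals look Gaussian:
    # mean ~ steps*mu = 2.0, std ~ sigma*sqrt(steps) ~ 1.41
```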
SUBHASH ARJA
The paper "The
Collaborative Organization of Knowledge" seeks to study the online
encyclopedia Wikipedia. Specifically, the study involves observing the typical time difference between when an article is first referenced and when it is actually created. The authors also seek to find the ratio of complete to incomplete articles in order to test the "inflationary hypothesis". This kind of study is important for gauging the usefulness and feasibility of a system like Wikipedia. Because it depends on the pooling of knowledge from many contributors, it matters whether creating articles in turn leads to the creation of an unbounded number of undefined articles. One result that was very interesting
was the finding that the ratio of incomplete articles to complete articles
started close to 3 and has reached an almost steady-state value of close to 1.
This is a testament to the great increase in participation in Wikipedia by the general public. Another result from the paper that is very important and supports the Wikipedia concept is that new articles are more likely to be written by someone who is not the author of the original article referencing them. This shows that no one person is an overwhelming contributor; rather, Wikipedia truly is a collaborative effort.
The second paper,
"Strong Regularities in Online Peer Production", shows that in
submission and user-edited sites, like Digg and Wikipedia, a user is more
likely to contribute if he has been contributing regularly. Also, the study
finds that the level of activity for a certain topic follows a log normal
distribution. Both results are not very surprising considering the purpose of
the type of websites being studied. The author notes that Digg submissions and Wikipedia edits have a higher barrier to entry than Digg and Essembly votes, which explains their larger values of alpha. While I agree with this finding, I think the barrier-to-entry analysis is incomplete. For instance, on Digg, many submissions are duplicated, and if one of these ends up in the top 10, it is most likely because it was submitted by a very popular user, even in cases where he was the third or fourth submitter of the same story. Thus, there is also a "popular user" barrier to entry. Also, stories from certain news or tech sites tend to be more likely to receive Diggs. This could be an interesting study to conduct in addition to, and as a complement to, the one presented in this paper.
ALICE GAO
Paper A
This paper studies the
process of Wikipedia growth.
Specifically, it asks whether Wikipedia's development is a sustainable process, and what triggers the creation of new articles on
Wikipedia. The results are that Wikipedia's development is a sustainable process and that references to non-existent articles trigger the eventual creation of a corresponding article.
I thought the idea of
sustainable growth is pretty interesting.
It characterizes growth trends at two extremes that both lead to bad
consequences. Also, I had a general thought about using Wikipedia data for research purposes. Basically, we have a lot of data available that we can analyze, and I believe that we can easily write software programs to perform these analyses. Therefore, all we really
need is to ask an interesting and valuable question. Having a good question that motivates the research would be the most crucial step in the research process, in my opinion. I don't think the topic explored in this paper is very well motivated. For example, I can think of some more specific motivations for studying Wikipedia, such as how to make the information aggregation process more effective, how to reduce vandalism of pages, and how to create incentives so that people follow appropriate guidelines for contributing articles.
Paper B
First of all, I think one important contribution of this paper is finding common regularities across different peer production systems. This means that the results are not specific to a particular peer production system, but reflect a general property of such systems. Also, the notion of momentum associated with participation is interesting. This result tells us the pattern of contributions made, and we can use it as a starting point for studying the underlying reasons for these observations.
The author claims that this
paper only serves as a starting point for the general study of peer production
systems. Indeed, I think future
studies can take on different perspectives to study behaviours of contributors
as well as their interactions in these kinds of systems. In particular, this reminds me of a
study on the interactions of users in Wikipedia in terms of participants
interacting in a social network.
MICHAEL AUBOURG
Paper A
One strength of Wikipedia, in my view, is its incentive to complete its content with new articles. When links are red, it means that Wikipedia thinks a topic deserves an article but cannot currently provide one with adequate content. For instance, on the Eiffel Tower page (http://en.wikipedia.org/wiki/Eiffel_Tower) there is a red link for the "Avenue de Suffren", and when we click on that link we reach a placeholder page (http://en.wikipedia.org/wiki/Avenue_de_Suffren) with the immediate possibility to write the article: "Start the Avenue de Suffren article".
Another remarkable point is that Wikipedia is powerful thanks to the genuine desire people have to share and to make Wikipedia progress: indeed, the subsequent definition of an article in Wikipedia appears to be a collaborative phenomenon 97% of the time.
What are the disadvantages of Wikipedia? The fact that people who write or modify articles are hidden behind Wiki nicknames is great. However, it means that everybody is allowed to modify every page, even when a certain person shouldn't be modifying a page about him- or herself.
This is the case with WikiScanner, a relatively new site that tracks the edits made on Wikipedia. The purpose of this service is to see who is behind the edits made, and how these actions often serve self-interested corporations hoping to promote and protect brand identities, which is a shame. Created by a student, WikiScanner searches the entirety of the XML-based records in Wikipedia and cross-references them with public and private IP and domain information to see who is behind the edits made on the online encyclopedia. This is an awesome idea. Personal anonymity is preserved, but now we can reveal institutions' identities. With WikiScanner, there are a few levels on which you can search for information, including organization name, exact Wikipedia URL, or IP address, among others.
This student found that a good portion of the edits to company entries are being made by the companies themselves. This isn't really surprising, and it was expected. The team behind Wikipedia is aware of it and has been working to deal with issues such as this. Wikipedia's policies have changed since its inception, and the user-generated system has been improved as a result.
For this reason, I think anonymity should be preserved at the individual scale, but we should reveal some information about the person who edits or modifies an article, such as his or her interests (politics, companies...). Let's not forget one of the five founding rules of Wikipedia: "Wikipedia has a neutral point of view, which means we strive for articles that advocate no single
point of view. Sometimes this requires representing multiple points of view,
presenting each point of view accurately, providing context for any given point
of view, and presenting no one point of view as "the truth" or
"the best view." "
VICTOR CHAN
The two papers, The Collaborative Organization of Knowledge and Strong Regularities in Online Peer Production, by Spinellis & Louridas and Wilkinson respectively, deal with the subject of online peer production systems. The first paper deals specifically with Wikipedia.org, while the second touches on Wikipedia.org, Digg.com, Bugzilla and Essembly.com. These websites all contain user-generated content and are good subjects for the analysis of user participation and content growth.
The main contribution of Spinellis & Louridas' paper is the evaluation of Wikipedia and of how new content is created in the system. The results the paper presents show that the growth of Wikipedia scales fairly well, even though the number of undefined links exceeds the number of defined pages. Wikipedia's development lies between the extremes of link inflation and deflation, which can be attributed to the fact that undefined links drive the creation of new pages. The results show that the majority of links are defined within one month of their first reference.
The second paper discusses the four websites and presents results on the participation of users and the contributions to a given topic in these peer production systems. The main results show that user participation levels follow a power law, and it is suggested that a small group of users contributes the majority of the content. The paper then elaborates on the momentum a user gains to participate in the creation of content: it is shown that the probability of quitting is inversely proportional to the number of previous contributions the user has already made. The next main point the paper touches on is that the distribution of contributions to a topic is lognormal. Based on this, the paper suggests that a small number of popular topics makes up the majority of contributions.
The ideas in the second paper seem interesting, because they suggest that a small number of users and a small number of popular topics drive the growth of these online peer systems. However, the overall system still grows fairly quickly, as shown in the first paper. I am curious how the set of these heavy users and popular topics grows as the size of the entire system grows.
Another interesting point is that in this social computing platform, we once again have a small number of "experts" who make the most difference, as we saw in prediction markets. Social computing was defined as deriving intelligence from a whole group of people rather than from an expert; however, these papers suggest that the whole group should rather be thought of as a group of "experts". Only users who have information are useful to these peer production systems or prediction markets.
XIAOLU YU
This first paper presents an
empirical study of regularities in four online peer production systems.
The paper first showed that the distribution of the number of participations per user is strongly right-skewed and well described by a power law.
The heavy right skew means that a small fraction of very active participants are
responsible for the large majority of the activity, an unfortunate reality for
recommender systems attempting to provide accuracy for all users.
Secondly, the authors showed that topic activity levels in all the systems can be described as lognormal due to a reinforcement mechanism in which more contributions lead to higher popularity. This explains the famous 80-20 rule: a few popular topics dominate the activity of the whole system, even though the various systems differ in their details.
I believe this heavy-tailed form is nearly ubiquitous in most forms of online activity, including peer production, online discussion, rating, and commenting, among others. In other words, the power law is typical and representative of online collaborative systems as well as peer production systems.
In the second paper, the authors started with two hypotheses: that as Wikipedia expands, it will become less useful as more and more of the terms in the average article are not covered; or that Wikipedia's growth will slow or stop as the number of links to uncreated articles approaches zero. In the first case, Wikipedia's coverage will decrease as its articles become drowned in an increasing number of undefined concepts. In the second case, Wikipedia's growth may stop. The paper shows that Wikipedia grows at a pace between the two extremes.
The authors examined a snapshot of the Wikipedia corpus, 485 GB of data, adding up to 1.9 million pages and 28.2 million revisions. They analyzed the relationship between references to non-existent articles and the creation of new articles. Their experiments showed that the ratio of non-existent articles to defined articles in Wikipedia is stable over time. In addition, the authors discovered that missing links are what drive Wikipedia's growth. They found that new articles are contributed by users in a collaborative fashion: users often add new entries when they find a missing link. The study also showed that the connection between missing links in existing articles and new articles is a collaborative one, and that adding missing links to existing articles actually spurs others to create new articles. It also showed that new articles were typically created within the first month after they were first referenced in another article. I believe that this mode of growth (called preferential attachment in the paper) may be able to explain some collaborative systems in other areas.
BRETT HARISSON
Both these papers offer
empirical studies of online collaboration and peer-editing systems, including
such internet giants as Wikipedia and Digg.
The power law result
(describing the distribution of contributions among users) is an intuitive yet
interesting result. It directly implies that an online-collaboration system
with a lower "alpha" will solicit more contributions and hence more
traffic (and probably, for systems such as Wikipedia, more accurate information).
Digg, for example, provides an extremely easy way for users to "Digg"
stories, i.e., they just have to click a single button. This suggests that one of the most important components of such a system is its user interface, and that much time and thought should go into designing UIs that are as intuitive and easy to use as possible.
I also am curious about the
results regarding the correlation between number of links to non-existing pages
and the creation of new pages. While there is certainly empirical evidence to
back up this claim, what will happen in the following scenario: if Wikipedia
mandates that every new article contain at least 10 outgoing links (replace 10
with any number if you wish), will that stimulate further growth? Or what if I
start making articles in which every other word is an outgoing link to a
non-existing page... will that stimulate the creation of pages? I am curious
about the direction of causality with this claim.
ANGELA YING
Paper A
This paper discussed the similarities between different online peer production services, including Digg, Wikipedia, Essembly, and Bugzilla, all of which are websites where users register and are able to share information, either by editing existing articles or by posting articles to the site. The paper had some interesting results about the probability of a user stopping contributions being inversely proportional to the number of contributions already made, and it found the corresponding constants for all four websites. It also derived a formula for the number of people who make more than some number k of contributions. In addition, an important finding was that the value of the constant alpha is related to the amount of effort required to contribute; for example, editing a Wikipedia article and submitting a Digg article have similar alphas, which are larger than the alphas for digging an article or voting on Essembly. Finally, the paper found that the lognormal parameters for contributions per topic vary linearly over time.
I thought that this was an
interesting paper because it actually looked at data from 4 fairly popular
websites that cater to different audiences and have different types of people
making contributions. An idea for future work would be to look at the beginning
of these websites and analyze how the probabilities derived from this paper
were different when the websites were not so popular. Perhaps even a year by
year analysis could work.
Paper B
This paper explored Wikipedia and the effectiveness of peer revision. In particular, it examined how the number of articles grows and how articles reference non-existent ones. Originally, the author was concerned about the possibility that the number of non-existent articles referenced would grow at a rate greater than the number of articles created, which would slow Wikipedia's growth and increase the proportion of stubs. However, the paper examined two models, looked at real data, and concluded that the ratio of non-existent articles to created articles remains about the same, yielding sustainable growth. A particularly interesting result was that many articles on Wikipedia were actually created because of non-existent references or stubs.
I thought that this was an
interesting study, although I would be curious to learn more about the models
themselves. The author briefly explained how they worked but did not really
describe their context, and whether they were formulated according to a real
example or simply derived from theory. A possible extension would be to look at
websites other than Wikipedia (perhaps smaller wikis or Wikipedia extensions
focused on certain topics) and see if they follow similar trends.
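To illustrate the kind of stable ratio described above, here is a toy Python sketch; every rate in it is invented for illustration, and it is not the paper's model.

```python
# Toy sketch of red-link-driven growth: each step one article is created,
# usually by filling an existing red link, and the new article itself
# references a few topics, some of which do not exist yet. The quantity
# of interest is whether the undefined-to-defined ratio settles rather
# than exploding. All rates below are invented for illustration.
import random

def simulate(steps=50_000, links_per_article=3,
             p_new_link=0.4, p_fill_redlink=0.9):
    defined, undefined = 10, 10
    history = []
    for t in range(1, steps + 1):
        if undefined > 0 and random.random() < p_fill_redlink:
            undefined -= 1                    # a red link gets written
        defined += 1                          # one new article either way
        for _ in range(links_per_article):    # its own outgoing references
            if random.random() < p_new_link:
                undefined += 1                # ...some of which are red
        if t % 10_000 == 0:
            history.append((t, undefined / defined))
    return history

if __name__ == "__main__":
    for t, ratio in simulate():
        print(f"step {t:>6}: undefined/defined ratio = {ratio:.2f}")
```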
RORY KULZ
Paper A
This paper is pretty
straightforward. It's basically to me one of
those papers where there's a
very intuitive result (e.g., everything
follows a power law, and the
harder it is for people to participate,
the higher the rate of
participation dropoff), but you know, someone
had to go out in the world
and prove / explain it. In other words,
this paper would have been
more interesting if something very
unexpected happened.
The most engaging part of
this paper is definitely the explanation of
multiplicative reinforcement
and seeing how the stochastic formulas
fit the data, especially Digg
with the discount factor. As for the
other results, I was a little
surprised about the dropoff for
Wikipedia edits, and I was
wondering if the data set had incorporated
unregistered users, which
wasn't clear to me (they just mention a
"user ID"). It at
least seems like "barrier to participation" might
not be the most natural
explanatory concept for Wikipedia, since the
barrier is actually the lowest (anyone visiting can click "edit," whereas not every visitor can, e.g., "digg" a story), while the desire of
most visitors to participate
might be the lowest (most people come
seeking information and may
only edit if they see a blatant error).
I'm not sure.
Paper B
I'm sorry, is this paper for
real? "We hypothesize that the addition
of new Wikipedia articles is
not a purely random process following the
whims of its contributors but
that references to nonexistent articles
trigger the eventual creation
of a corresponding article." The shock!
When there is a demand for an
article that has not been written, the
people who frequently
contribute to Wikipedia will eventually fill
that demand! (It should be
noted that this paper does not also consider the case of links to articles being removed because editors have deemed them not appropriate to write (following the policies at
http://en.wikipedia.org/wiki/Wikipedia:What_Wikipedia_is_not), which
should also be viewed as a
definitive statement on its status in the
way the addition of an entry
is.)
Okay, so there are some nice
graphs, Figures 1 and 2b being the main
ones, and it is a nice result
that the growth of Wikipedia appears
sustainable, but seriously,
regarding the growth mechanism, would it
actually be remotely
plausible for any other situation? Independent
pockets working on their own
articles and then gradually linking
together? Anyone who has
spent time on Wikipedia can see this isn't,
at least in recent times, the
case; the most interesting thing, I
think, to consider is why
this might be a function of a large, diverse
user base, comparing for
example circa-2001 Wikipedia's growth with
circa-2008 Wikipedia's growth
and seeing if distinctions can be drawn.
The authors touch on the
importance of the user base very briefly in
the conclusion --
"...the scalability of the endeavor is limited not
by the capacity of individual
contributors but by the total size of
the contributor pool" --
but come on, what's the deal with Figure 1?
They say the coverage ratio
is basically stable after 2003, but they
don't look into why it seems
substantially different from 2001-2003.
This I think is a missed
opportunity.
HAOQI ZHANG
Paper A
The main contribution of this paper is its systematic analysis of Wikipedia's growth over time and of the reasons behind that growth (and the kind of growth). In particular, the authors show empirically that links to nonexistent articles seem to contribute to the site's growth (these pages get filled in, often by others), and furthermore that the nonexistent links are growing at a rate neither much faster nor much slower than content is being contributed, showing signs of sustainable growth. What I found most interesting about the paper is the discussion of system designs that aid the site's growth. For example, I found the watchlist, which alerts users to changes in articles they are interested in, to be significant for regulation and growth. As another example, I found the style guidelines that lead to splitting overly long articles into shorter ones interesting in terms of how the growth takes shape.
I found the empirical analysis to be interesting and convincing. However, while the development has followed these patterns so far, I can imagine that this will begin to change as more content is added and the population of users who are able and willing to contribute shrinks because the basic topics are already covered. I think an interesting direction in which to extend the current work would be to introduce a model that captures the effect of various system designs, to predict how we may wish to modify various aspects of Wikipedia to facilitate long-term growth.
Paper B
The main contribution of this paper is in quantitatively identifying some commonalities among peer production systems: (1) user participation in all tested systems is described by a power law, where a few active users contribute a large share of the content, and (2) a few popular topics dominate most of the activity on a site. In describing the power laws, the authors show that different barriers to entry (e.g., the amount of effort required to participate) lead to different power law constants; that is, the barrier has a direct effect on the amount of participation. This is not surprising, but it is nevertheless significant for thinking about how few users are encouraged to do most of the contributing.
It is not clear to me where
this paper leaves us. One interesting question is how we can encourage
participation from all participants while guaranteeing the quality / level of
dedication required. For Wikipedia in particular, how do we get users to contribute high-quality content? Part of the answer depends on how we believe knowledge is distributed, that is, whether there is knowledge that is not being entered into the system that otherwise could be. What motivates the heavy contributors
to contribute?
NIKHIL SRIVASTAVA
The Spinellis and Louridas paper provides a strong argument for the sustainability of Wikipedia's growth by
examining the addition of new articles and the links that connect them to the
existing network, both with empirical evidence and in comparison to a
scale-free graph model. I found the paper to be intellectually interesting, but
I think its focus straddled two even more interesting topics. First, what can
examining the growth of Wikipedia tell us about the people who contribute to it
and their incentives for doing so? (I think we'll see more of this later in the
course). Second, how can this information tell us something interesting about
*knowledge* itself, or about the relationships between sets of information on
the internet? (I plan to do my project proposal on this idea.)
The Wilkinson paper begins to
address the first issue, but is framed still in terms of understanding the
dynamics of the peer production system instead of the users behind it. I
imagine the idea is to optimize the quality and quantity of information aggregation,
but I still think there are interesting ideas to be discussed in the other
issues I mentioned. Specifically, the Wilkinson paper shows that a wide range
of peer production systems display a power law distribution of user
contribution and a lognormal distribution of topic activity.
AVNER MAY
I found these articles rather
interesting, particularly the one about Wikipedia's growth. I thought that the model they proposed for Wikipedia's growth, in which references to non-existent articles spur the creation of new articles, is very enlightening and not immediately obvious. It was nice to see that Wikipedia's growth falls under neither the "inflationary" nor the "deflationary" hypothesis, and thus that the growth appears sustainable. Thinking of all human knowledge as a graph,
and imagining Wikipedia slowly but surely covering more and more of this graph
in a breadth-first-search manner, is an interesting model. Maybe this breadth first search is one
with multiple start nodes, or with start nodes appearing randomly at every time
step (a start node appearing randomly would be equivalent to a random
generation of a new article, as opposed to an article being created due to the
fact that it was already referenced).
This is kind of like imagining a candle being melted by multiple lit
wicks, and a new wick being lit occasionally. I would be interested in studying how quickly a graph would
be covered in this fashion (fraction covered vs. time), and how this depended
on the topology of the graph. What
is a good model for the topology of the graph of all human knowledge? Can the growth rates of Wikipedia,
together with the insights from this article, be used to study this
question? In this article, they looked at Wikipedia's growth as the creation of a graph, as opposed to a traversal of a graph. I would be interested in modeling what Wikipedia does as a traversal of a graph by a large group of contributors. With regard to the second article, I thought it was interesting how "the probability a person stops contributing varies inversely with the number of contributions he has made," and the implications this has about how much of the content is provided by the x% most active users.
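Avner's "multiple wicks" picture can be sketched directly. The following toy simulation assumes a sparse random graph and placeholder rates (it makes no claim about the real topology of human knowledge); it runs a breadth-first expansion from a few start nodes, occasionally lighting a new one, and records the fraction of the graph covered over time.

```python
# Sketch: cover a graph by breadth-first expansion from a few start nodes,
# occasionally igniting a fresh random start node, and record fraction
# covered vs. time. Topology and rates are placeholders.
import random
from collections import deque

def random_graph(n=5_000, avg_degree=6):
    adj = [[] for _ in range(n)]
    for _ in range(n * avg_degree // 2):
        u, v = random.randrange(n), random.randrange(n)
        if u != v:
            adj[u].append(v)
            adj[v].append(u)
    return adj

def bfs_coverage(adj, n_seeds=3, p_new_seed=0.001):
    n = len(adj)
    covered = set()
    frontier = deque(random.sample(range(n), n_seeds))
    curve, step = [], 0
    while frontier:
        node = frontier.popleft()
        if node not in covered:
            covered.add(node)
            frontier.extend(v for v in adj[node] if v not in covered)
        if random.random() < p_new_seed:       # a new "wick" is lit
            frontier.append(random.randrange(n))
        step += 1
        if step % 500 == 0:
            curve.append((step, len(covered) / n))
    return curve

if __name__ == "__main__":
    adj = random_graph()
    for step, frac in bfs_coverage(adj):
        print(f"step {step:>6}: fraction covered = {frac:.2f}")
```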
MALVIKA RAO
Paper B
Strong Regularities in Online
Peer Production: I did not find the findings
of this paper to be
surprising. It is reasonable to see that in these
systems a very few active
users account for most of the contributions, a
few visible popular topics
dominate the total activity, and that the
probability of quitting is
inversely proportional to the number of
previous contributions. It is good to know that statistical models such as the power law and the lognormal distribution can fairly accurately describe these phenomena.
SAGAR MEHTA
Paper A
This paper provided empirical
results on how Wikipedia grows, which the authors argue is primarily through
undefined references. I found the findings to be rather intuitive. A
contributor to Wikipedia seems less likely to start a new article from scratch (which may have a higher barrier, since it requires more work) than to edit an existing stub. Furthermore, it's more likely that he or she will stumble upon
an article stub that is linked to in another article of interest to him or her.
I think this paper could have given further insight into how Wikipedia grows if
it had focused not only on which pages were being edited/completed, but also on
who was editing them. Are there a few contributors fueling the growth of new
articles or many? Does this matter in deciding how accurate the information is
in Wikipedia? Is a newer article on average less accurate than an older one?
Theoretically, this should be true as more revisions could lead to better
results/information aggregation. However, in some cases one could argue that in
the presence of too many sources, Wikipedia becomes a tool to convey the
consensus opinion on a matter rather than the "actual" truth.
Paper B
I found it pretty interesting
that online peer production systems share several general results with regard
to how people contribute. One thing I wondered about in the multiplicative
reinforcement section was their assumption that for Wikipedia we do not need a
discount factor to describe dN(t), the number of contributions to a given topic
made between time t and t + dt, but we do need one for Digg to account for the
"decay in novelty of news stories over time". Votes to a story in
Digg surely should be discounted over time because of this, but shouldn't edits
on Wikipedia also be discounted to account for the idea that knowledge on a
particular topic is complete? So, for Wikipedia, we don't necessarily need a
discount factor to account for "decay in novelty", but I would argue
we do need one to account for the idea that information becomes more complete
as time progresses.
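Sagar's point can be made concrete with a small sketch; the exponential discount form and every parameter below are my assumptions for illustration, not the paper's model for Digg or Wikipedia.

```python
# Sketch: compare a multiplicative-growth topic with and without a
# discount that damps new contributions over time (standing in for
# "the article is becoming complete"). The exponential form and all
# parameters are assumptions for illustration only.
import math
import random

def grow(steps=300, mu=0.02, sigma=0.1, tau=None):
    n = 1.0
    for t in range(steps):
        rate = random.gauss(mu, sigma)
        if tau is not None:
            rate *= math.exp(-t / tau)           # novelty / completeness decay
        n *= math.exp(rate)
    return n

if __name__ == "__main__":
    random.seed(0)
    undamped = sum(grow() for _ in range(2_000)) / 2_000
    damped = sum(grow(tau=50.0) for _ in range(2_000)) / 2_000
    print("mean final size, no discount:   ", round(undamped, 1))
    print("mean final size, with discount: ", round(damped, 1))
```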
HAO-YUH SU
Paper A
This paper investigates the growth of Wikipedia. It claims that Wikipedia's growth is sustainable by offering two pieces of evidence: (1) the growth of unresolved references, and (2) the fact that references lead to definitions. Moreover, the paper builds a scale-free network model to describe the process of adding references and entries. Things are pretty clear when the first part of the argument is laid out. However, when it comes to the second part, the scale-free network, I am a little confused. First, I'd like to know what the authors mean by "connectivity" when deriving the network model. How is the connectivity of an entry defined in Wikipedia? Second, I am confused about the formula {k} = rP(k). Doesn't k represent connectivity? Why is it used to denote the expected number of added references? Third, in the model, when the authors say r references are added to a given entry, are they talking about incoming references, outgoing references, or both? My last question is about an argument in the conclusion. There, the authors claim they found that new articles are typically written by authors different from the ones behind the references to them, and they further conclude that the scalability of the endeavor is limited by the size of the contributor pool. This is a remarkable statement. However, I think the authors should provide quantitative data to support this important argument. It would be really exciting to see the exact probability of such a phenomenon in Wikipedia.
Paper B
This paper probes the mechanisms inside four big online peer production systems: Wikipedia, Digg, Bugzilla and Essembly. In the beginning, it investigates user participation in the four systems. Then it examines the number of contributions per story. My first question is about the data in Table 1. The table shows that Wikipedia has 1.5 M topics; however, in the paper the authors say Wikipedia has over 9 million articles. Is the difference due to different time frames or to different definitions of "topics" and "articles"? Second, when the authors interpret the data in Section 3, they say that when the required effort to contribute is higher, a larger value of alpha is expected. I agree that voting on Digg and Essembly, the activities with smaller values of alpha, can be done quickly. However, when it comes to Digg submissions and Wikipedia edits, the two with the highest alpha values, the statement, in my opinion, cannot be made so easily. The data show that Digg submissions have an alpha value of 2.4, while Wikipedia edits have 2.28. After observing these two websites, I find that a Wikipedia edit seems to require more effort than a Digg submission. Therefore, I think that besides the required effort of contribution, there must be some other factors affecting the value of alpha. My last question is about the heavy-tail property of the lognormal distribution. Although the authors have given some explanation on this point, I am still unclear about why this property would lead to difficulties in predicting popularity in peer production.
ZHENMING LIU
Both papers describe the
fundamental behavior of an online production site Wikipedia from different
aspects. In particular, Spinellis and LouridasÕ paper analyzed the rate between complete and incomplete articles
in wiki and tried to reason why wikipediaÕs remarkable growth is sustainable.
While on the other hand, Wilkinson collected data for usersÕ macro level
behavior and attempted to model these behavior mathematically.
Many of the empirical results were already well known, and the models in both papers are good (though not particularly impressive). In addition, like many recent papers that attempt to analyze users' behavior, these two papers also oversimplify that behavior. For example, in Wilkinson's paper, defining inactive users as those who fail to contribute anything for 3 or 6 months seems ridiculous to me. Furthermore, the time span of 3 or 6 months in this definition sounds arbitrary.
Both papers emphasize the "predictability" of their models. I am not sure whether that means their regression models fit the existing data well or that their models are trained and tested on two separate sets of data. Perhaps in their context these two definitions are equivalent, but "predictability" sometimes sounds misleading.
The tools (e.g., the design of the crawlers, the link analysis software, or the backend database) that allow researchers to study data at this scale are never described in detail in papers of this type. Many traditional computer science papers tend to be precise in describing the setup of their experiments, whereas papers in these emerging areas tend to avoid detailed descriptions of their methodologies, even though those methodologies are usually more complicated than the experiments in more traditional computer science research. I would like to see more discussion of the data processing, because it is the starting point of any research on these types of systems.