Focus: Wikipedia Articles Separate into Four Categories

Physics 9, 8
A study of the entire editing history of English Wikipedia shows that the articles cluster into four categories based on how frequently and how aggressively they are edited.
Wikimedia Foundation, CC BY-SA 3.0, via Wikimedia Commons
All the answers. Wikipedia is edited by volunteers worldwide but still has a surprising order, according to an analysis of its edit history.

Wikipedia allows anyone to contribute to its millions of articles and doesn't exert any central control, yet striking order has emerged, according to an analysis of the entire editing history of the English portion of the website. Researchers found that articles fall into four main categories based on the way they are edited and that a relatively small number of editors have a major influence on the site.

Researchers have previously studied Wikipedia as a social network, looking at phenomena such as “edit wars,” where strong differences of opinion lead to skirmishes of back-and-forth revisions between disagreeing individuals. Now a team led by Jinhyuk Yun of the Korea Advanced Institute of Science and Technology in Daejeon, South Korea, has focused on a more general question—does the full history of Wikipedia growth show any general patterns or regularities, either in the structure of articles, or the behavior of editors?

To find out, Yun and colleagues examined the data for the entire edit history of English Wikipedia, including more than 5 million articles, millions of “talk” pages, and 587 million editing events. The length of a typical article increases with age, as do the numbers of edits and editors, so the researchers decided to rescale the data by age—essentially dividing each number by the age of the article—to allow fair comparisons.
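Here is a minimal sketch of the age rescaling. It assumes per-article totals and an age in days. The `ArticleHistory` record, its field names, and the sample numbers are hypothetical and do not reflect the study's data format. Dividing cumulative counts by age turns an old, heavily edited article and a young, lightly edited one into directly comparable rates:

```python
from dataclasses import dataclass

@dataclass
class ArticleHistory:
    # Hypothetical record; fields are illustrative, not the study's schema.
    title: str
    age_days: float    # time since the article was created
    length_words: int  # current article length
    num_edits: int     # total edit events so far
    num_editors: int   # distinct editors so far

def rescale_by_age(a: ArticleHistory) -> dict[str, float]:
    """Divide each cumulative quantity by article age, giving per-day rates."""
    if a.age_days <= 0:
        raise ValueError("article age must be positive")
    return {
        "length_per_day": a.length_words / a.age_days,
        "edits_per_day": a.num_edits / a.age_days,
        "editors_per_day": a.num_editors / a.age_days,
    }

older = ArticleHistory("Older article", age_days=4000, length_words=8000,
                       num_edits=2000, num_editors=400)
younger = ArticleHistory("Younger article", age_days=400, length_words=800,
                         num_edits=200, num_editors=40)
```

The older article has ten times the raw length, edits, and editors. Both articles nonetheless rescale to the same per-day rates, which is the kind of fair comparison the rescaling is meant to allow.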

The team found that Wikipedia articles fall into four distinct groups based on two independent classifications: edit frequency and the length of each editor's contributions. For edit frequency, articles in one group were edited roughly twice as often as those in the other. For the second classification, editor contribution length, the typical editor in one group of articles contributed roughly 30 times as many words as editors in the other group, even when the articles had the same total length. Most articles fell into one of the four combinations of these attributes. “The categories are clearly distinguished by the editor-editor or editor-article relationships,” says Yun, “but this only becomes clear when the data is re-scaled by article age.”
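A toy classifier can illustrate how two independent splits combine into four groups. The cutoffs `FREQ_CUTOFF` and `CONTRIB_CUTOFF` and the sample numbers are hypothetical. They were chosen only so that one axis differs about twofold and the other about thirtyfold. The study may have drawn its boundaries quite differently:

```python
from collections import Counter

# Hypothetical cutoffs; the study's actual boundaries are not stated here.
FREQ_CUTOFF = 1.4      # age-rescaled edits per day
CONTRIB_CUTOFF = 50.0  # typical words added per editor

def category(edits_per_day: float, words_per_editor: float) -> tuple[str, str]:
    """Combine two independent binary splits into one of four categories."""
    freq = "frequent" if edits_per_day >= FREQ_CUTOFF else "infrequent"
    contrib = "long" if words_per_editor >= CONTRIB_CUTOFF else "short"
    return freq, contrib

# Toy (edits_per_day, words_per_editor) pairs: ~2x apart on one axis, ~30x on the other.
articles = [(1.0, 10.0), (2.0, 10.0), (1.0, 300.0),
            (2.0, 300.0), (1.1, 12.0), (2.1, 290.0)]
counts = Counter(category(f, w) for f, w in articles)
print(len(counts))  # 4 distinct categories
```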

To understand these patterns, the researchers built a model involving a large, random network of editors who can interact with their neighbors and who start with a randomly assigned opinion of an article topic (represented by a number between zero and one). Editors are more likely to edit the article if their opinion on the topic differs significantly from the average opinion of others. Editors’ opinions also evolve as they meet and share views with others, and the model’s interactions are arranged to make the editors’ opinions more similar over time.
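The sketch below is a heavily simplified, hypothetical model of this kind, and it is not the authors' model. Three of its pieces are assumptions chosen only to mirror the verbal description above:

- the random-graph construction;
- the rule that an editor's chance of editing equals the gap between their opinion and the average of everyone else's;
- the pairwise update, in which two neighbors move toward each other at a hypothetical rate `mu`.

```python
import random
import statistics

def random_network(n: int, mean_degree: float, rng: random.Random) -> dict[int, list[int]]:
    """Erdos-Renyi-style graph: each pair is linked with probability mean_degree/(n-1)."""
    p_link = mean_degree / (n - 1)
    nbrs: dict[int, list[int]] = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p_link:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs

def edit_probability(opinion: float, others_mean: float) -> float:
    """Hypothetical rule: the larger the disagreement, the likelier an edit."""
    return abs(opinion - others_mean)  # opinions lie in [0, 1], so this does too

def simulate(n: int = 200, mean_degree: float = 6.0, steps: int = 20000,
             mu: float = 0.3, seed: int = 1) -> tuple[float, float, int]:
    """Return (initial opinion variance, final variance, number of edits)."""
    rng = random.Random(seed)
    nbrs = random_network(n, mean_degree, rng)
    opinions = [rng.random() for _ in range(n)]  # opinion of the topic, in [0, 1]
    start_var = statistics.pvariance(opinions)
    total = sum(opinions)  # conserved: each interaction below keeps the sum fixed
    edits = 0
    for _ in range(steps):
        i = rng.randrange(n)
        if not nbrs[i]:
            continue  # isolated editor: no one to interact with
        others_mean = (total - opinions[i]) / (n - 1)
        if rng.random() < edit_probability(opinions[i], others_mean):
            edits += 1
        j = rng.choice(nbrs[i])
        # Both editors shift toward each other, making opinions more similar.
        shift = mu * (opinions[j] - opinions[i])
        opinions[i] += shift
        opinions[j] -= shift
    return start_var, statistics.pvariance(opinions), edits
```

Each interaction moves two opinions toward each other while keeping their sum fixed. The spread of opinions can therefore only shrink, so in this toy version disagreement-driven edits tend to become rarer as the population converges.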

The model was able to simulate the growth of an artificial Wikipedia. The team found that their version also showed the same four distinct groups, but only if two key parameters, q and p, took the right values. Roughly speaking, q reflects how likely it is that an editor will choose to edit an article if his or her opinion on the topic differs from that of Wikipedia. p reflects a more subtle property linked to the general level of trust that editors have toward Wikipedia, relative to other media sources. As editors interact with one another and their opinions shift, higher p makes opinions move more quickly toward those expressed by Wikipedia.
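The roles of the two parameters can be sketched in the same hypothetical style. This is an illustrative reading of the verbal definitions, not the equations in the paper, and it makes three assumptions:

- the article's stance is a single number;
- `q` is the probability that a disagreeing editor actually edits, where "disagreeing" means further than a hypothetical tolerance `tol` from the article;
- `p` is the fraction by which trust in Wikipedia pulls an editor's opinion toward the article's stance.

The `pull` factor for how far an edit moves the article is also hypothetical.

```python
import random

def maybe_edit(editor_op: float, article_op: float, q: float, tol: float,
               rng: random.Random, pull: float = 0.5) -> tuple[float, bool]:
    """Hypothetical role of q: an editor who disagrees with the article by more
    than tol edits it with probability q, pulling the article's stance a
    fraction `pull` of the way toward their own opinion."""
    if abs(editor_op - article_op) > tol and rng.random() < q:
        return article_op + pull * (editor_op - article_op), True
    return article_op, False

def trust_update(editor_op: float, article_op: float, p: float) -> float:
    """Hypothetical role of p: trust in Wikipedia moves an editor's opinion a
    fraction p of the way toward the stance the article expresses."""
    return editor_op + p * (article_op - editor_op)

def run(q: float, p: float, n: int = 100, steps: int = 5000,
        tol: float = 0.1, seed: int = 0) -> tuple[int, float]:
    """Return (number of edits, mean |editor - article| at the end)."""
    rng = random.Random(seed)
    editors = [rng.random() for _ in range(n)]
    article = rng.random()
    edits = 0
    for _ in range(steps):
        i = rng.randrange(n)
        article, edited = maybe_edit(editors[i], article, q, tol, rng)
        edits += int(edited)
        editors[i] = trust_update(editors[i], article, p)
    gap = sum(abs(e - article) for e in editors) / n
    return edits, gap
```

With `q = 0`, nobody edits and the article never changes. Raising `p` then shrinks the average gap between editors and the article, in line with the description that higher p moves opinions more quickly toward those expressed by Wikipedia.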

Yun and colleagues argue that the real-world values of these parameters can be estimated independently from other data showing how frequently people edit Wikipedia and how frequently web users refer to Wikipedia relative to other sources. The estimates from such data, they found, are precisely the values that give their growth model a close fit to the entire Wikipedia edit history.

“The model successfully explains their empirical observations,” says Taha Yasseri of the Oxford Internet Institute in the UK. More importantly, he suggests, the model also makes an implicit prediction about future trends in Wikipedia. The four distinct categories point to a persistent inequality of influence, with a small number of super-editors controlling the form of many articles. The model results, says Yasseri, imply that this editing inequality is increasing with time, with fewer editors gaining an ever more dominant role.

Yun emphasizes a similar message: “There are already reports that the growth of Wikipedia is slowing down,” he says, “and our observation indicates that this will continue unless something is done about it.” He suggests the encyclopedia needs to recruit more new participants to sustain rich, collaborative environments and to avoid the monopolization of content by a few people.

This research is published in Physical Review E.

–Mark Buchanan

Mark Buchanan is a freelance science writer based in Normandy, France.


Subject Areas

Interdisciplinary Physics, Complex Systems
