Whose article is it anyway? - Detecting authorship distribution in Wikipedia articles over time with WIKIGINI

Published: July 2012

Book title: Proceedings of the Wikipedia Academy 2012
Publisher: Online publication
Place of publication: Berlin
Organization: Wikipedia Academy 2012, Wikimedia Deutschland

Refereed publication


In this work, we present a novel approach to detecting the authorship of words in Wikipedia, which significantly outperforms the baseline method in terms of accuracy. This is achieved by reducing the number of word-based text-to-text comparisons required, which are the most error-prone steps in the process. We moreover argue that the concentration of words among just a few authors can be an indicator of a lack of quality and/or neutrality in an article. To provide an aggregated measure of this concentration, we calculate a Gini coefficient for each revision of an article based on our word-author assignments. The development of the coefficient over time in an article is visualized and provided online as an easily accessible and useful tool for investigating how the content of an article evolved. We present examples where the Gini curve gives useful insights into differences between articles and may help to spot crucial events in the past evolution of an article.
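
To illustrate the aggregation step described in the abstract, the following is a minimal sketch of how a Gini coefficient could be computed per revision from word-author assignments. It is not the WIKIGINI implementation: the function names `gini` and `gini_per_revision`, the input shape (one list of author names per revision, one entry per word), and the choice to count only authors who own at least one word in the revision are all assumptions made for this example. The formula used is the standard Gini coefficient over sorted values.

```python
from collections import Counter


def gini(counts):
    """Standard Gini coefficient of non-negative word counts.

    0.0 means every author owns the same number of words; values
    approaching 1.0 mean the words are concentrated in few authors.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0  # assumption: an empty revision has no concentration
    # Weight each sorted value by its 1-based rank.
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def gini_per_revision(revisions):
    """Compute one coefficient per revision.

    `revisions` is a hypothetical input format: a list of revisions,
    each given as a list with the author of every word in that revision.
    """
    return [gini(Counter(word_authors).values()) for word_authors in revisions]


# Two toy revisions: authorship starts balanced, then shifts toward "a".
curve = gini_per_revision([
    ["a", "a", "b", "b"],
    ["a", "a", "a", "b"],
])
print(curve)
```

Plotting the resulting list against revision number or timestamp would give a curve of the kind the abstract describes. Whether authors with no surviving words should enter the calculation as zeros is a design choice that changes the values, and this sketch simply leaves them out.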

