
Inproceedings3581: Difference between revisions

(The page was newly created: „{{Publikation Erster Autor |ErsterAutorNachname=Flöck |ErsterAutorVorname=Fabian }} {{Publikation Author |Rank=2 |Author=Kenan Erdogan }} {{Publikation Author |R…“)
 
 
(2 intermediate revisions by the same user not shown)
Line 12:

 }}
 {{Inproceedings
-|Referiert=False
+|Referiert=True
 |Title=TokTrack: A Complete Token Provenance and Change Tracking Dataset for the English Wikipedia
 |Year=2017
 |Month=Mai
-|Booktitle=TokTrack: A Complete Token Provenance and Change Tracking Dataset for the English Wikipedia
+|Booktitle=Proceedings of the Eleventh International Conference on Web and Social Media
-|Pages=408--417
+|Pages=408-417
 |Organization=International Conference on Web and Social Media (ICWSM)
 |Publisher=AAAI Press

Line 44:

 of editors like partial reverts and re-additions and other
 metrics, in the process gaining several novel insights.
+|Link=https://arxiv.org/abs/1703.08244
+|DOI Name=10.5281/zenodo.789289
 |Forschungsgruppe=Web Science
 }}

Latest revision as of 16:44, 30 April 2018


TokTrack: A Complete Token Provenance and Change Tracking Dataset for the English Wikipedia



Published: 2017 May

Book title: Proceedings of the Eleventh International Conference on Web and Social Media
Pages: 408-417
Publisher: AAAI Press
Organization: International Conference on Web and Social Media (ICWSM)

Refereed publication


Abstract
We present a dataset that contains every instance of all tokens (≈ words) ever written in undeleted, non-redirect English Wikipedia articles until October 2016, in total 13,545,349,787 instances. Each token is annotated with (i) the article revision it was originally created in, and (ii) lists with all the revisions in which the token was ever deleted and (potentially) re-added and re-deleted from its article, enabling a complete and straightforward tracking of its history. This data would be exceedingly hard to create by an average potential user as it is (i) very expensive to compute and as (ii) accurately tracking the history of each token in revisioned documents is a non-trivial task. Adapting a state-of-the-art algorithm, we have produced a dataset that allows for a range of analyses and metrics, already popular in research and going beyond, to be generated on complete-Wikipedia scale, ensuring quality and allowing researchers to forego expensive text-comparison computation, which so far has hindered scalable usage. We show how this data enables, on the token level, computation of provenance, measuring survival of content over time, very detailed conflict metrics, and fine-grained interactions of editors like partial reverts and re-additions, and other metrics, in the process gaining several novel insights.
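The per-token annotation described above (an origin revision plus lists of deletion and re-addition revisions) maps naturally onto a small data structure. The following Python sketch is an illustration only, not the dataset's actual schema: the record layout, the field names origin_rev, out_revs and in_revs, and the assumption that revision IDs increase chronologically are ours.

from dataclasses import dataclass, field
from typing import List

@dataclass
class TokenInstance:
    text: str
    origin_rev: int                                     # revision the token was created in
    out_revs: List[int] = field(default_factory=list)   # revisions that deleted it
    in_revs: List[int] = field(default_factory=list)    # revisions that re-added it

    def present_at(self, rev: int) -> bool:
        """True if the token is part of the article text at revision `rev`.

        The token exists once its origin revision has happened; each later
        `out` event removes it and each later `in` event restores it, so
        presence is decided by whichever event happened most recently.
        """
        if rev < self.origin_rev:
            return False
        removals = [r for r in self.out_revs if r <= rev]
        readds = [r for r in self.in_revs if r <= rev]
        if not removals:
            return True
        return bool(readds) and max(readds) > max(removals)

# Tiny usage example with made-up revision IDs:
tok = TokenInstance("provenance", origin_rev=100, out_revs=[120], in_revs=[130])
assert not tok.present_at(99)   # not yet written
assert tok.present_at(110)      # created in rev 100
assert not tok.present_at(125)  # deleted in rev 120
assert tok.present_at(135)      # re-added in rev 130

Presence checks like this are the building block for the metrics the abstract names: provenance, content survival over time, and partial-revert detection all reduce to aggregating present_at over tokens and revisions.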

Further information at: https://arxiv.org/abs/1703.08244
DOI: 10.5281/zenodo.789289



Research group

Web Science


Research area