HistorEx: Exploring Historical Text Corpora Using Word and Document Embeddings

Sven Müller, Michael Brunzel, Daniela Kaun, Russa Biswas, Maria Koutraki, Tabea Tietz, Harald Sack

Published: June 2019
Editors: Hitzler P. et al. (eds)
Book title: The Semantic Web: ESWC 2019 Satellite Events. ESWC 2019. Lecture Notes in Computer Science
Volume: 11762
Pages: 302
Publisher: Springer
Place of publication: Cham

Peer-reviewed publication

Abstract
Written text can be understood as a means to acquire insights into the nature of past and present cultures and societies. Numerous projects have been devoted to digitizing and publishing historical textual documents in digital libraries which scientists can utilize as valuable resources for research. However, the extent of textual data available exceeds humans' abilities to explore the data efficiently. In this paper, a framework is presented which combines unsupervised machine learning techniques and natural language processing on the example of historical text documents on the 19th century of the USA. Named entities are extracted from semi-structured text, which is enriched with complementary information from Wikidata. Word embeddings are leveraged to enable further analysis of the text corpus, which is visualized in a web-based application.

ISBN: 978-3-030-32326-4
Download: Media:2019-ESWC-D-HistorEx-Exploring-Historical-Text-Corpora.pdf
DOI Link: https://doi.org/10.1007/978-3-030-32327-1_27

Research group

Information Service Engineering

Research area