Inproceedings3834: Difference between revisions

From Aifbportal
(The page was newly created: "{{Publikation Erster Autor |ErsterAutorNachname=Hoppe |ErsterAutorVorname=Fabian }} {{Publikation Author |Rank=2 |Author=Tabea Tietz }} {{Publikation Author |R…")
 
m (minor edit)
 
Line 33:
 |Month=Juni
 |Booktitle=Proceedings of Workshop on Humanities in the Semantic Web co-located with ESWC 2020
+|Pages=15-20
 |Organization=WHiSe Workshop
 |Publisher=CEUR
+|Series=CEUR Workshop Proceedings
+|Volume=2695
 }}
 {{Publikation Details
+|Abstract=Document exploration in archives is often challenging due to the lack of organization in topic-based categories. Moreover, archival records only provide short text which is often insufficient for capturing the semantics. This paper proposes and explores a dataless categorization approach that utilizes word embeddings and TF-IDF to categorize archival documents. Additionally, it introduces a visual approach built on top of the word embeddings to enhance the exploration of data. Preliminary results suggest that current vector representations alone do not provide enough external knowledge to solve this task.
+|Download=paper2.pdf
+|Link=https://ceur-ws.org/Vol-2695/paper2.pdf
 |Forschungsgruppe=Information Service Engineering
 }}

Latest revision as of 17 November 2022, 08:49


The Challenges of German Archival Document Categorization on Insufficient Labeled Data



Published: June 2020

Book title: Proceedings of Workshop on Humanities in the Semantic Web co-located with ESWC 2020
Volume: 2695
Series: CEUR Workshop Proceedings
Pages: 15-20
Publisher: CEUR
Organization: WHiSe Workshop

Refereed publication

Abstract
Document exploration in archives is often challenging due to the lack of organization in topic-based categories. Moreover, archival records only provide short text which is often insufficient for capturing the semantics. This paper proposes and explores a dataless categorization approach that utilizes word embeddings and TF-IDF to categorize archival documents. Additionally, it introduces a visual approach built on top of the word embeddings to enhance the exploration of data. Preliminary results suggest that current vector representations alone do not provide enough external knowledge to solve this task.

Download: Media:paper2.pdf
Further information: https://ceur-ws.org/Vol-2695/paper2.pdf



Research group

Information Service Engineering


Research area