Techreport783: Difference between revisions

Revision as of 11 September 2009, 08:09

Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis

Philipp Cimiano, Andreas Hotho, Steffen Staab

Published: 2004 November
Institution: Institute AIFB, University of Karlsruhe
Archive number: 783

Abstract
We present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from a text corpus. The approach is based on Formal Concept Analysis (FCA), a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. We follow Harris' distributional hypothesis and model the context of a certain term as a vector representing syntactic dependencies which are automatically acquired from the text corpus with a linguistic parser. On the basis of this context information, FCA produces a lattice that we convert into a special kind of partial order constituting a concept hierarchy. The approach is evaluated by comparing the resulting concept hierarchies with hand-crafted taxonomies for two domains: tourism and finance. We also directly compare our approach with hierarchical agglomerative clustering as well as with Bi-Section-KMeans as an instance of a divisive clustering algorithm. Furthermore, we investigate the impact of using different measures weighting the contribution of each attribute as well as of applying a particular smoothing technique to cope with data sparseness.
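
To make the central step of the abstract concrete, here is a minimal sketch of how Formal Concept Analysis turns term/attribute data into a lattice. It is not the authors' implementation: the toy context below is hypothetical, the attributes stand in for parser-extracted syntactic dependencies, and the function names (`formal_concepts`, `hasse_edges`) are made up for illustration. The brute-force enumeration is exponential in the number of terms and only suitable for tiny examples. Attribute weighting, smoothing, and the conversion of the lattice into a concept hierarchy are not shown.

```python
from itertools import combinations

# Hypothetical toy context: each term maps to the attributes
# (e.g. verbs it occurs with as an object) observed for it.
context = {
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
}


def common_attributes(terms, context, all_attrs):
    """Attributes shared by every term in `terms` (all attributes if empty)."""
    attrs = set(all_attrs)
    for term in terms:
        attrs &= context[term]
    return frozenset(attrs)


def terms_sharing(attrs, context):
    """Terms that carry every attribute in `attrs`."""
    return frozenset(t for t, a in context.items() if attrs <= a)


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) by brute force."""
    all_attrs = set().union(*context.values())
    terms = list(context)
    concepts = set()
    for size in range(len(terms) + 1):
        for subset in combinations(terms, size):
            intent = common_attributes(subset, context, all_attrs)
            extent = terms_sharing(intent, context)
            concepts.add((extent, intent))
    return concepts


def hasse_edges(concepts):
    """Direct subconcept/superconcept pairs, ordered by extent inclusion."""
    edges = []
    for lower in concepts:
        for upper in concepts:
            if lower[0] < upper[0] and not any(
                lower[0] < mid[0] < upper[0] for mid in concepts
            ):
                edges.append((lower, upper))
    return edges


concepts = formal_concepts(context)
edges = hasse_edges(concepts)

# Print concepts from the most general (largest extent) to the most specific.
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

In this toy data the concept with intent `{bookable, rentable}` groups `apartment`, `car`, and `bike`, and sits directly below the most general concept containing all terms; relations of this kind are what a concept hierarchy would be read from.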

Download: Media:2004_783_Cimiano_Learning_Concep_1.pdf, Media:2004_783_Cimiano_Learning_Concep_1.ps

Project

Dot.Kom

Research area

Ontology Learning

@@ Line 2: / Line 2: @@
 |ErsterAutorNachname=Cimiano
 |ErsterAutorVorname=Philipp
+}}
+{{Publikation Author
+|Rank=2
+|Author=Andreas Hotho
 }}
 {{Publikation Author
 |Rank=3
 |Author=Steffen Staab
-}}
-{{Publikation Author
-|Rank=2
-|Author=Andreas Hotho
 }}
 {{Techreport