Techreport783
First author: Philipp Cimiano
Revision as of 8 September 2009, 10:03
Published: November 2004
Institution: Institute AIFB, University of Karlsruhe
Archive number: 783
Abstract
We present a novel approach to the automatic acquisition of taxonomies
or concept hierarchies from a text corpus. The approach is based on
Formal Concept Analysis (FCA), a method mainly used for the analysis of data,
i.e., for investigating and processing explicitly given information.
We follow Harris' distributional hypothesis and model the context
of a certain term as a vector representing syntactic dependencies
which are automatically acquired from the text corpus with a linguistic parser.
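To make the idea of a syntactic context concrete, the sketch below turns dependency triples into a term-to-attribute mapping, which is the binary view of the context vectors that FCA operates on. It assumes the parser output has already been reduced to hypothetical (verb, relation, noun) triples; the triples, the `build_context` helper and the `"<verb>_<relation>"` attribute encoding are illustrative choices, not taken from the report.

```python
from collections import defaultdict

# Hypothetical (verb, relation, noun) triples standing in for parser output.
# The actual system extracts dependencies with a linguistic parser; these are made up.
TRIPLES = [
    ("book", "obj", "hotel"),
    ("book", "obj", "apartment"),
    ("book", "obj", "excursion"),
    ("rent", "obj", "apartment"),
    ("rent", "obj", "car"),
    ("drive", "obj", "car"),
    ("join", "obj", "excursion"),
]


def build_context(triples):
    """Map each noun (term) to the set of syntactic attributes it occurs with.

    An attribute is encoded as "<verb>_<relation>"; e.g. "book_obj" means
    "occurs as the object of 'book'". This is an assumed encoding.
    """
    context = defaultdict(set)
    for verb, relation, noun in triples:
        context[noun].add(f"{verb}_{relation}")
    return dict(context)


if __name__ == "__main__":
    for term, attrs in sorted(build_context(TRIPLES).items()):
        print(term, sorted(attrs))
```

Keeping sets rather than counts discards frequency information; a fuller pipeline would keep counts so that attributes can be weighted before the context is binarized.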
On the basis of this context information, FCA produces a lattice
that we convert into a special kind of partial order constituting
a concept hierarchy.
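As a rough illustration of the FCA step, the sketch below enumerates the formal concepts (extent, intent) of a small binary context and computes the covering relation of the resulting lattice, ordered by extent inclusion. It uses the naive approach of closing the set of intents under intersection with each object's attribute set, which is only practical for toy data; the example context is hypothetical, and the report's specific conversion of the lattice into a concept hierarchy is not reproduced here.

```python
# Hypothetical binary context: term -> set of syntactic attributes.
CONTEXT = {
    "hotel": {"book_obj"},
    "apartment": {"book_obj", "rent_obj"},
    "excursion": {"book_obj", "join_obj"},
    "car": {"drive_obj", "rent_obj"},
}


def formal_concepts(context):
    """Return all formal concepts as (extent, intent) pairs of frozensets.

    Every concept intent is an intersection of object intents (the empty
    intersection being the full attribute set), so we start from the full
    attribute set and close under intersection with each object intent.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for obj in sorted(context):
        obj_intent = frozenset(context[obj])
        # The right-hand side is built completely before the update.
        intents |= {intent & obj_intent for intent in intents}
    concepts = []
    for intent in intents:
        # Extent: all objects that have every attribute in the intent.
        extent = frozenset(g for g, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return concepts


def covering_edges(concepts):
    """Return (lower, upper) pairs where upper directly covers lower.

    Concepts are ordered by extent inclusion; a pair is kept only if no
    third concept lies strictly between the two.
    """
    edges = []
    for lower in concepts:
        for upper in concepts:
            if not lower[0] < upper[0]:
                continue
            if any(lower[0] < mid[0] < upper[0] for mid in concepts):
                continue
            edges.append((lower, upper))
    return edges


if __name__ == "__main__":
    concepts = formal_concepts(CONTEXT)
    for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
    for (ext_lo, _), (ext_hi, _) in covering_edges(concepts):
        print(sorted(ext_lo), "<", sorted(ext_hi))
```

For this context the lattice has seven concepts; for example, "apartment" and "car" share the concept with intent {"rent_obj"}, which could serve as a candidate superconcept for both terms in a derived hierarchy.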
The approach is evaluated by comparing the resulting concept hierarchies
with hand-crafted taxonomies for two domains: tourism and finance.
We also directly compare our approach with hierarchical agglomerative
clustering as well as with Bi-Section-KMeans as an instance of a divisive clustering
algorithm. Furthermore, we investigate the impact of using different
measures for weighting the contribution of each attribute, as well as of applying
a particular smoothing technique to cope with data sparseness.
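Because FCA needs a binary context, attribute weights have to be turned into yes/no decisions at some point. The sketch below shows one way to do this, assuming pointwise mutual information (PMI) as the weighting measure and a hypothetical threshold; PMI is used here only as a familiar example and is not necessarily one of the measures evaluated in the report. The co-occurrence counts are made up, and the smoothing technique mentioned above is not modeled.

```python
import math
from collections import Counter

# Hypothetical (term, attribute) observations; repeated pairs encode frequency.
PAIRS = (
    [("hotel", "book_obj")] * 3
    + [("apartment", "book_obj")]
    + [("apartment", "rent_obj")] * 2
    + [("car", "rent_obj")] * 2
    + [("car", "drive_obj")] * 2
)


def pmi_weights(pairs):
    """Compute PMI = log2(P(t, a) / (P(t) * P(a))) for each observed pair."""
    pair_counts = Counter(pairs)
    term_counts = Counter(t for t, _ in pairs)
    attr_counts = Counter(a for _, a in pairs)
    total = len(pairs)
    weights = {}
    for (term, attr), n in pair_counts.items():
        p_joint = n / total
        p_term = term_counts[term] / total
        p_attr = attr_counts[attr] / total
        weights[(term, attr)] = math.log2(p_joint / (p_term * p_attr))
    return weights


def binarize(weights, threshold):
    """Keep only (term, attribute) pairs whose weight exceeds the threshold.

    Terms with no attribute above the threshold disappear from the context.
    """
    context = {}
    for (term, attr), w in weights.items():
        if w > threshold:
            context.setdefault(term, set()).add(attr)
    return context


if __name__ == "__main__":
    weights = pmi_weights(PAIRS)
    for pair, w in sorted(weights.items()):
        print(pair, round(w, 3))
    print(binarize(weights, threshold=0.0))
```

With a threshold of 0, the weak association between "apartment" and "book_obj" is dropped, so the binary context handed to FCA changes; raising or lowering the threshold trades sparseness against noise.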
Download: Media:2004_783_Cimiano_Learning_Concep_1.pdf, Media:2004_783_Cimiano_Learning_Concep_1.ps