Techreport783: Difference between revisions
m (Added from ontology)
Line 2:
  |ErsterAutorNachname=Cimiano
  |ErsterAutorVorname=Philipp
+ }}
+ {{Publikation Author
+ |Rank=2
+ |Author=Andreas Hotho
  }}
  {{Publikation Author
  |Rank=3
  |Author=Steffen Staab
-
-
-
-
  }}
  {{Techreport
Revision as of 11 September 2009, 08:09
Published: November 2004
Institution: Institute AIFB, University of Karlsruhe
Archive number: 783
Abstract
We present a novel approach to the automatic acquisition of taxonomies
or concept hierarchies from a text corpus. The approach is based on
Formal Concept Analysis (FCA), a method mainly used for the analysis of data,
i.e. for investigating and processing explicitly given information.
We follow Harris' distributional hypothesis and model the context
of a certain term as a vector representing syntactic dependencies
which are automatically acquired from the text corpus with a linguistic parser.
On the basis of this context information, FCA produces a lattice
that we convert into a special kind of partial order constituting
a concept hierarchy.
The approach is evaluated by comparing the resulting concept hierarchies
with hand-crafted taxonomies for two domains: tourism and finance.
We also directly compare our approach with hierarchical agglomerative
clustering as well as with Bi-Section-KMeans as an instance of a divisive clustering
algorithm. Furthermore, we investigate the impact of using different
measures weighting the contribution of each attribute as well as of applying
a particular smoothing technique to cope with data sparseness.
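
As a rough illustration of the FCA step described above, the following sketch derives all formal concepts from a tiny term-by-attribute context and orders them by extent inclusion. It is not the report's implementation: the terms, the attributes, and the function names (`extent`, `intent`, `formal_concepts`, `cover_edges`) are hypothetical, the attributes are assumed to be already extracted from parsed text, and the brute-force enumeration only suits toy inputs.

```python
from itertools import combinations

# Hypothetical toy context: each term maps to the set of attributes
# (e.g. verbs it appears with as an argument) observed for it.
context = {
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
}


def extent(attrs, ctx):
    """All terms that carry every attribute in attrs."""
    return frozenset(t for t, a in ctx.items() if attrs <= a)


def intent(terms, ctx):
    """All attributes shared by every term in terms."""
    shared = set().union(*ctx.values())
    for t in terms:
        shared &= ctx[t]
    return frozenset(shared)


def formal_concepts(ctx):
    """Enumerate (extent, intent) pairs by closing every subset of terms.

    Exponential in the number of terms; acceptable only for a toy context.
    """
    concepts = set()
    terms = list(ctx)
    for size in range(len(terms) + 1):
        for subset in combinations(terms, size):
            shared = intent(subset, ctx)
            concepts.add((extent(shared, ctx), shared))
    return concepts


def cover_edges(concepts):
    """Direct sub-concept/super-concept pairs (edges of the Hasse diagram)."""
    edges = []
    for low in concepts:
        for high in concepts:
            if low[0] < high[0] and not any(
                low[0] < mid[0] < high[0] for mid in concepts
            ):
                edges.append((low, high))
    return edges


if __name__ == "__main__":
    lattice = formal_concepts(context)
    for ext, itt in sorted(lattice, key=lambda c: len(c[0])):
        print(sorted(ext), sorted(itt))
```

Each pair returned by `formal_concepts` is one node of the lattice, and `cover_edges` gives the ordering from which a concept hierarchy could then be read off. How the report itself converts the lattice into its partial order is not specified in this abstract.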
Download: Media:2004_783_Cimiano_Learning_Concep_1.pdf, Media:2004_783_Cimiano_Learning_Concep_1.ps