
Instance Matching for Heterogeneous Structured Data: Difference between revisions

From Aifbportal
(Created page with "{{Veranstaltung |Titel DE=Instance Matching for Heterogeneous Structured Data |Titel EN=Instance Matching for Heterogeneous Structured Data |Beschreibung DE=fol…")

(2 intermediate revisions by the same user not shown)

Line 1:
 {{Veranstaltung
-|Titel DE=Instance Matching for Heterogeneous Structured Data  
-|Titel EN=Instance Matching for Heterogeneous Structured Data  
-|Beschreibung DE=folgt
-|Beschreibung EN=folgt
+|Titel DE=Instance Matching for Heterogeneous Structured Data
+|Titel EN=Instance Matching for Heterogeneous Structured Data
+|Beschreibung DE=Structured data is abundantly available in enterprises and also largely increasing in the Web setting.
+Generally speaking, it can be conceived as structured descriptions of real-world entities. One main problem towards the effective usage of structured data is instance matching, where the goal is to find instance
+representations referring to the same real-world thing. However, the structured data on the Web is heteroge-neous, e.g. type information of instances is missing or too general to be useful. Besides, the challenges that lie ahead for typical instance matching approaches also include dealing with the low-quality data and high computation complexity.
+We tackle these challenges in different steps of the instance matching process. The first step is typification, in which the type semantics is derived by an unsupervised approach. The second step, blocking, aims to reduce the quadratic complexity of the instance matching process through the efficient and effective generation of match candidates. We propose an unsupervised approach to learn the most representative attributes of instances called keys, based on which two instances are considered as a match candidate if they share the same value of the key. The third step classification aims to deal with the low quality of data, for which we propose an almost-parameter-free approach for learning instance-matching rules to classify candidate instance pairs into matches and non-matches. In the last filtering step, we propose a parameter-free solution that leverages only simple Boolean functions and exploits fine-grained word-level dissimilarity evidences to further filter out the non-matches. We evaluate our approaches against the latest baselines. The results show advances beyond the state-of-the-art.
+|Beschreibung EN=Structured data is abundantly available in enterprises and also largely increasing in the Web setting.
+Generally speaking, it can be conceived as structured descriptions of real-world entities. One main problem towards the effective usage of structured data is instance matching, where the goal is to find instance
+representations referring to the same real-world thing. However, the structured data on the Web is heteroge-neous, e.g. type information of instances is missing or too general to be useful. Besides, the challenges that lie ahead for typical instance matching approaches also include dealing with the low-quality data and high computation complexity.
+We tackle these challenges in different steps of the instance matching process. The first step is typification, in which the type semantics is derived by an unsupervised approach. The second step, blocking, aims to reduce the quadratic complexity of the instance matching process through the efficient and effective generation of match candidates. We propose an unsupervised approach to learn the most representative attributes of instances called keys, based on which two instances are considered as a match candidate if they share the same value of the key. The third step classification aims to deal with the low quality of data, for which we propose an almost-parameter-free approach for learning instance-matching rules to classify candidate instance pairs into matches and non-matches. In the last filtering step, we propose a parameter-free solution that leverages only simple Boolean functions and exploits fine-grained word-level dissimilarity evidences to further filter out the non-matches. We evaluate our approaches against the latest baselines. The results show advances beyond the state-of-the-art.
 |Veranstaltungsart=Graduiertenkolloquium
-|Start=2014/02/28 14:00:00
-|Ende=2014/02/28 15:00:00
+|Start=2014/03/05 14:00:00
+|Ende=2014/03/05 15:00:00
 |Gebäude=11.40
 |Raum=231
 |Vortragender=Yongtao Ma
 |Eingeladen durch=Rudi Studer
+|PDF=5 3 14 Ma.pdf
 |Forschungsgruppe=Wissensmanagement
 |In News anzeigen=True
 }}

Current revision as of 25 February 2014, 08:39

Instance Matching for Heterogeneous Structured Data

Event type:
Graduate colloquium

Structured data is abundantly available in enterprises and is increasingly available on the Web. Generally speaking, it can be understood as structured descriptions of real-world entities. One main problem for the effective use of structured data is instance matching, whose goal is to find instance representations that refer to the same real-world entity. However, structured data on the Web is heterogeneous: for example, type information for instances is missing or too general to be useful. Typical instance matching approaches must also cope with low-quality data and high computational complexity.

We tackle these challenges in different steps of the instance matching process. The first step is typification, in which type semantics are derived with an unsupervised approach. The second step, blocking, aims to reduce the quadratic complexity of comparing every pair of instances by efficiently and effectively generating match candidates. We propose an unsupervised approach that learns the most representative attributes of instances, called keys; two instances are considered a match candidate if they share the same value for a key. The third step, classification, addresses low data quality: we propose an almost parameter-free approach for learning instance-matching rules that classify candidate instance pairs into matches and non-matches. In the final filtering step, we propose a parameter-free solution that uses only simple Boolean functions and exploits fine-grained, word-level dissimilarity evidence to filter out the remaining non-matches. We evaluate our approaches against the latest baselines; the results show advances beyond the state of the art.
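
The blocking rule from the abstract (two instances become a match candidate only if they share the same value of a key) can be illustrated with a minimal sketch. The abstract does not say how keys are learned, so `key_score` uses a hypothetical coverage times distinctness heuristic as a stand-in; the names `normalize`, `key_score`, `choose_key`, `block_candidates`, the data model, and the sample records are all assumptions made for illustration and are not the speaker's method.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data model: (instance id, {attribute: value}).
Instance = tuple[str, dict[str, str]]


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so trivial spelling variants share a block."""
    return " ".join(value.lower().split())


def key_score(instances: list[Instance], attribute: str) -> float:
    """Hypothetical key-quality heuristic, NOT the unsupervised method from the talk:
    coverage (share of instances that have the attribute) multiplied by
    distinctness (share of distinct normalized values among those instances)."""
    values = [normalize(attrs[attribute]) for _, attrs in instances if attribute in attrs]
    if not values:
        return 0.0
    coverage = len(values) / len(instances)
    distinctness = len(set(values)) / len(values)
    return coverage * distinctness


def choose_key(instances: list[Instance]) -> str:
    """Return the attribute with the highest heuristic score (ties: alphabetical order)."""
    attributes = sorted({a for _, attrs in instances for a in attrs})
    return max(attributes, key=lambda a: key_score(instances, a))


def block_candidates(instances: list[Instance], key: str) -> set[tuple[str, str]]:
    """Key-based blocking: two instances form a match candidate iff they share the
    same normalized value of `key`. Instances without the key are not blocked."""
    blocks: dict[str, list[str]] = defaultdict(list)
    for inst_id, attrs in instances:
        if key in attrs:
            blocks[normalize(attrs[key])].append(inst_id)
    candidates: set[tuple[str, str]] = set()
    for ids in blocks.values():
        # Only pairs inside the same block are compared, never pairs across blocks.
        candidates.update(combinations(sorted(ids), 2))
    return candidates


# Illustrative toy records (invented for this sketch).
SAMPLE: list[Instance] = [
    ("a1", {"name": "Karlsruhe Institute of Technology", "city": "Karlsruhe"}),
    ("a2", {"name": "University of Mannheim", "city": "Mannheim"}),
    ("b1", {"label": "KIT", "name": "karlsruhe institute of  technology", "city": "Karlsruhe"}),
    ("b2", {"name": "University of Stuttgart", "city": "Stuttgart"}),
    ("b3", {"name": "Universität Mannheim", "city": "Mannheim"}),
]

if __name__ == "__main__":
    key = choose_key(SAMPLE)
    n = len(SAMPLE)
    print("chosen key:", key)
    print("all-pairs comparisons:", n * (n - 1) // 2)
    print(f"candidates on {key!r}:", sorted(block_candidates(SAMPLE, key)))
    print("candidates on 'city':", sorted(block_candidates(SAMPLE, "city")))
```

On these five toy records the heuristic selects `name`, and blocking on it yields the single candidate pair (a1, b1) instead of the 10 pairs an all-pairs comparison would check. Blocking on `city` instead also recovers (a2, b3), which `name` misses because the two spellings differ. This illustrates the trade-off between block size and recall that any key selection has to balance.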

(Yongtao Ma)




Start: 5 March 2014, 14:00
End: 5 March 2014, 15:00


Building 11.40, room 231

Organizer: Research group(s) Wissensmanagement (Knowledge Management)
Information: Media:5 3 14 Ma.pdf