Published: December 2013
Institution: Institut AIFB, KIT
Place of publication: Karlsruhe
Instance matching is an important step in data integration, where the goal is to find instance representations that refer to the same real-world entity. State-of-the-art methods use training data to learn combinations of attributes, similarity functions, and thresholds, called instance matching rules, for finding matches. However, learning complex rules with thresholds is difficult and therefore very sensitive to training data and parameters. In this paper, we explore a different avenue, proposing an approach that does not use thresholds but simpler boolean similarity functions. We show that the simple boolean nature of the employed rules allows for a parameter-free learning approach. For high effectiveness, we propose incorporating fine-grained word-level evidence into rule learning. That is, instead of capturing the similarity of entire attribute values in the rules, our approach employs words extracted from attribute values. Using benchmark matching tasks, we show that the proposed solution greatly outperforms state-of-the-art approaches in terms of result quality and, most importantly, is not sensitive to the choice of training data and parameters.
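To make the idea of a threshold-free, word-level boolean rule concrete, the following is a minimal sketch of one possible reading of the abstract. It is not the paper's rule language or learning algorithm. The helper names (`words`, `shares_word`, `matches`), the tokenization, and the sample records are all assumptions made for illustration. Here a rule is assumed to be a conjunction of boolean tests, each of which checks whether two instances share at least one word in a given attribute.

```python
import re


def words(value):
    """Return the set of lowercased word tokens in an attribute value."""
    return set(re.findall(r"\w+", value.lower()))


def shares_word(a, b, attribute):
    """Boolean similarity (hypothetical): True if both instances have at
    least one word in common in the given attribute. No threshold is used."""
    return bool(words(a.get(attribute, "")) & words(b.get(attribute, "")))


def matches(a, b, rule):
    """Apply a rule, assumed here to be a list of attribute names whose
    boolean tests must all hold (a conjunction)."""
    return all(shares_word(a, b, attribute) for attribute in rule)


# Illustrative records, invented for this sketch.
a = {"name": "Apple iPhone 5 16GB", "brand": "Apple Inc."}
b = {"name": "iPhone 5 black", "brand": "Apple"}
c = {"name": "Galaxy S4", "brand": "Samsung"}

rule = ["name", "brand"]
print(matches(a, b, rule))
print(matches(a, c, rule))
```

Because each test returns only true or false, such a rule has no numeric parameter to tune, which is the property the abstract attributes to boolean rules. How the rules are actually learned from training data is not covered by this sketch.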