Techreport3039: Difference between revisions
Project: IZEUS
Research group: Wissensmanagement (Knowledge Management)
Latest revision as of 15 January 2014, 16:43
Published: May 2013
Institution: Institute AIFB, KIT
Place of publication: Karlsruhe
Archive number: 3039
Abstract
Many RDF descriptions today are text-rich: besides structured data, they also contain large amounts of unstructured text. Text-rich RDF data is frequently queried with hybrid queries, which combine predicates matching structured data with string predicates expressing textual constraints. Evaluating hybrid queries efficiently requires a means of selectivity estimation. Previous work on selectivity estimation, however, suffers from inherent drawbacks that lead to efficiency and effectiveness issues. We propose a novel estimation approach, TopGuess, which uses topic models as data synopsis. In this way, we capture correlations between structured and unstructured data in a uniform and scalable manner. In a theoretical analysis, we show that TopGuess guarantees linear space complexity w.r.t. text data size and that the time complexity of selectivity estimation is independent of the synopsis size. In experiments on real-world data, TopGuess substantially improved estimation accuracy without sacrificing efficiency.
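As a rough illustration of why correlations between structured and unstructured data matter for selectivity estimation, the following Python sketch compares the exact selectivity of a hybrid query with an estimate that treats the structured and the textual predicate as independent. It is not the TopGuess method described in the report: the toy entities, the attribute names `type` and `text`, and all helper functions are assumptions made only for this example.

```python
# Toy illustration (NOT TopGuess): exact selectivity of a hybrid query versus
# an estimate that assumes structured and textual predicates are independent.

# Hypothetical text-rich entities: one structured attribute plus free text.
ENTITIES = [
    {"type": "Movie", "text": "a thriller set in rome"},
    {"type": "Movie", "text": "romantic comedy in rome"},
    {"type": "Movie", "text": "documentary about rome"},
    {"type": "Person", "text": "actor born in paris"},
    {"type": "Person", "text": "director from berlin"},
    {"type": "Person", "text": "writer living in paris"},
]


def selectivity(entities, predicate):
    """Return the fraction of entities that satisfy the predicate."""
    return sum(1 for e in entities if predicate(e)) / len(entities)


def is_movie(entity):
    """Structured predicate: the entity has type Movie."""
    return entity["type"] == "Movie"


def mentions_rome(entity):
    """String predicate: the text contains the keyword 'rome'."""
    return "rome" in entity["text"].split()


def hybrid(entity):
    """Hybrid query: both the structured and the string predicate hold."""
    return is_movie(entity) and mentions_rome(entity)


# Exact selectivity, obtained by scanning all entities.
true_sel = selectivity(ENTITIES, hybrid)

# Estimate under the independence assumption: product of the two
# single-predicate selectivities.
independent_sel = selectivity(ENTITIES, is_movie) * selectivity(ENTITIES, mentions_rome)

print(f"true selectivity:        {true_sel:.2f}")
print(f"independence assumption: {independent_sel:.2f}")
```

In this toy data the keyword occurs only in movie descriptions, so the independence-based estimate is half the true selectivity. A synopsis that models the dependency between structured values and text is meant to avoid this kind of error.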
Download: Media:Awa-topguess-selectivity-estimation-tr.pdf.pdf