Techreport3039: Difference between revisions

Latest revision as of 16:43, 15 January 2014

Topic-based Selectivity Estimation for Hybrid Queries over RDF Graphs

Andreas Wagner, Veli Bicer, Duc Thanh Tran

Published: May 2013
Institution: Institute AIFB, KIT
Place of publication: Karlsruhe
Archive number: 3039

Abstract
Many RDF descriptions today are text-rich: besides structured data they also feature much unstructured text. Text-rich RDF data is frequently queried via predicates matching structured data, combined with string predicates for textual constraints (hybrid queries). Evaluating hybrid queries efficiently requires means for selectivity estimation. Previous works on selectivity estimation, however, suffer from inherent drawbacks, which are reflected in efficiency and effectiveness issues. We propose a novel estimation approach, TopGuess, which exploits topic models as data synopsis. This way, we capture correlations between structured and unstructured data in a uniform and scalable manner. We study TopGuess in a theoretical analysis and show it to guarantee a linear space complexity w.r.t. text data size. Further, we show selectivity estimation time complexity to be independent of the synopsis size. In experiments on real-world data, TopGuess allowed for great improvements in estimation accuracy, without sacrificing efficiency.

Download: Media:Awa-topguess-selectivity-estimation-tr.pdf‎.pdf

Project

IZEUS

Research group

Wissensmanagement

Research area

Semantic Search

@@ Line 20: / Line 20: @@
 }}
 {{Publikation Details
-|Abstract=The Resource Description Framework (RDF) has become an accepted standard for describing entities on the Web. Many such RDF descriptions are text-rich – besides structured data, they also feature large portions of unstructured text. As a result, RDF data is frequently queried using predicates matching structured data, combined with string predicates for textual constraints: hybrid queries. Evaluating hybrid queries requires accurate means for selectivity estimation. Previous works on selectivity estimation, however, suffer from inherent drawbacks, reflected in efficiency and effective issues. In this paper, we present a general framework for hybrid selectivity estimation. Based on its requirements, we study the applicability of existing approaches. Driven by our findings, we propose a novel estimation approach, TopGuess, exploiting topic models as data synopsis. This enables us to capture correlations between structured and unstructured data in a uniform and scalable manner. We study TopGuess in theorical manner, and show TopGuess to guarantee a linear space complexity w.r.t. text data size, and a selectivity estimation time complexity independent from its synopsis size. In experiments on real-world data, TopGuess allowed for great improvements in estimation accuracy, without sacrificing runtime performance.
+|Abstract=Many RDF descriptions today are text-rich: besides structured data they also feature much unstructured text. Text-rich RDF data is frequently queried via predicates matching structured data, combined with string predicates for textual constraints (hybrid queries). Evaluating hybrid queries efficiently requires means for selectivity estimation. Previous works on selectivity estimation, however, suffer from inherent drawbacks, which are reflected in efficiency and effectiveness issues. We propose a novel estimation approach, TopGuess, which exploits topic models as data synopsis. This way, we capture correlations between structured and unstructured data in a uniform and scalable manner. We study TopGuess in a theoretical analysis and show it to guarantee a linear space complexity w.r.t. text data size. Further, we show selectivity estimation time complexity to be independent from the synopsis size. In experiments on real-world data, TopGuess allowed for great improvements in estimation accuracy, without sacrificing efficiency.
-|Download=Awa-topguess-selectivityestimation-tr.pdf‎
+|Download=Awa-topguess-selectivity-estimation-tr.pdf‎.pdf
 |Projekt=IZEUS
 |Forschungsgruppe=Wissensmanagement