Published: 2013 Oktober
Institution: Institut AIFB, KIT
Erscheinungsort / Ort: Karlsruhe
In recent years, the amount of RDF data on the Web has drastically increased. For an effective search over such a large Web of data, ranking of results is crucial. To allow efficient query processing of ranked queries, top-k join processing has been proposed, which aims at computing k top-ranked results, without complete result materialization. However, Web search poses novel challenges. Most notably, users are frequently willing to sacrifice result accuracy in favor of result computation time. Thus, there is a strong need to approximate the top-k results. Previous work on approximate top-k processing, however, is not applicable for the join top-k setting – it solely targets the selection top-k problem. Further, existing approaches require offline computed score statistics to be available. Unfortunately, many important kinds of Web search queries, e.g., keyword or spatial queries, commonly query-/user-dependent ranking is employed. Thus, one only has score information at runtime. In this paper, we address these shortcomings and propose a novel approximate top-k join framework, well-suited for Web search queries and ranking. For this, all necessary score statistics are learned via a pay-as-you-go training at runtime. We study our approach in a theoretical manner, and formally show its efficiency as well as effectiveness. Further, we conducted experiments on benchmarks comprising synthetic as well as real-world Web data/- queries. Our results are very promising, as we could achieve up to 65% time savings, while maintaining a high precision/recall.