Published: 2012 November
Type: Technical Report
Institution: Institute AIFB, KIT
Place of publication: Karlsruhe
Some queries in Linked Data processing cannot always be answered optimally through traditional database management techniques. More often than not, answering such queries relies on information that is incomplete, incorrect, or fuzzily specified, and on mere approximations of computationally advanced functionality for matching, aggregating, and ranking such information. To deal with these limitations, we propose CrowdSPARQL, a novel approach to SPARQL query answering that brings together machine- and human-driven capabilities. We define extensions of the SPARQL query language and of the Linked Data vocabulary VoID to capture those aspects of Linked Data query processing that, by design, are likely to benefit from human-based computation. Based on this information, and on a set of statistics gathered during the use of our system, CrowdSPARQL decides at run time which parts of a query are evaluated using automatic query execution techniques and which are answered by the crowd via a microtask platform such as Amazon's Mechanical Turk. We evaluated CrowdSPARQL on the DBpedia and MusicBrainz data sets in a scenario covering a representative subset of tasks that are amenable to crowdsourcing (ontological classification, entity resolution, and subjective rankings), in order to learn how specific parameters of microtask design influence the success of crowdsourced query answering.