Relational Schemata for Distributed SPARQL Query Processing

Victor Anthony Arrascue Ayala, Polina Koleva, Anas Alzogbi, Matteo Cossu, Michael Färber, Patrick Philipp, Guilherme Schievelbein, Io Taxidou, Georg Lausen

Published: 2019

Book title: Proceedings of the International Workshop on Semantic Big Data (SBD@SIGMOD'19)
Publisher: ACM

Refereed publication


Abstract
To benefit from mature database technology, RDF stores are built on top of relational databases, and SPARQL queries are mapped into SQL. Carrying out query processing in a distributed fashion on a shared-nothing computer cluster is one way to achieve scalability over large RDF datasets. To this end, this paper examines the impact of relational schema design when SPARQL queries are mapped into Apache Spark SQL. Four designs are considered: a single triple table; a set of tables obtained by partitioning the triples by predicate; a single wide table covering all properties; and a set of tables derived from the application's model specification, called the domain-dependent schema. For each approach, the rows of the corresponding tables are stored in the distributed file system HDFS using the columnar storage format Parquet. Experiments on standard benchmarks show that the single wide property table approach, despite its simplicity, is superior to the other approaches. Further experiments show that this single-table approach remains attractive even when the data are repartitioned by key (the RDF subject) before queries are executed.
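To make three of the four designs concrete, the following is a minimal sketch, not the authors' implementation. It assumes Python's built-in `sqlite3` as a stand-in for Spark SQL (no HDFS, Parquet, or partitioning is involved), a hypothetical toy dataset, predicate names that are already valid SQL identifiers, and single-valued properties only. The domain-dependent schema is omitted because it depends on an application model. The sketch shows how the same star-shaped SPARQL pattern `?s :name ?n . ?s :age ?a` could translate to SQL over a triple table (TT), predicate-partitioned tables (here called VP, a hypothetical label), and a wide property table (WPT).

```python
import sqlite3

# Hypothetical toy dataset of (subject, predicate, object) triples.
# Assumption: every property is single-valued; "carol" has no age.
TRIPLES = [
    ("alice", "name", "Alice"),
    ("alice", "age", "30"),
    ("bob", "name", "Bob"),
    ("bob", "age", "25"),
    ("carol", "name", "Carol"),
]


def build_schemas(conn, triples):
    """Load the same triples into three relational layouts."""
    cur = conn.cursor()

    # 1) Single triple table: one row per triple.
    cur.execute("CREATE TABLE tt (s TEXT, p TEXT, o TEXT)")
    cur.executemany("INSERT INTO tt VALUES (?, ?, ?)", triples)

    # 2) Partitioning by predicate: one two-column table per predicate.
    predicates = sorted({p for _, p, _ in triples})
    for p in predicates:
        cur.execute(f'CREATE TABLE "vp_{p}" (s TEXT, o TEXT)')
        cur.executemany(
            f'INSERT INTO "vp_{p}" VALUES (?, ?)',
            [(s, o) for s, q, o in triples if q == p],
        )

    # 3) Wide property table: one row per subject, one column per predicate;
    #    a missing property becomes NULL.
    cols = ", ".join(f'"{p}" TEXT' for p in predicates)
    cur.execute(f"CREATE TABLE wpt (s TEXT PRIMARY KEY, {cols})")
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o
    col_list = ", ".join(f'"{p}"' for p in predicates)
    placeholders = ", ".join(["?"] * (len(predicates) + 1))
    for s, props in by_subject.items():
        cur.execute(
            f"INSERT INTO wpt (s, {col_list}) VALUES ({placeholders})",
            [s] + [props.get(p) for p in predicates],
        )
    conn.commit()


# SQL translations of the star pattern  ?s :name ?n . ?s :age ?a
QUERIES = {
    # Triple table: a self-join on the subject column.
    "TT": """SELECT t1.s, t1.o, t2.o FROM tt t1 JOIN tt t2 ON t1.s = t2.s
             WHERE t1.p = 'name' AND t2.p = 'age' ORDER BY t1.s""",
    # Predicate tables: a join between the two relevant tables only.
    "VP": """SELECT n.s, n.o, a.o FROM "vp_name" n JOIN "vp_age" a
             ON n.s = a.s ORDER BY n.s""",
    # Wide property table: no join, just NOT NULL filters on the columns.
    "WPT": """SELECT s, "name", "age" FROM wpt
              WHERE "name" IS NOT NULL AND "age" IS NOT NULL ORDER BY s""",
}

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schemas(conn, TRIPLES)
    for label, sql in QUERIES.items():
        print(label, conn.execute(sql).fetchall())
```

All three queries return the same two rows (for "alice" and "bob"). In this sketch the wide table answers the subject-centred (star) pattern without any join, which is one possible intuition for its good behaviour; the abstract itself does not state the reason for the measured results, and multi-valued properties would need extra handling not shown here.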

Download: RelationalSchemata_SBD2019.pdf
DOI Link: 10.1145/3323878.3325804

Research group

Web Science

Research area

Distributed Algorithms, Semantic Web
