Thema3458




OLAP of Linked Data: Aggregation functions and summarizability





Thesis information

Thesis type: Master
Supervisor: Benedikt Kämpgen
Research group: Wissensmanagement (Knowledge Management)

Archive number: 3458
Thesis status: Completed
Start: unknown
Submission: unknown

Further information

Interested in decision support using web data? This thesis investigates mathematical aspects of OLAP on Linked Data.

The topic description is in English; the thesis itself may be written in English or German.

For questions, please contact Benedikt Kämpgen.

Background

  • Online Analytical Processing (OLAP) is a common decision support method used in business to give analysts an intuitive way to analyse large amounts of statistical data (for more information about OLAP, see "An overview of data warehousing and OLAP technology" [1] and "Providing OLAP to User-Analysts: An IT Mandate" [2]).
  • Linked Data ("Linked Data - The Story So Far" [3], Linked Data design issues [4]) is a set of best practices for using Semantic Web technologies to publish data on the web in a format that machines can easily access and process.
  • More and more statistics are being published as Linked Data on the web (for examples, see PlanetData wiki datasets [5]) and are of interest for decision support. Our research therefore investigates how to enable OLAP on Linked Data (for more information, see [6]).

Topic

  • This work addresses aggregation and summarizability in OLAP on Linked Data:
  • OLAP requires a multidimensional model of data cubes.
  • A data cube is defined as the result of the CUBE operator on a relational table with n columns for Dimensions, m columns for Measures, and an aggregation function for each Measure ("Data cube: a relational aggregation operator generalizing GROUP-BY, CROSS-TAB, and SUB-TOTALS" [7]).
  • However, not every aggregation function makes sense for every Measure. For example, applying the SUM operator across time periods to a Measure that gives the current stock of a product yields a meaningless total.
  • This is known as the summarizability problem and has already been investigated extensively in the OLAP literature ("A survey on summarizability issues in multidimensional modeling" [8]).
  • The RDF Data Cube Vocabulary allows statistics to be published as Linked Data and is widely adopted, but it does not describe how to represent aggregation functions [9].
  • OLAP systems therefore have to guess the correct aggregation function ("Transforming Statistical Linked Data for Use in OLAP Systems" [10]).
  • Ontological information may help in finding correct aggregation functions, but related work has so far rarely investigated this ("Ontologies and summarizability in OLAP" [11] and "Semantics of Governmental Statistics Data" [12]).
  • Interesting related work for this thesis may be:
    • Work on using mathematical information on the Semantic Web: "Enabling Collaboration on Semiformal Mathematical Knowledge by Semantic Web Integration" [13] and "Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification".
    • Work on including semantic information in data warehouses: "Thinking Structurally Helps Business Intelligence Design", "Exploring Strategic Indexes by Semantic OLAP Operators", and "Semantic Enrichment of Strategic Datacubes".
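
The CUBE operator and the stock example from the list above can be illustrated with a small sketch. The table, its column layout, and the function name `cube` are assumptions made up for this illustration; it is not the implementation from [7], only a plain-Python approximation that aggregates one Measure over every subset of the Dimensions.

```python
from itertools import combinations
from statistics import mean

# Hypothetical fact table: (product, month, units_sold, stock_level).
ROWS = [
    ("apples", "Jan", 10, 100),
    ("apples", "Feb", 20, 90),
    ("pears", "Jan", 5, 50),
    ("pears", "Feb", 15, 40),
]
ALL = "ALL"  # placeholder value for a rolled-up Dimension


def cube(rows, n_dims, measure_index, aggregate):
    """Aggregate one Measure over every subset of the first n_dims columns."""
    groups = {}
    for k in range(n_dims + 1):
        for kept in combinations(range(n_dims), k):
            for row in rows:
                # Dimensions outside `kept` are rolled up to ALL.
                key = tuple(row[i] if i in kept else ALL for i in range(n_dims))
                groups.setdefault(key, []).append(row[measure_index])
    return {key: aggregate(values) for key, values in groups.items()}


units = cube(ROWS, 2, 2, sum)        # units sold: SUM is meaningful
stock_sum = cube(ROWS, 2, 3, sum)    # stock level: SUM over months is not
stock_mean = cube(ROWS, 2, 3, mean)  # one plausible alternative over time

print(units[("apples", ALL)])       # 30 units sold in Jan and Feb together
print(stock_sum[("apples", ALL)])   # 190, although stock never exceeded 100
print(stock_mean[("apples", ALL)])  # 95, the average stock over both months
```

Summing the stock level across products within one month is still meaningful in this toy table (`stock_sum[(ALL, "Jan")]` is the total stock in January), which hints at why a suitable aggregation function may depend on the Dimension being rolled up and not only on the Measure.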

Thus, the work will concentrate on two questions:

  • How to represent (complex) aggregation functions in RDF?
  • How to automatically generate correct aggregation functions from RDF?

The thesis shall give a recommendation on how to represent (complex) aggregation functions in RDF and shall develop different approaches to generating correct aggregation functions from RDF. The results shall be evaluated for applicability by generating and using correct aggregation functions on real-world datasets.
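
To make the two questions more concrete, the following sketch shows one conceivable shape of an answer. Everything in the `ex:` namespace, including the property `ex:aggregateFunction`, is hypothetical and invented for this illustration; the RDF Data Cube Vocabulary defines no such property, and the fallback to SUM merely stands in for the guessing that OLAP systems currently have to do. Triples are modelled as plain Python tuples so that no RDF library is needed.

```python
# Hypothetical property linking a Measure to its aggregation function.
EX_AGGREGATE = "ex:aggregateFunction"

# Toy graph as (subject, predicate, object) tuples; only qb:MeasureProperty
# comes from the RDF Data Cube Vocabulary, all ex: names are assumptions.
TRIPLES = [
    ("ex:unitsSold", "rdf:type", "qb:MeasureProperty"),
    ("ex:unitsSold", EX_AGGREGATE, "ex:Sum"),
    ("ex:stockLevel", "rdf:type", "qb:MeasureProperty"),
]

# Hypothetical mapping from function identifiers to Python callables.
AGGREGATES = {"ex:Sum": sum, "ex:Max": max, "ex:Min": min}


def aggregation_for(measure, triples, default="ex:Sum"):
    """Return (function identifier, declared?) for a Measure.

    If the graph declares no aggregation function, fall back to `default`
    and report that the result is only a guess.
    """
    for subject, predicate, obj in triples:
        if subject == measure and predicate == EX_AGGREGATE:
            return obj, True
    return default, False


for measure in ("ex:unitsSold", "ex:stockLevel"):
    function_id, declared = aggregation_for(measure, TRIPLES)
    status = "declared" if declared else "guessed"
    print(measure, function_id, status, AGGREGATES[function_id]([1, 2, 3]))
```

In this toy graph the stock Measure carries no declaration, so the lookup falls back to a guessed SUM, which is exactly the kind of choice a representation in RDF or an automatic derivation would aim to replace.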

Students will gain a good understanding of business-relevant OLAP of data sources published on the web using the increasingly popular Linked Data principles.

Requirements

  • Some background knowledge in Semantic Web technologies (RDF, SPARQL, Linked Data)
  • Interest in OLAP and data warehousing
  • Some interest in mathematical aspects may be useful

If you are interested or have questions, please contact Benedikt Kämpgen.