An RDF Database
RDF is an expressive graph-based language for representing metadata about resources. It is complemented by the query language SPARQL, and both are developed as standards within the W3C. In RDF, the smallest component is a triple: (<subject>, <predicate>, <object>), which in graph terms translates to a vertex (<subject>), a directed edge (<predicate>), and a vertex (<object>). Similarly, the core of a SPARQL query is a list of triple patterns in which each position can be either bound (a concrete term) or unbound (a variable). This allows for complex path traversals and star queries over an RDF graph. Although the basics of both RDF and SPARQL are simple, designing large RDF databases poses many challenges. In this research we focus on scalable representation and management for efficient storage and querying of large RDF graphs distributed over several physical machines.
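The matching of bound and unbound triple positions described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the Stellaris implementation: the names `match_pattern` and `query`, the `ex:` identifiers, and the sample graph are all made up for this example, and unbound positions are written as strings starting with `?`, as in SPARQL.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Binding = Dict[str, str]        # variable name -> bound term


def is_variable(term: str) -> bool:
    """Unbound positions are written as '?name' (assumed convention)."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple,
                  binding: Binding) -> Optional[Binding]:
    """Try to match one triple pattern against one triple.

    Returns the extended binding, or None if the triple does not fit.
    """
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_variable(p_term):
            # Bind the variable, or check it against its earlier value.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            # A bound position must match the triple exactly.
            return None
    return result


def query(graph: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a list of triple patterns as a naive nested-loop join."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical sample data, loosely modelled on grid resource metadata.
GRAPH: List[Triple] = [
    ("ex:cluster1", "ex:type", "ex:ComputeCluster"),
    ("ex:cluster1", "ex:site", "ex:siteA"),
    ("ex:telescope1", "ex:type", "ex:RoboticTelescope"),
    ("ex:telescope1", "ex:site", "ex:siteB"),
    ("ex:job42", "ex:runsOn", "ex:cluster1"),
]

# Star query: several patterns share the same subject variable.
STAR = [("?r", "ex:type", "ex:ComputeCluster"), ("?r", "ex:site", "?site")]

# Path query: the object of one pattern is the subject of the next.
PATH = [("?job", "ex:runsOn", "?r"), ("?r", "ex:site", "?site")]

star_result = query(GRAPH, STAR)
path_result = query(GRAPH, PATH)
```

The nested-loop join scans the whole graph for every pattern, which is only workable for toy data. It is exactly this step that becomes difficult once the graph is large and its triples are spread over several machines.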
This research is carried out in the context of Stellaris, a metadata management service originally developed within the AstroGrid-D project. Our original focus was to provide a flexible way to store and query metadata relevant to e-science and grid computing, ranging from descriptions of grid resources (compute clusters, robotic telescopes, etc.) to application-specific job metadata and dataset annotations. Stellaris builds on common web standards: it describes metadata in RDF and makes it available through the accompanying query language SPARQL together with a REST-based interface.