Big Data/MonetDB

From Wikiversity

MonetDB is a column store: for each column of a relational table, a binary association table (BAT) is created that maps an object identifier to the corresponding value. It exploits main memory, but the database is still persisted on disk. MonetDB can be used as a distributed database. Its design focuses on read-dominated workloads, in which updates mostly consist of appending large chunks of data at a time.
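The decomposition of a table into BATs can be illustrated with a minimal Python sketch. It is not MonetDB code: the function names `to_bats` and `reconstruct_row` and the sample data are hypothetical, and a plain dictionary stands in for a BAT.

```python
# Hypothetical illustration: one BAT-like mapping (object identifier -> value)
# per column of a row-oriented table. Names and data are made up.

def to_bats(rows, columns):
    """Return {column_name: {oid: value}} for a list of row tuples."""
    bats = {name: {} for name in columns}
    for oid, row in enumerate(rows):
        for name, value in zip(columns, row):
            bats[name][oid] = value
    return bats

def reconstruct_row(bats, columns, oid):
    """Reassemble one tuple by looking up the same oid in every BAT."""
    return tuple(bats[name][oid] for name in columns)

columns = ("name", "age")
rows = [("alice", 30), ("bob", 25)]
bats = to_bats(rows, columns)
```

A query that touches only `age` reads only the `age` mapping, which is the main benefit of a column layout for read-dominated workloads.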

MonetDB consists of three layers:

The front-end provides the user-level data model and the query languages SQL, XQuery (for XML), SciQL (for arrays) and SPARQL (for RDF).
First, the query is translated into relational algebra. Then, a domain-specific strategic optimization is applied, which tries to reduce the amount of data to be processed. The resulting optimized plan is finally translated into the MonetDB Assembly Language (MAL).
The back-end consists of the MAL optimizer and interpreter.
The tactical optimization performed here is inspired by programming language optimization and ranges from symbolic processing to just-in-time data distribution and execution.
The kernel provides BATs and a library of optimized implementations of the binary relational algebra operators.
The operational optimization chooses, at runtime, the most suitable algorithm and implementation for each operator, given the actual input data.
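The idea behind operational optimization, choosing an implementation at runtime based on the input, can be sketched as follows. This is an illustration only: the function `select_eq` and the `is_sorted` flag are assumptions made for the example and do not correspond to actual MonetDB kernel functions.

```python
from bisect import bisect_left, bisect_right

def select_eq(values, needle, is_sorted=False):
    """Return the positions (oids) at which values[oid] == needle.

    is_sorted is a hypothetical property flag of the column. If it is set,
    binary search is used; otherwise the column is scanned linearly.
    """
    if is_sorted:
        lo = bisect_left(values, needle)
        hi = bisect_right(values, needle)
        return list(range(lo, hi))
    return [oid for oid, value in enumerate(values) if value == needle]

sorted_hits = select_eq([1, 2, 2, 3], 2, is_sorted=True)
unsorted_hits = select_eq([3, 2, 1, 2], 2)
```

Both branches compute the same logical operator; only the algorithm differs, depending on a property of the data that is known at runtime.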


RDF triples consist of three object identifiers: S (subject), P (property) and O (object). Each triple is stored in six triple tables, one per ordering of the three components: SPO, SOP, PSO, POS, OPS and OSP. With three columns per table, this leads to 18 BATs. Before a triple is inserted into the database, a dictionary module decomposes each URI into the largest common prefix, which is stored only once, and the unique ID of the subject, property or object.
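The storage scheme can be sketched in Python as follows. This is a simplified illustration, not MonetDB's implementation: the names `split_uri`, `Dictionary` and `build_triple_tables` are hypothetical, and splitting a URI at its last `/` or `#` is an assumed stand-in for detecting the largest common prefix.

```python
from itertools import permutations

def split_uri(uri):
    """Split at the last '/' or '#' (assumed stand-in for prefix detection)."""
    cut = max(uri.rfind("/"), uri.rfind("#")) + 1
    return uri[:cut], uri[cut:]

class Dictionary:
    """Hypothetical dictionary module: URI -> integer object identifier."""
    def __init__(self):
        self.prefixes = {}  # prefix string -> prefix ID, each stored once
        self.ids = {}       # (prefix ID, local name) -> object identifier

    def encode(self, uri):
        prefix, local = split_uri(uri)
        pid = self.prefixes.setdefault(prefix, len(self.prefixes))
        return self.ids.setdefault((pid, local), len(self.ids))

def build_triple_tables(triples):
    """Encode the triples and store them under all six orderings of S, P, O."""
    dictionary = Dictionary()
    encoded = [tuple(dictionary.encode(term) for term in t) for t in triples]
    tables = {}
    for order in permutations("SPO"):
        idx = ["SPO".index(c) for c in order]
        tables["".join(order)] = sorted(tuple(t[i] for i in idx) for t in encoded)
    return tables, dictionary

triples = [("http://example.org/alice",
            "http://example.org/knows",
            "http://example.org/bob")]
tables, dictionary = build_triple_tables(triples)
```

Each of the six tables has three columns, so a column store holds 6 × 3 = 18 BATs. Keeping every ordering sorted means that any combination of bound components in a triple pattern can be answered by a lookup on a leading column.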

Current research: Use characteristic sets (CS) to derive relational tables. A characteristic set is the set of properties that occur together for a subject. For each CS, a separate table is formed that holds the subjects sharing it. Each column represents a property, and its values are the corresponding objects. An object that is itself a subject is expressed via a foreign key. Irregular triples (those belonging to no CS) are stored separately in a basic triple store. The relational schema should adapt at runtime.
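A minimal Python sketch of this derivation is given below. It is an illustration under stated assumptions, not the method used in the research: the names `characteristic_sets` and `derive_tables` are hypothetical, the threshold `min_subjects` is an assumed rule for deciding when a property set counts as a CS, and every property is assumed to be single-valued.

```python
from collections import defaultdict

def characteristic_sets(triples):
    """Map each subject to the frozenset of properties it occurs with."""
    props = defaultdict(set)
    for s, p, o in triples:
        props[s].add(p)
    return {s: frozenset(ps) for s, ps in props.items()}

def derive_tables(triples, min_subjects=2):
    """Form one table per CS; min_subjects is an assumed cut-off for a CS."""
    cs_of = characteristic_sets(triples)
    members = defaultdict(list)
    for s, cs in cs_of.items():
        members[cs].append(s)
    regular = {cs for cs, subs in members.items() if len(subs) >= min_subjects}

    tables = {cs: defaultdict(dict) for cs in regular}
    irregular = []  # triples kept in the basic triple store
    for s, p, o in triples:
        cs = cs_of[s]
        if cs in regular:
            tables[cs][s][p] = o  # row = subject, column = property
        else:
            irregular.append((s, p, o))
    return {cs: dict(rows) for cs, rows in tables.items()}, irregular

triples = [("a", "name", "A"), ("a", "age", "1"),
           ("b", "name", "B"), ("b", "age", "2"),
           ("c", "color", "red")]
tables, irregular = derive_tables(triples)
```

In this example, subjects `a` and `b` share the property set {name, age} and end up as two rows of one table, while the single triple of `c` stays in the triple store.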

In order to reduce the number of CSs, attributes of kind 0..n are allowed. Since a MonetDB column must have exactly one type, a separate CS is created for each distinct object type of a property. Furthermore, the schema is fine-tuned, for example by unifying CSs that are related 1..1.