WikiJournal Preprints/Functional Tuple Store
This article is an unpublished pre-print not yet undergoing peer review.
To submit this article for peer review, please:
Article information
Abstract
Introduction
[edit | edit source]Nowadays, there is a lot of ordered key-value store available. Some of those are distributed like TiKV or FoundationDB. Some are embedded like WiredTiger or RocksDB. It is nice abstraction that has some success in the industry, among others there is Google Spanner[1].
This work goes beyond the work done on OSTRICH[2] in sense that fstore handles tuple of items and tuples are versioned using branches like in git.
The first section describe how to implement a generic tuple store, the second part dive into how to use the previous abstraction to implement an efficient versioned n-tuple store. The paper ends with a conclusion and future work.
Triple store
[edit | edit source]The generic tuple store is a database abstraction that store tuple of items in set. Every tuple contains items and every tuple is unique.
The problem is: how to query 3-tuples in an ordered key-value store (okvs).
We model the ordered key-value store using a table with two columns that stores bytes where keys are ordered using lexicographic order (dictionary order). Here is an example:
key | value |
---|---|
01 | 01 |
01 02 | 02 |
02 | 03 |
10 01 | 04 |
There exists an algorithm that allows to translate any tuple of integer and string into bytes that preserve their natural order. The lexicographic packing algorithm will order integers before strings. That is, we can consider that the ordered key-store can host tuples of integers and strings and that every (key, value) appears sorted by the key. Here is an example:
key | value |
---|---|
("hello", "world", 42) | 0 |
("hello", "world", 1337) | 0 |
("hello", "world", 2019) | 0 |
As you can see this forms already some kind of tuple store where (subject, predicate, object) are stored as key and the value is a filler.
One of the property of ordered key-value store is that they allow to query by key range and key prefix. For instance, in any ordered key-value store it would be very easy to query for every tuple that starts with "hello". We also use that property to allow to query any pattern. A pattern is a tuple (subject, predicate, object) where one or more item as variable. Here are all the patterns for 3-tuples where variables are in bold:
- (subject, predicate, object)
- (subject, predicate, object)
- (subject, predicate, object)
- (subject, predicate, object)
- (subject, predicate, object)
- (subject, predicate, object)
To be able to query any pattern, one could use every permutations of (subject, predicate, object) and query by pattern prefix. For instance, the following pattern:
((variable 'one) "world" 1337)
Can be queried by constructing a subspace of the key-value store where (predicate, object) or (object, predicate) is prefix. For every tuple the database contains, it must permute each of the tuple 6 times to be able to query every pattern. That being said, we can recognize a given permutation can allow to query multiple pattern. So there is some duplication. Let's say by trial and error, we discover that the minimal set that allow to query any 3-tuple pattern in one hop is the following permutations:
- (subject, predicate, object)
- (predicate, object, subject)
- (object, subject, predicate)
Re-using the above example, we will need to construct the following okvs:
Versioned Generic Tuple Store
[edit | edit source]There are other solutions. One can find them all by computing all 3-set of permutations of (subject, predicate, value) and checking that the 3 set verify the following property:
(define (permutation-prefix? c o)
(any (lambda (p) (prefix? p o)) (permutations c)))
(define (ok? combinations candidate)
(every (lambda (c) (any (lambda (p) (permutation-prefix? c p)) candidate)) combinations))
Conclusion and future work
[edit | edit source]The functional tuple store rely on an ordered key-value store to deliver a pragmatic versatile ACID-compliant versioned generic tuple store. The fact that it is generic allows to represent many kinds of data and metadata. The fact that it is versioned allows to keep track of what happened through time travelling queries. Hopefully, more generic, possibly versioned, tuple store will be created in other programming languages.
Additional information
[edit | edit source]Acknowledgements
[edit | edit source]Any people, organisations, or funding sources that you would like to thank.
Competing interests
[edit | edit source]Any conflicts of interest that you would like to declare. Otherwise, a statement that the authors have no competing interest.
Ethics statement
[edit | edit source]An ethics statement, if appropriate, on any animal or human research performed should be included here or in the methods section.
References
[edit | edit source]- ↑ "Spanner: Google's Globally-Distributed Database". ai.google. Retrieved 2019-06-16.
- ↑ Verborgh, Ruben; Mannens, Erik; Herwegen, Joachim Van; Sande, Miel Vander; Taelman, Ruben (2019-01-01). "Triple Storage for Random-Access Versioned Querying of RDF Archives". Journal of Web Semantics 54 (1): 4–28. doi:10.1016/j.websem.2018.08.001. https://rdfostrich.github.io/article-jws2018-ostrich/.