Global online services have millions of customers with thousands of servers located throughout the world. At this scale, components fail continuously and it is difficult to maintain a consistent state while hiding failures from the application.

Peer-to-peer protocols provide self-management by replicating services among peers, but they are mostly limited to write-once/read-many data sharing. To extend them beyond the typical file sharing, the support of fast transactions on distributed hash tables (DHTs) is an important yet missing feature.

We designed and implemented a distributed key/value store based on a DHT that supports consistent writes. Our system comprises three layers, all of them implemented in Erlang:Scalaris three layer architecture

  • At the bottom, a structured overlay network with logarithmic routing performance builds the basis for a multi-dimensional key/value store. In contrast to many other DHTs, our overlay stores the keys in lexicographical order, hence range queries are possible.
  • The middle layer implements the ACID properties (atomicity, concurrency, isolation, durability) in the face of concurrent write operations. It uses an improved Paxos Commit protocol with a low communication overhead that has been optimally integrated into the overlay.
  • The top layer hosts the application, a distributed key/value store. This layer can be used as a scalable, fault-tolerant backend for online services like shopping, banking, data sharing, online gaming, or social networks.

Our work is similar to Amazon's SimpleDB, but additionally supports full ACID properties. Dynamo, in contrast, restricts itself to eventual consistency only. As a test case, we chose Wikipedia, the free encyclopedia, that anyone can edit. Our implementation serves approx. 2,500 transactions per second with just 16 CPUs, which is better than the public Wikipedia.

Publications

2020
Approximate Distributed Set Reconciliation with Defined Accuracy Doctoral thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld (Advisor), 2020 Nico Kruber BibTeX
DOI
URN
Scalaris
2019
Ein System zur deterministischen Wiedergabe von verteilten Algorithmen auf Anwendungsebene Master's thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld, Björn Scheuermann (Advisors), 2019 Dirk Mattes BibTeX
Scalaris
2017
Weakening Paxos Consensus Sequences for Commutative Commands Master's thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld, Björn Scheuermann (Advisors), 2017 (preprint available as ZIB-Report 17-64) Jan Skrzypczak PDF (ZIB-Report)
BibTeX
Scalaris
2016
Evaluating the Scalability of Scalaris Master's thesis, Freie Universität Berlin, Katinka Wolter, Florian Schintke (Advisors), 2016 Jens Fischer BibTeX
Scalaris
2015
Approximate Hash-Based Set Reconciliation for Distributed Replica Repair 2015 IEEE 34th International Symposium on Reliable Distributed Systems (SRDS), pp. 166-175, 2015 Nico Kruber, Maik Lange, Florian Schintke BibTeX
DOI
Scalaris
2014
A Gossiping Framework for Scalaris Bachelor's thesis, Freie Universität Berlin, Katinka Wolter, Florian Schintke (Advisors), 2014 Jens V. Fischer PDF
BibTeX
URN
Scalaris
A Relational Database Schema on the Transactional Key-Value Store Scalaris Proceedings of 2nd Workshop on Scalable Cloud Data Management, 2014 Nico Kruber, Florian Schintke, Michael Berlin BibTeX
DOI
Scalaris
Map Reduce on Distributed Hash Tables Master's thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld, Björn Scheuermann (Advisors), 2014 Jan Fajerski BibTeX
Scalaris
Request-Based Load Balancing in Distributed Hash Tables Master's thesis, Freie Universität Berlin, Agnès Voisard, Florian Schintke (Advisors), 2014 Maximilian Michels BibTeX
Scalaris
2013
Approximate Algorithms for Distributed Systems Master's thesis, Freie Universität Berlin, Thorsten Schütt, Katinka Wolter (Advisors), 2013 Marie Hoffmann PDF
PDF
BibTeX
URN
Scalaris
Flexible Routing Tables in a Distributed Key-Value Store Master's thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld, Björn Scheuermann (Advisors), 2013 Magnus Müller PDF
BibTeX
URN
Scalaris
RECODE: Reconfigurable, Consistent and Decentralized Data Services IEEE International Conference on Peer-to-Peer Computing (P2P’13), pp. 1-10, 2013 Mikael Högqvist, Alexander Reinefeld BibTeX
DOI
Scalaris
Snapshots in Scalaris Master's thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld, Florian Schintke (Advisors), 2013 Stefan Keidel PDF
BibTeX
URN
Scalaris
XtreemFS & Scalaris Science & Technology, pp. 54-55, 2013 Florian Schintke BibTeX
Scalaris
2012
Consistent Key-Based Routing in Decentralized and Reconfigurable Data Services Doctoral thesis, Humboldt-Universität zu Berlin, 2012 Mikael Högqvist BibTeX
URN
Scalaris
Effiziente Reparatur von Repliken in Distributed Hash Tables Master's thesis, Humboldt-Universität zu Berlin, 2012 Maik Lange PDF
BibTeX
URN
Scalaris
Erweiterung eines verteilten Key-Value-Stores (Riak) um einen räumlichen Index Bachelor's thesis, Freie Universität Berlin, 2012 Philipp Borgers BibTeX
Scalaris
2011
The Benefits of Estimated Global Information in DHT Load Balancing Cluster Computing and the Grid, IEEE International Symposium on, Vol.0, pp. 382-391, 2011 Nico Kruber, Mikael Högqvist, Thorsten Schütt BibTeX
DOI
Scalaris
2010
Enhanced Paxos Commit for Transactions on DHTs CCGRID, pp. 448-454, 2010 (preprint available as ZIB-Report 09-28) Florian Schintke, Alexander Reinefeld, Seif Haridi, Thorsten Schütt PDF (ZIB-Report)
BibTeX
DOI
Scalaris
Management verteilter Daten in Grid- und Peer-to-Peer-Systemen Doctoral thesis, Humboldt-Universität zu Berlin, 2010 Florian Schintke BibTeX
Scalaris
Range queries in distributed hash tables Doctoral thesis, Humboldt-Universität zu Berlin, 2010 Thorsten Schütt BibTeX
Scalaris
2009
A Scalable, Transactional Data Store for Future Internet Services Towards the Future Internet - A European Research Perspective, G. Tselentis, J. Domingue, A. Galis, A. Gavras, D. Hausheer, S. Krco, V. Lotz, T. Zahariadis (Eds.), IOS Press, pp. 148-159, 2009 Alexander Reinefeld, Florian Schintke, Thorsten Schütt, Seif Haridi BibTeX
DOI
Scalaris
DHT Load Balancing with Estimated Global Information Master's thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld, Miroslaw Malek (Advisors), 2009 Nico Kruber PDF
BibTeX
URN
Scalaris
Ein Peer-to-Peer System mit Bereichsabfragen in PlanetLab Master's thesis, Freie Universität Berlin, Jochen Schiller, Alexander Reinefeld (Advisors), 2009 Christian von Prollius PDF
BibTeX
URN
Scalaris
Ein agglomeratives, gossipbasiertes Clustering-Verfahren für verteilte Systeme implementiert in Scalaris Bachelor's thesis, Freie Universität Berlin, 2009 Marie Hoffmann BibTeX
Scalaris
Gossip-based Topology Inference for Efficient Overlay Mapping on Data Centers Peer-to-Peer Computing, pp. 147-150, 2009 Thorsten Schütt, Alexander Reinefeld, Florian Schintke, Marie Hoffmann BibTeX
DOI
Scalaris
Passive/Active Load Balancing with Informed Node Placement in DHTs IWSOS, Thrasyvoulos Spyropoulos, Karin Hummel (Eds.), pp. 101-112, Vol.5918, Lecture Notes in Computer Science, 2009 Mikael Högqvist, Nico Kruber BibTeX
DOI
Scalaris
Self-Adaptation in Large-Scale Systems: A Study on Structured Overlays Across Multiple Datacenters Architectures and Languages for Self-Managing Distributed Systems (SelfMan@SASO), 2009 Thorsten Schütt, Alexander Reinefeld, Florian Schintke, C. Hennig BibTeX
DOI
Scalaris
Towards Explicit Data Placement in Scalable Key/Value Stores Architectures and Languages for Self-Managing Distributed Systems (SelfMan@SASO), 2009 Mikael Högqvist, Stefan Plantikow BibTeX
Scalaris
Transactional DHT Algorithms ZIB-Report 09-34 Monika Moser, Seif Haridi, Tallat Shafaat, Thorsten Schütt, Mikael Högqvist, Alexander Reinefeld PDF
BibTeX
URN
Scalaris
2008
A Transactional Scalable Distributed Data Store 1st IEEE International Scalable Computing Challenge, co-located with CCGrid’08, 2008 Thorsten Schütt, Monika Moser, Stefan Plantikow, Florian Schintke, Alexander Reinefeld BibTeX
Scalaris
Key-Based Consistency and Availability in Structured Overlay Networks International ICST Conference on Scalable Information Systems, 2008 Tallat Shafaat, Monika Moser, Ali Ghodsi, Thorsten Schütt, Seif Haridi, Alexander Reinefeld BibTeX
Scalaris
On Consistency of Data in Structured Overlay Networks Coregrid Integration Workshop, 2008 Tallat Shafaat, Monika Moser, Ali Ghodsi, Thorsten Schütt, Alexander Reinefeld BibTeX
Scalaris
Range Queries on structured overlay networks Computer Communications, 31(2), pp. 280-291, 2008 Thorsten Schütt, Florian Schintke, Alexander Reinefeld BibTeX
DOI
Scalaris
Scalaris: Reliable Transactional P2P Key/Value ERLANG '08: Proceedings of the 7th ACM SIGPLAN workshop on ERLANG, pp. 41-47, 2008 Thorsten Schütt, Florian Schintke, Alexander Reinefeld BibTeX
DOI
Scalaris
Self Management for Large-Scale Distributed Systems Formal Methods for Components and Objects 2007 (FMCO 2007), 2008 P. Roy, Seif Haridi, Alexander Reinefeld, J. B. Stefani, R. Yap, T. Coupaye BibTeX
Scalaris
Transactions and Concurrency Control for Peer-to-Peer-Wikis: An Evaluation Making Grids Work, Marco Danelutto, Paraskevi Fragopoulou, Vladimir Getov (Eds.), pp. 337-349, 2008 Stefan Plantikow, Alexander Reinefeld, Florian Schintke BibTeX
DOI
Scalaris
Using Global Information for Load Balancing in DHTs Workshop on Decentralized Self Management for Grids, P2P, and User Communities, 2008 Mikael Högqvist, Seif Haridi, Nico Kruber, Alexander Reinefeld, Thorsten Schütt BibTeX
Scalaris
2007
A Structured Overlay for Multi-dimensional Range Queries Euro-Par Conference, Anne-Marie Kermarrec (Ed.), pp. 503-513, Vol.4641, LNCS, 2007 Thorsten Schütt, Florian Schintke, Alexander Reinefeld BibTeX
DOI
Scalaris
Atomic Commitment in Transactional DHTs Proceedings of the CoreGRID Symposium, 2007 Monika Moser, Seif Haridi BibTeX
Scalaris
Device and method for retrieving / storing electronic data in a system with a plurality of data processing units United States Patent Application Publication No. US 2007/0165619 A1, 2007 Alexander Reinefeld, Florian Schintke, Thorsten Schütt BibTeX
Scalaris
Distributed Wikis on Structured Overlays CoreGrid Workshop on Grid Programming Models, Grid and P2P System Architecture, Grid Systems, Tools and Environments, 2007 Stefan Plantikow, Alexander Reinefeld, Florian Schintke BibTeX
Scalaris
P2P Routing of Range Queries in Skewed Multidimensional Data Sets ZIB-Report 07-23 Alexander Reinefeld, Florian Schintke, Thorsten Schütt PDF
BibTeX
URN
Scalaris
Transactions for Distributed Wikis on Structured Overlays DSOM, Alexander Clemm, Lisandro Granville, Rolf Stadler (Eds.), pp. 256-267, Vol.4785, LNCS, 2007 Stefan Plantikow, Alexander Reinefeld, Florian Schintke BibTeX
DOI
Scalaris
Transaktionen für verteilte Wikis auf strukturierten Overlay Netzwerken Master's thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld, Miroslaw Malek (Advisors), 2007 Stefan Plantikow PDF
BibTeX
URN
Scalaris
Transaktionen für verteilte Wikis auf strukturierten Overlay-Netzwerken ZIB-Report 07-43 Stefan Plantikow PDF
BibTeX
URN
Scalaris
Vorrichtung und Verfahren zum Abrufen / Speichern von elektronischen Daten in einem System mit mehreren Datenverarbeitungseinheiten Europäisches Patent EP 1 744 257 A1, 2007 Alexander Reinefeld, Florian Schintke, Thorsten Schütt BibTeX
Scalaris
Vorrichtung und Verfahren zum Speichern / Abrufen von Objekten mit mehrdimensional adressierten, elektronischen Daten Europäische Patentanmeldung Nr. 06012030.0 vom 12.06.2006, Patent Nr: 1868114, 2007 Alexander Reinefeld, Florian Schintke, Thorsten Schütt BibTeX
Scalaris
2006
Structured Overlay without Consistent Hashing: Empirical Results Sixth Workshop on Global and Peer-to-Peer Computing (GP2PC’06) at Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 16-19 May 2006, Singapore, 2006 Thorsten Schütt, Florian Schintke, Alexander Reinefeld BibTeX
DOI
Scalaris
2005
Chord#: Structured Overlay Network for Non-Uniform Load-Distribution ZIB-Report 05-40 Thorsten Schütt, Florian Schintke, Alexander Reinefeld PDF
BibTeX
URN
Scalaris