Scalaris - Distributed Scalable Key-Value Store with Transactions
Global online services have millions of customers with thousands of servers located throughout the world. At this scale, components fail continuously and it is difficult to maintain a consistent state while hiding failures from the application.
Peer-to-peer protocols provide self-management by replicating services among peers, but they are mostly limited to write-once/read-many data sharing. To extend them beyond the typical file sharing, the support of fast transactions on distributed hash tables (DHTs) is an important yet missing feature.
We designed and implemented a distributed key/value store based on a DHT that supports consistent writes. Our system comprises three layers, all of them implemented in Erlang:
- At the bottom, a structured overlay network with logarithmic routing performance builds the basis for a multi-dimensional key/value store. In contrast to many other DHTs, our overlay stores the keys in lexicographical order, hence range queries are possible.
- The middle layer implements the ACID properties (atomicity, concurrency, isolation, durability) in the face of concurrent write operations. It uses an improved Paxos Commit protocol with a low communication overhead that has been optimally integrated into the overlay.
- The top layer hosts the application, a distributed key/value store. This layer can be used as a scalable, fault-tolerant backend for online services like shopping, banking, data sharing, online gaming, or social networks.
Our work is similar to Amazon's SimpleDB, but additionally supports full ACID properties. Dynamo, in contrast, restricts itself to eventual consistency only. As a test case, we chose Wikipedia, the free encyclopedia, that anyone can edit. Our implementation serves approx. 2,500 transactions per second with just 16 CPUs, which is better than the public Wikipedia.
Publikationen
2020 |
|||
Nico Kruber | Approximate Distributed Set Reconciliation with Defined Accuracy | Doctoral thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld (Advisor), 2020 |
BibTeX
DOI URN |
2019 |
|||
Dirk Mattes | Ein System zur deterministischen Wiedergabe von verteilten Algorithmen auf Anwendungsebene | Master's thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld, Björn Scheuermann (Advisors), 2019 |
BibTeX
|
2017 |
|||
Jan Skrzypczak | Weakening Paxos Consensus Sequences for Commutative Commands | Master's thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld, Björn Scheuermann (Advisors), 2017 (preprint available as ZIB-Report 17-64) |
PDF (ZIB-Report)
BibTeX |
2016 |
|||
Jens Fischer | Evaluating the Scalability of Scalaris | Master's thesis, Freie Universität Berlin, Katinka Wolter, Florian Schintke (Advisors), 2016 |
BibTeX
|
2015 |
|||
Nico Kruber, Maik Lange, Florian Schintke | Approximate Hash-Based Set Reconciliation for Distributed Replica Repair | 2015 IEEE 34th International Symposium on Reliable Distributed Systems (SRDS), pp. 166-175, 2015 |
BibTeX
DOI |
2014 |
|||
Jens V. Fischer | A Gossiping Framework for Scalaris | Bachelor's thesis, Freie Universität Berlin, Katinka Wolter, Florian Schintke (Advisors), 2014 |
PDF
BibTeX URN |
Nico Kruber, Florian Schintke, Michael Berlin | A Relational Database Schema on the Transactional Key-Value Store Scalaris | Proceedings of 2nd Workshop on Scalable Cloud Data Management, 2014 |
BibTeX
DOI |
Jan Fajerski | Map Reduce on Distributed Hash Tables | Master's thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld, Björn Scheuermann (Advisors), 2014 |
BibTeX
|
Maximilian Michels | Request-Based Load Balancing in Distributed Hash Tables | Master's thesis, Freie Universität Berlin, Agnès Voisard, Florian Schintke (Advisors), 2014 |
BibTeX
|
2013 |
|||
Marie Hoffmann | Approximate Algorithms for Distributed Systems | Master's thesis, Freie Universität Berlin, Thorsten Schütt, Katinka Wolter (Advisors), 2013 |
PDF
BibTeX URN |
Magnus Müller | Flexible Routing Tables in a Distributed Key-Value Store | Master's thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld, Björn Scheuermann (Advisors), 2013 |
PDF
BibTeX URN |
Mikael Högqvist, Alexander Reinefeld | RECODE: Reconfigurable, Consistent and Decentralized Data Services | IEEE International Conference on Peer-to-Peer Computing (P2P’13), pp. 1-10, 2013 |
BibTeX
DOI |
Stefan Keidel | Snapshots in Scalaris | Master's thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld, Florian Schintke (Advisors), 2013 |
PDF
BibTeX URN |
Florian Schintke | XtreemFS & Scalaris | Science & Technology, pp. 54-55, 2013 |
BibTeX
|
2012 |
|||
Mikael Högqvist | Consistent Key-Based Routing in Decentralized and Reconfigurable Data Services | Doctoral thesis, Humboldt-Universität zu Berlin, 2012 |
BibTeX
URN |
Maik Lange | Effiziente Reparatur von Repliken in Distributed Hash Tables | Master's thesis, Humboldt-Universität zu Berlin, 2012 |
PDF
BibTeX URN |
Philipp Borgers | Erweiterung eines verteilten Key-Value-Stores (Riak) um einen räumlichen Index | Bachelor's thesis, Freie Universität Berlin, 2012 |
BibTeX
|
2011 |
|||
Nico Kruber, Mikael Högqvist, Thorsten Schütt | The Benefits of Estimated Global Information in DHT Load Balancing | Cluster Computing and the Grid, IEEE International Symposium on, Vol.0, pp. 382-391, 2011 |
BibTeX
DOI |
2010 |
|||
Florian Schintke, Alexander Reinefeld, Seif Haridi, Thorsten Schütt | Enhanced Paxos Commit for Transactions on DHTs | CCGRID, pp. 448-454, 2010 (preprint available as ZIB-Report 09-28) |
PDF (ZIB-Report)
BibTeX DOI |
Florian Schintke | Management verteilter Daten in Grid- und Peer-to-Peer-Systemen | Doctoral thesis, Humboldt-Universität zu Berlin, 2010 |
BibTeX
|
Thorsten Schütt | Range queries in distributed hash tables | Doctoral thesis, Humboldt-Universität zu Berlin, 2010 |
BibTeX
|
2009 |
|||
Alexander Reinefeld, Florian Schintke, Thorsten Schütt, Seif Haridi | A Scalable, Transactional Data Store for Future Internet Services | Towards the Future Internet - A European Research Perspective, G. Tselentis, J. Domingue, A. Galis, A. Gavras, D. Hausheer, S. Krco, V. Lotz, T. Zahariadis (Eds.), IOS Press, pp. 148-159, 2009 |
BibTeX
DOI |
Nico Kruber | DHT Load Balancing with Estimated Global Information | Master's thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld, Miroslaw Malek (Advisors), 2009 |
PDF
BibTeX URN |
Christian von Prollius | Ein Peer-to-Peer System mit Bereichsabfragen in PlanetLab | Master's thesis, Freie Universität Berlin, Jochen Schiller, Alexander Reinefeld (Advisors), 2009 |
PDF
BibTeX URN |
Marie Hoffmann | Ein agglomeratives, gossipbasiertes Clustering-Verfahren für verteilte Systeme implementiert in Scalaris | Bachelor's thesis, Freie Universität Berlin, 2009 |
BibTeX
|
Thorsten Schütt, Alexander Reinefeld, Florian Schintke, Marie Hoffmann | Gossip-based Topology Inference for Efficient Overlay Mapping on Data Centers | Peer-to-Peer Computing, pp. 147-150, 2009 |
BibTeX
DOI |
Mikael Högqvist, Nico Kruber | Passive/Active Load Balancing with Informed Node Placement in DHTs | IWSOS, Thrasyvoulos Spyropoulos, Karin Hummel (Eds.), pp. 101-112, Vol.5918, Lecture Notes in Computer Science, 2009 |
BibTeX
DOI |
Thorsten Schütt, Alexander Reinefeld, Florian Schintke, C. Hennig | Self-Adaptation in Large-Scale Systems: A Study on Structured Overlays Across Multiple Datacenters | Architectures and Languages for Self-Managing Distributed Systems (SelfMan@SASO), 2009 |
BibTeX
DOI |
Mikael Högqvist, Stefan Plantikow | Towards Explicit Data Placement in Scalable Key/Value Stores | Architectures and Languages for Self-Managing Distributed Systems (SelfMan@SASO), 2009 |
BibTeX
|
Monika Moser, Seif Haridi, Tallat Shafaat, Thorsten Schütt, Mikael Högqvist, Alexander Reinefeld | Transactional DHT Algorithms | ZIB-Report 09-34 |
PDF
BibTeX URN |
2008 |
|||
Thorsten Schütt, Monika Moser, Stefan Plantikow, Florian Schintke, Alexander Reinefeld | A Transactional Scalable Distributed Data Store | 1st IEEE International Scalable Computing Challenge, co-located with CCGrid’08, 2008 |
BibTeX
|
Tallat Shafaat, Monika Moser, Ali Ghodsi, Thorsten Schütt, Seif Haridi, Alexander Reinefeld | Key-Based Consistency and Availability in Structured Overlay Networks | International ICST Conference on Scalable Information Systems, 2008 |
BibTeX
|
Tallat Shafaat, Monika Moser, Ali Ghodsi, Thorsten Schütt, Alexander Reinefeld | On Consistency of Data in Structured Overlay Networks | Coregrid Integration Workshop, 2008 |
BibTeX
|
Thorsten Schütt, Florian Schintke, Alexander Reinefeld | Range Queries on structured overlay networks | Computer Communications, 31(2), pp. 280-291, 2008 |
BibTeX
DOI |
Thorsten Schütt, Florian Schintke, Alexander Reinefeld | Scalaris: Reliable Transactional P2P Key/Value | ERLANG '08: Proceedings of the 7th ACM SIGPLAN workshop on ERLANG, pp. 41-47, 2008 |
BibTeX
DOI |
P. Roy, Seif Haridi, Alexander Reinefeld, J. B. Stefani, R. Yap, T. Coupaye | Self Management for Large-Scale Distributed Systems | Formal Methods for Components and Objects 2007 (FMCO 2007), 2008 |
BibTeX
|
Stefan Plantikow, Alexander Reinefeld, Florian Schintke | Transactions and Concurrency Control for Peer-to-Peer-Wikis: An Evaluation | Making Grids Work, Marco Danelutto, Paraskevi Fragopoulou, Vladimir Getov (Eds.), pp. 337-349, 2008 |
BibTeX
DOI |
Mikael Högqvist, Seif Haridi, Nico Kruber, Alexander Reinefeld, Thorsten Schütt | Using Global Information for Load Balancing in DHTs | Workshop on Decentralized Self Management for Grids, P2P, and User Communities, 2008 |
BibTeX
|
2007 |
|||
Thorsten Schütt, Florian Schintke, Alexander Reinefeld | A Structured Overlay for Multi-dimensional Range Queries | Euro-Par Conference, Anne-Marie Kermarrec (Ed.), pp. 503-513, Vol.4641, LNCS, 2007 |
BibTeX
DOI |
Monika Moser, Seif Haridi | Atomic Commitment in Transactional DHTs | Proceedings of the CoreGRID Symposium, 2007 |
BibTeX
|
Alexander Reinefeld, Florian Schintke, Thorsten Schütt | Device and method for retrieving / storing electronic data in a system with a plurality of data processing units | United States Patent Application Publication No. US 2007/0165619 A1, 2007 |
BibTeX
|
Stefan Plantikow, Alexander Reinefeld, Florian Schintke | Distributed Wikis on Structured Overlays | CoreGrid Workshop on Grid Programming Models, Grid and P2P System Architecture, Grid Systems, Tools and Environments, 2007 |
BibTeX
|
Alexander Reinefeld, Florian Schintke, Thorsten Schütt | P2P Routing of Range Queries in Skewed Multidimensional Data Sets | ZIB-Report 07-23 |
PDF
BibTeX URN |
Stefan Plantikow, Alexander Reinefeld, Florian Schintke | Transactions for Distributed Wikis on Structured Overlays | DSOM, Alexander Clemm, Lisandro Granville, Rolf Stadler (Eds.), pp. 256-267, Vol.4785, LNCS, 2007 |
BibTeX
DOI |
Stefan Plantikow | Transaktionen für verteilte Wikis auf strukturierten Overlay Netzwerken | Master's thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld, Miroslaw Malek (Advisors), 2007 |
PDF
BibTeX URN |
Stefan Plantikow | Transaktionen für verteilte Wikis auf strukturierten Overlay-Netzwerken | ZIB-Report 07-43 |
PDF
BibTeX URN |
Alexander Reinefeld, Florian Schintke, Thorsten Schütt | Vorrichtung und Verfahren zum Abrufen / Speichern von elektronischen Daten in einem System mit mehreren Datenverarbeitungseinheiten | Europäisches Patent EP 1 744 257 A1, 2007 |
BibTeX
|
Alexander Reinefeld, Florian Schintke, Thorsten Schütt | Vorrichtung und Verfahren zum Speichern / Abrufen von Objekten mit mehrdimensional adressierten, elektronischen Daten | Europäische Patentanmeldung Nr. 06012030.0 vom 12.06.2006, Patent Nr: 1868114, 2007 |
BibTeX
|
2006 |
|||
Thorsten Schütt, Florian Schintke, Alexander Reinefeld | Structured Overlay without Consistent Hashing: Empirical Results | Sixth Workshop on Global and Peer-to-Peer Computing (GP2PC’06) at Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 16-19 May 2006, Singapore, 2006 |
BibTeX
DOI |
2005 |
|||
Thorsten Schütt, Florian Schintke, Alexander Reinefeld | Chord#: Structured Overlay Network for Non-Uniform Load-Distribution | ZIB-Report 05-40 |
PDF
BibTeX URN |