Global online services have millions of customers with thousands of servers located throughout the world. At this scale, components fail continuously and it is difficult to maintain a consistent state while hiding failures from the application.
Peer-to-peer protocols provide self-management by replicating services among peers, but they are mostly limited to write-once/read-many data sharing. To extend them beyond the typical file sharing, the support of fast transactions on distributed hash tables (DHTs) is an important yet missing feature.
We designed and implemented a distributed key/value store based on a DHT that supports consistent writes. Our system comprises three layers, all of them implemented in Erlang:
- At the bottom, a structured overlay network with logarithmic routing performance builds the basis for a multi-dimensional key/value store. In contrast to many other DHTs, our overlay stores the keys in lexicographical order, hence range queries are possible.
- The middle layer implements the ACID properties (atomicity, concurrency, isolation, durability) in the face of concurrent write operations. It uses an improved Paxos Commit protocol with a low communication overhead that has been optimally integrated into the overlay.
- The top layer hosts the application, a distributed key/value store. This layer can be used as a scalable, fault-tolerant backend for online services like shopping, banking, data sharing, online gaming, or social networks.
Our work is similar to Amazon's SimpleDB, but additionally supports full ACID properties. Dynamo, in contrast, restricts itself to eventual consistency only. As a test case, we chose Wikipedia, the free encyclopedia, that anyone can edit. Our implementation serves approx. 2,500 transactions per second with just 16 CPUs, which is better than the public Wikipedia.