Quorum consensus algorithms like the Paxos algorithm are widely used as basic building blocks for fault-tolerance in distributed systems. Unfortunately, distributed quorum consensus causes much overhead to negotiate and safely store the consensus. We therefore plan to optimize Paxos-based fault-tolerance for sequences of consensus in three ways:

  1. exploit multicast and reduce operations of modern interconnects to reduce the latency and number of messages,
  2. use remote direct memory access (RDMA) in combination with NVRAM to manage a distributed shared state,
  3. modify Paxos to support a sequence of consensus decisions in-place and avoid separate memory resources for each Paxos instance.
  4. We then build efficient custom datatypes on top of consensus sequences, that support partial updates, multiple-reader-single-writer locks, or compare-and-swap semantics.

The resulting distributed fault-tolerant consensus will provide low latency and high-throughput decisions. It will allow to apply recoverable distributed consensus in new scenarios where it was avoided before due to its high latency. The optimized consensus can be used as a building block in current and future distributed data management and database systems – including those developed in SPP 2037 – that often rely on a sequence of decisions to process locks, transactions, to make atomic changes like compare and swap, to support replicated state machines, or to elect the next master etc.

This project is part of the DFG SPP 2037 on scalable data management for future hardware.

Publications

2021
DFI: The Data Flow Interface for High-Speed Networks SIGMOD '21: International Conference on Management of Data, pp. 1825-1837, 2021 Lasse Thostrup, Jan Skrzypczak, Matthias Jasny, Tobias Ziegler, Carsten Binnig BibTeX
DOI
rbrcseq
2020
RMWPaxos: Fault-Tolerant In-Place Consensus Sequences IEEE Transactions on Parallel and Distributed Systems, 31(10), pp. 2392-2405, 2020 Jan Skrzypczak, Florian Schintke, Thorsten Schütt BibTeX
DOI
arXiv
rbrcseq
Towards Log-Less, Fine-Granular State Machine Replication Datenbank Spektrum, 20(3), pp. 231-241, 2020 Jan Skrzypczak, Florian Schintke BibTeX
DOI
rbrcseq
Transactions on Red-black and AVL trees in NVRAM arXiv, 2020 Thorsten Schütt, Florian Schintke, Jan Skrzypczak BibTeX
arXiv
rbrcseq
2019
DPI: The Data Processing Interface for Modern Networks 9th Biennial Conference on Innovative Data Systems Research, 2019 Gustavo Alonso, Carsten Binnig, Ippokratis Pandis, Kenneth Salem, Jan Skrzypczak, Ryan Stutsman, Lasse Thostrup, Tianzheng Wang, Zeke Wang, Tobias Ziegler PDF
BibTeX
rbrcseq
Linearizable State Machine Replication of State-Based CRDTs without Logs Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, PODC 2019, pp. 455-457, 2019 Jan Skrzypczak, Florian Schintke, Thorsten Schütt BibTeX
DOI
rbrcseq
Linearizable State Machine Replication of State-Based CRDTs without Logs arXiv, 2019 Jan Skrzypczak, Florian Schintke, Thorsten Schütt BibTeX
arXiv
rbrcseq
2018
Atomarer Datenzugriff auf verteilten persistenten Arbeitsspeicher Master's thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld, Jens-Peter Redlich (Advisors), 2018 Lennart Steininger BibTeX
rbrcseq
2017
Weakening Paxos Consensus Sequences for Commutative Commands Master's thesis, Humboldt-Universität zu Berlin, Alexander Reinefeld, Björn Scheuermann (Advisors), 2017 (preprint available as ZIB-Report 17-64) Jan Skrzypczak PDF (ZIB-Report)
BibTeX
rbrcseq