Vertical Paxos and Primary-Backup Replication

The ACM Symposium on Principles of Distributed Computing (PODC 2009)

Published by Association for Computing Machinery, Inc.

Large-scale distributed storage systems built over failure-prone commodity components are increasingly popular. Failures are common in those large systems, and replication is often the solution to data reliability. A clear gap remains between the well-known consensus algorithms and the practical replication protocols in real systems: consensus algorithms such as Paxos [2] are used mostly to maintain global configuration information, not for the actual data replication.

This paper came out of much discussion between Malkhi, Zhou, and myself about reconfiguration. Some day, what we did may result in a long paper about state-machine reconfiguration containing these results and others that have not yet been published. The ideas here are related to the original, unpublished version of [151].