How to Build a Highly Available System Using Consensus

Butler W. Lampson

Citation: An earlier version appeared in Distributed Algorithms, ed. Babaoglu and Marzullo, Lecture Notes in Computer Science 1151, Springer, 1996, pp. 1-17.

Email: blampson@microsoft.com. This paper is at http://research.microsoft.com.

Abstract:

Lamport showed that a replicated deterministic state machine is a general way to implement a highly available system, given a consensus algorithm that the replicas can use to agree on each input. His Paxos algorithm is the most fault-tolerant way to get consensus without real-time guarantees. Because general consensus is expensive, practical systems reserve it for emergencies and use leases (locks that time out) for most of the computing. This paper explains the general scheme for efficient highly available computing, gives a general method for understanding concurrent and fault-tolerant programs, and derives the Paxos algorithm as an example of the method.
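
As a rough illustration of what "leases (locks that time out)" means, here is a minimal single-process sketch in Python. The class name `Lease`, its fields, and the `acquire` method are hypothetical names chosen for this example, not taken from the paper. The clock is passed in explicitly to keep the example deterministic, and the sketch ignores clock skew between machines, which a real distributed lease would have to bound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """A lock that times out (hypothetical names, illustrative only)."""
    holder: Optional[str] = None
    expires_at: float = 0.0

    def is_held(self, now: float) -> bool:
        # The lease is in force only strictly before its expiry time.
        return self.holder is not None and now < self.expires_at

    def acquire(self, who: str, now: float, duration: float) -> bool:
        # Refuse only if someone else holds an unexpired lease;
        # a free or expired lease is granted, and the holder may renew.
        if self.is_held(now) and self.holder != who:
            return False
        self.holder = who
        self.expires_at = now + duration
        return True


lease = Lease()
print(lease.acquire("A", now=0.0, duration=10.0))   # True: lease was free
print(lease.acquire("B", now=5.0, duration=10.0))   # False: A still holds it
print(lease.acquire("B", now=10.0, duration=10.0))  # True: A's lease has expired
```

The point of the timeout is that a failed holder cannot block progress forever: once the lease expires, another party can take over without running consensus for every operation.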