Byzantizing Paxos by Refinement

Distributed Computing: 25th International Symposium: DISC 2011, David Peleg, editor. Springer-Verlag. | , pp. 211-224

The Castro-Liskov algorithm (Miguel Castro and Barbara Liskov, Practical Byzantine Fault Tolerance and Proactive Recovery, TOCS 20:4 [2002] 398-461) intuitively seems like a modification of Paxos [122] to handle Byzantine failures, using 3n+1 processes instead of 2n+1 to handle n failures. In 2003 I realized that a nice way to think about the algorithm is that 2n+1 non-faulty processes are trying to implement ordinary Paxos in the presence of n malicious processes–each good process not knowing which of the other processes are malicious. Although I mentioned the idea in lectures, I didn’t work out the details.

The development of TLAPS, the TLA+ proof system, inspired me to write formal TLA+ specifications of the two algorithms and a TLAPS-checked proof that the Castro-Liskov algorithm refines ordinary Paxos. This paper describes the results.