﻿ Mechanically Checked Safety Proof of a Byzantine Paxos Algorithm

# Mechanically Checked Safety Proof of a Byzantine Paxos Algorithm

## What We've Done

Algorithm BPCon is a consensus algorithm that tolerates Byzantine faults.  It abstracts and generalizes the consensus algorithm that is at the heart of the Castro-Liskov algorithm.  That algorithm is described in the paper Practical Byzantine Fault Tolerance and Proactive Recovery by Miguel Castro and Barbara Liskov, which appeared in ACM Transactions on Computer Systems 20, 4 (2002), pages 398-461.  BPCon also admits two other implementations: a simple one that employs digital signatures and a previously unpublished one that, like the Castro-Liskov algorithm, relies only on message authenticators.

We have proven the safety of algorithm BPCon by showing that its safety specification refines (implements) that of algorithm PCon, a variant of the classic Paxos consensus algorithm.  The classic algorithm was first published in the paper The Part-Time Parliament by Leslie Lamport, which appeared in ACM Transactions on Computer Systems 16, 2 (1998), pages 133-169.  (It was also presented without proof by Brian Oki in his Ph.D. thesis.)  The variant, although never presented in this form, was essentially used as part of two previous algorithms: Vertical Paxos and an unpublished (but patented) version of Cheap Paxos.

Our proof has been checked by TLAPS, the TLA+ proof system.

The paper Byzantizing Paxos by Refinement informally explains the BPCon algorithm and shows how it can be derived from PCon.

We have also written the following two additional specifications.

• Consensus: A trivial, high-level specification of consensus.
• Voting: A high-level consensus algorithm that describes ballots and voting that underlie the classic Paxos consensus algorithm.
We have proved that the Voting algorithm refines the Consensus specification (including liveness).  Most of the non-temporal steps of the refinement proof have been checked by TLAPS.  (TLAPS does not yet check proof obligations involving temporal logic formulas.)

Algorithm PCon refines the Voting algorithm.  The safety part of this refinement, as well as of Consensus by Voting, has been checked by the TLC model checker.

## The Specifications

Our specifications are all written as algorithms in the PlusCal algorithm language, which are automatically translated into TLA+.  Click here for typeset versions of the PlusCal code. This code, and its translation, specify only safety.  The liveness properties of specifications Consensus and Voting are written directly in TLA+.  (Constructs for specifying liveness were added to PlusCal after these specifications were written, but they are not general enough to specify the rather subtle liveness assumptions needed by the algorithms.)  We have not written formal liveness properties for BPCon and PCon.  The specifications are in the following files:

Consensus   Consensus.tla.

Voting   VoteProof.tla

PCon   PConProof.tla

BPCon   BPConProof.tla

All the specifications have extensive comments.  If you are interested primarily in the specifications and proofs, then you should read them in order, starting with the Consensus specification.  If you are primarily interested in Byzantine Paxos, unless you are already familiar with TLA+ and PlusCal you should still read the Consensus specification first.  You might then want to go directly to BPConProof and refer back to the other specifications as needed.

## Verifying the Refinements

Each algorithm refines (implements) the previous one under a suitable refinement mapping.  Before attempting any proofs, we checked the safety part of each refinement except the last with the TLC model checker, using large enough models that we felt confident that there were no "coding" errors.  Our understanding of these algorithms is good enough that we were confident that they were basically correct before writing any proofs.  We also model checked BPCon to find simple errors, but TLC cannot check it on large enough models to give any confidence in correctness of the algorithm.

The final refinement step, that BPCon refines PCon, has been proved and the proof completely checked by TLAPS, except for:

• A few simple mathematical facts about finite sets are assumed without proof--for example, that a finite, non-empty set of integers contains a maximal element.  These assumptions have been checked with TLC to avoid possible errors in their statements.
• The complete TLA+ proof involves a few lines of very simple temporal-logic reasoning.  (Those proof steps are literally identical for every TLA+ refinement proof of safety properties.)  Since TLAPS does not yet handle temporal reasoning, that part of the proof is omitted.
We have written a complete proof of the first refinement step, that Voting refines Consensus, including the liveness properties.  All the non-temporal steps have been checked with TLAPS except for the following.
• Seven proof obligations containing ENABLED formulas were not checked because TLAPS cannot yet reason about TLA's ENABLED operator.
• We assumed a few of the same kind of simple mathematical facts as in the refinement proof of BPCon, where possible checking them with TLC.

The major part of our proof that we would like to complete is the liveness part of the proof that BPCon refines PCon, which includes writing the two algorithms' liveness properties.  We will probably wait to do that until TLAPS can check temporal proofs and we have checked complete proof that Voting refines Consensus.

It would also be nice to complete the entire refinement sequence by proving that PCon refines Voting.  However, there is little other motivation to write such a proof because: (i) we already believe that PCon is correct, and (ii) its proof would be less of a test of our ability to write and machine check such proofs than the refinement proof of BPCon.

This page can be found by searching the Web for the 23-letter string uidlamportbyzpaxosproof.  Please do not put this string in any document that could wind up on the web--including email messages and Postscript and Word documents.  You can refer to it in Web documents as "the string obtained by removing the - from uid-lamportbyzpaxosproof".