Mechanically Checked Safety Proof of a Byzantine Paxos Algorithm
Last modified 1 September 2011
Algorithm BPCon is a consensus algorithm that tolerates
Byzantine faults. It abstracts and generalizes the consensus
algorithm that is at the heart of the Castro-Liskov algorithm. That
algorithm is described in the paper
Practical Byzantine Fault Tolerance and Proactive Recovery
by Miguel Castro and Barbara Liskov,
which appeared in
ACM Transactions on Computer Systems 20, 4 (2002), pages 398-461.
BPCon also admits two other implementations: a simple one that employs
digital signatures and a previously unpublished one that, like the
Castro-Liskov algorithm, relies only on message authenticators.
We have proven the safety of algorithm BPCon by showing that its
safety specification refines (implements) that of algorithm PCon, a
variant of the classic Paxos consensus algorithm. The classic
algorithm was first published in the paper
The Part-Time Parliament by Leslie Lamport,
which appeared in
ACM Transactions on Computer Systems 16, 2 (1998), pages 133-169.
(It was also presented without proof by Brian Oki in his Ph.D. thesis.)
The variant, although never presented in this form, was essentially
used as part of two previous algorithms:
Vertical Paxos
and an unpublished (but patented) version of
Cheap Paxos.
Our proof has been checked by
TLAPS,
the TLA+ proof system.
The paper Byzantizing Paxos by Refinement
informally explains the BPCon algorithm and shows how it can be
derived from PCon.
We have also written the following two additional specifications.
- Consensus:
A trivial, high-level specification of consensus.
- Voting:
A high-level consensus algorithm that describes ballots and voting
that underlie the classic Paxos consensus algorithm.
We have proved that the Voting algorithm refines the Consensus
specification (including liveness). Most of the non-temporal steps of
the refinement proof have been checked by TLAPS. (TLAPS does not yet
check proof obligations involving temporal logic formulas.)
Algorithm PCon refines the Voting algorithm. The safety part of this
refinement, as well as of Consensus by Voting, has been checked by the
TLC model checker.
Our specifications are all written as algorithms in the PlusCal algorithm language, which are
automatically translated into TLA+.
Click here for typeset
versions of the PlusCal code.
This code, and its translation, specify only safety. The liveness
properties of specifications Consensus and Voting are written directly
in TLA+. (Constructs for specifying liveness were added to PlusCal
after these specifications were written, but they are not general
enough to specify the rather subtle liveness assumptions needed by the
algorithms.) We have not written formal liveness properties for BPCon
and PCon. The specifications are in the following files:
Consensus Consensus.tla.
Voting
VoteProof.tla
PCon
PConProof.tla
BPCon
BPConProof.tla
All the specifications have extensive comments. If you are interested
primarily in the specifications and proofs, then you should read them
in order, starting with the Consensus specification. If you are
primarily interested in Byzantine Paxos, unless you are already
familiar with TLA+ and PlusCal you should still read the Consensus
specification first. You might then want to go directly to BPConProof
and refer back to the other specifications as needed.
Each algorithm refines (implements) the previous one under a suitable
refinement mapping. Before attempting any proofs, we checked the
safety part of each refinement except the last with the TLC model checker, using large enough models that
we felt confident that there were no "coding" errors. Our
understanding of these algorithms is good enough that we were
confident that they were basically correct before writing any proofs.
We also model checked BPCon to find simple errors, but TLC cannot
check it on large enough models to give any confidence in correctness
of the algorithm.
The final refinement step, that BPCon refines PCon, has
been proved and the proof completely checked by TLAPS, except for:
- A few simple mathematical facts about finite sets are
assumed without proof--for example, that a finite, non-empty set of integers
contains a maximal element. These assumptions have been checked with
TLC to avoid possible errors in their statements.
- The complete TLA+ proof involves a few lines of very simple
temporal-logic reasoning. (Those proof steps are literally identical
for every TLA+ refinement proof of safety properties.) Since TLAPS
does not yet handle temporal reasoning, that part of the proof is
omitted.
We have written a complete proof of the first refinement step, that
Voting refines Consensus, including the liveness properties.
All the non-temporal steps have been checked with TLAPS except for
the following.
-
Seven proof obligations containing ENABLED formulas were not checked
because TLAPS cannot yet reason about TLA's ENABLED operator.
- We assumed a few of the same kind of simple mathematical
facts as in the refinement proof of BPCon, where possible checking
them with TLC.
The major part of our proof that we would like to complete is the
liveness part of the proof that BPCon refines PCon, which includes
writing the two algorithms' liveness properties. We will probably
wait to do that until TLAPS can check temporal proofs and we have
checked complete proof that Voting refines Consensus.
It would also be nice to complete the entire refinement sequence by
proving that PCon refines Voting. However, there is little other
motivation to write such a proof because: (i) we already believe
that PCon is correct, and (ii) its proof would be less of a test
of our ability to write and machine check such proofs than the
refinement proof of BPCon.
This page can be found by searching the Web for the 23-letter
string
uidlamportbyzpaxosproof.
Please do not put this string in any document that could wind up on
the web--including email messages and Postscript and Word documents.
You can refer to it in Web documents as "the string obtained by
removing the - from
uid-lamportbyzpaxosproof".