→
Mechanically Checked Safety Proof of a Byzantine Paxos Algorithm
Leslie Lamport
Last modified on 25 July 2020
|
This web page links to the formal PlusCal specifications of the
algorithms, and the machine-checked TLA+ proofs of their correctness,
that are presented informally in the paper
Byzantizing Paxos by Refinement.
That paper describes an algorithm BPCon that abstracts and generalizes
the consensus algorithm at the heart of the
algorithm described in the seminal paper
Practical Byzantine Fault Tolerance and Proactive Recovery
by Miguel Castro and Barbara Liskov
(ACM Transactions on Computer Systems 20, 4 (2002),
pages 398-461).
BPCon also admits two other implementations: a simple one
that employs digital signatures and a previously unpublished one that,
like the Castro-Liskov algorithm, relies only on message
authenticators.
It is derived by Byzantizing algorithm
PCon, a variant of the classic Paxos consensus algorithm
first published in
The Part-Time Parliament by Leslie Lamport,
(ACM Transactions on Computer Systems 16, 2 (1998),
pages 133-169).
(Paxos also appeared without proof in Brian Oki's Ph.D.
thesis.) Although never presented in this form, PCon was
essentially used as part of two previous algorithms:
Vertical Paxos
and an unpublished (but patented) version of
Cheap Paxos.
We have proved the safety properties of algorithm BPCon by showing
that its safety specification refines (implements) that of algorithm
PCon.
The proof has been checked by
TLAPS,
the TLA+ proof system.
The initial version of the proof was written and checked in
2011. Since then, improvements to TLAPS have allowed the proof
to be shortened---mainly by Stephan Merz.
We have also written the following two additional specifications.
- Consensus:
A trivial, high-level specification of consensus.
- Voting:
A high-level consensus algorithm that describes ballots and voting
that underlie the classic Paxos consensus algorithm.
We have proved that the Voting algorithm refines the Consensus
specification (including liveness). The proofs of the safety
properties have been completely checked by TLAPS, but some steps in the
liveness proofs have not because TLAPS cannot yet check them.
Algorithm PCon refines the Voting algorithm. The safety part of
this refinement, as well as of the refinement of Consensus by Voting,
has been checked by the TLC model checker.
Our specifications are all written as algorithms in the
PlusCal algorithm language,
which are
automatically translated into TLA+.
Click here for an
explanation of the PlusCal code.
This code, and its translation, specify only safety. The liveness
properties of specifications Consensus and Voting are written directly
in TLA+. (Constructs for specifying liveness were added to PlusCal
after these specifications were written, but they are not general
enough to specify the rather subtle liveness assumptions needed by the
algorithms.) We have not written formal liveness properties for BPCon
and PCon. The specifications and their pretty-printed
versions are in the following files.
They are also all in this
compressed zip file.
All the specifications have extensive comments. If you are interested
primarily in the specifications and proofs, then you should read them
in order, starting with the Consensus specification. If you are
primarily interested in Byzantine Paxos, unless you are already
familiar with TLA+ and PlusCal, you should still read the Consensus
specification first. You might then want to go directly to BPConProof
and refer back to the other specifications as needed.
Each algorithm refines (implements) the previous one under a suitable
refinement mapping. Before attempting any proofs, we checked the
safety part of each refinement except the last with the TLC model checker, using large enough models that
we felt confident that there were no coding errors. Our
understanding of these algorithms is good enough that we were
confident that they were basically correct before writing any proofs.
We also model checked BPCon to find simple errors, but TLC cannot
check it on large enough models to give any confidence in correctness
of the algorithm.
The final refinement step, that BPCon refines PCon, has
been proved and the proof completely checked by TLAPS, except that
a few simple mathematical facts about finite sets are
assumed without proof--for example, that a finite, non-empty set of integers
contains a maximal element. These assumptions were checked with
TLC to avoid possible errors in their statements.
Theorems in library modules developed since the proofs were written would
now make it easy to prove those statements.
We have written a proof of the first refinement step, that Voting
refines Consensus, including the liveness properties. All safety
properties were completely checked, except that we assumed a few of
the same kind of simple mathematical facts as in the refinement proof
of BPCon, where possible checking them with TLC.
It would be nice to complete the entire refinement sequence by
proving that PCon refines the safety part of the
Voting spec. However, there is little
motivation to do that because: (i) we already believe
that PCon is correct, (ii) its proof would be less of a test
of our ability to write and machine check such proofs than the
refinement proof of BPCon, and (iii) a proof of correctness
of ordinary Paxos has already been written and checked with TLAPS.
The proof that the Voting spec implements the liveness property of
Consensus has not been completely checked, since TLAPS cannot yet
reason about TLA's ENABLED operator or do non-propositional temporal
logic reasoning.
This page can be found by searching the Web for the 23-letter
string
uidlamportbyzpaxosproof.
Please do not put this string in any document that could wind up on
the web--including email messages and Postscript and Word documents.
You can refer to it in Web documents as the string obtained by
removing the - from
uid-lamportbyzpaxosproof .
|