Synchronizing Clocks in the Presence of Faults

Journal of the Association for Computing Machinery | , Vol 32(1): pp. 52-78

Practical implementation of Byzantine agreement requires synchronized clocks. For an implementation to tolerate Byzantine faults, it needs a clock synchronization algorithm that can tolerate those faults. When I arrived at SRI, there was a general feeling that we could synchronize clocks by just having each process use a Byzantine agreement protocol to broadcast its clock value. I was never convinced by that hand waving. So, at some point I tried to write down precise clock-synchronization algorithms and prove their correctness. The two basic Byzantine agreement algorithms from [46] did generalize to clock-synchronization algorithms. In addition, Melliar-Smith had devised the interactive convergence algorithm, which is also included in the paper. (As I recall, that algorithm was his major contribution to the paper, and I wrote all the proofs.)

Writing the proofs turned out to be much more difficult than I had expected (see [27]). I worked very hard to make them as short and easy to understand as I could. So, I was rather annoyed when a referee said that the proofs seemed to have been written quickly and could be simplified with a little effort. In my replies to the reviews, I referred to that referee as a “supercilious bastard”. Some time later, Nancy Lynch confessed to being that referee. She had by then written her own proofs of clock synchronization and realized how hard they were.

Years later, John Rushby and his colleagues at SRI wrote mechanically verified versions of my proofs. They found only a couple of minor errors. I’m rather proud that, even before I knew how to write reliable, structured proofs (see [101]), I was sufficiently careful and disciplined to have gotten those proofs essentially correct.