Specifying and Verifying Fault-Tolerant Systems

Formal Techniques in Real-Time and Fault-Tolerant Systems, H. Langmaack, W.-P. de Roever, J. Vytopil editors. Lecture Notes in Computer Science, Springer-Verlag |

Willem-Paul de Roever invited me to give a talk at this symposium. I was happy to have a podium to explain why verifying fault-tolerant, real-time systems should not be a new or especially difficult problem. This was already explained in [106] for real-time systems, but I knew that there would be people who thought that fault-tolerance made a difference. Moreover, de Roever assured me that Lübeck, where the symposium was held, is a beautiful town. (He was right.) So, I decided to redo in TLA the proof from [51]. However, when the time came to write the paper, I realized that I had overextended myself and needed help. The abstract states

We formally specify a well known solution to the Byzantine generals problem and give a rigorous, hierarchically structured proof of its correctness. We demonstrate that this is an engineering exercise, requiring no new scientific ideas.

However, there were few computer scientists capable of doing this straightforward engineering exercise. Stephan Merz was one of them. So, I asked him to write the proof, which he did with his usual elegance and attention to detail. I think I provided most of the prose and the initial version of the TLA specification, which Merz modified a bit. The proof was all Merz’s.