Distributed Snapshots: Determining Global States of a Distributed System

ACM Transactions on Computer Systems, pp. 63-75

The distributed snapshot algorithm described here came about when I visited Chandy, who was then at the University of Texas in Austin. He posed the problem to me over dinner, but we had both had too much wine to think about it right then. The next morning, in the shower, I came up with the solution. When I arrived at Chandy’s office, he was waiting for me with the same solution. I consider the algorithm to be a straightforward application of the basic ideas from [27].

In 2012, a reader noticed that the paper’s reference list includes a paper by Chandy and me titled “On Partially-Ordered Event Models of Distributed Computations,” which the reference claims had been submitted for publication. Several times I have made the mistake of referencing a paper of mine “to appear” that never appeared. But I can’t imagine that I would have claimed that a nonexistent paper had been submitted for publication. However, neither Chandy nor I has any memory of that paper or the reference. My guess is that we inserted the reference in a preliminary version when we expected to write and submit the other paper, and then we forgot to remove it.