Lazy Checkpoint Coordination for Bounding Rollback Propagation

  • Yi-Min Wang
  • W. Kent Fuchs

Published by Institute of Electrical and Electronics Engineers, Inc.

In this paper, we propose the technique of lazy checkpoint coordination, which preserves process autonomy while employing communication-induced checkpoint coordination for bounding rollback propagation. The notion of laziness is introduced to control the coordination frequency and to allow a flexible trade-off between the cost of checkpoint coordination and the average rollback distance. Worst-case overhead analysis provides a means of estimating the extra checkpoint overhead. Communication trace-driven simulation of several parallel programs is used to evaluate the benefits of the proposed scheme.