Graceful degradation via versions: specifications and implementations

Symposium on Principles of Distributed Computing (PODC 2007)

Correctness of a fault-tolerant system hinges on the failure model, which typically constrains the number of concurrent failures in the system. These assumptions are sometimes violated in practice, inevitably leading to degraded system behavior that deviates from the system’s specification and even causing complete unavailability of the system. This paper advocates the notion of graceful degradation as a complementary mechanism to fault tolerance in the design of highly available distributed systems. It provides three specifications for meaningful system behavior under degradation. The different specifications capture different tradeoffs between the gracefulness of degradation and the semantics preserved by a degraded view. The paper further demonstrates the practical relevance of the specifications by presenting three designs of versioned distributed storage systems that implement the specifications.