Fault Tolerance via Idempotence

G. Ramalingam; Kapil Vaswani

Fault Tolerance via Idempotence

Principles of Programming Languages (POPL) | January 2013

Download BibTex

Writing applications for distributed systems is challenging because of the pitfalls of distribution such as process failures, communication failures, asynchrony and concurrency. Abstractions such as distributed transactions and workflows address some of these pitfalls, but many challenges remain. One common requirement and challenge is the need for distributed applications that are idempotent. Idempotence ensures that the application functions correctly even when clients send duplicate requests, perhaps because the application failed to generate a response due to process failures, or because the response was generated but lost.

In this paper, we study the inter-related aspects of process failures, duplicate messages, and idempotence. We first introduce a simple core language (based on lambda calculus) inspired by modern distributed computing platforms. This language formalizes notions of process failure and recovery, duplicate requests, data partitioning, and local atomic transactions that are restricted to a single store. We formalize a generic correctness criterion, namely idempotent failure-freedom, for applications written in this language, focusing on their ability to tolerate process failures and duplicate requests. We propose language support in the form of a monad that automatically ensures idempotence, without requiring distributed coordination. We show that idempotence gives us the essence of a transactional workflow, namely fault-tolerant composition of computations for free.

The idempotence monad serves as a basic building block for other useful constructs, as described in the paper, such as an extension that allows the application to handle logical failures by associating each transaction in a workflow with a compensating action. We have implemented the idempotence monad (and its variants) in F# and C# and used our implementation to build realistic applications on Windows Azure.We find that the monad has low runtime overheads and leads to more declarative applications.