On implementing Omega in systems with weak reliability and synchrony assumptions

  • Marcos K. Aguilera ,
  • Carole Delporte-Gallet ,
  • Hugues Fauconnier ,
  • Sam Toueg

Distributed Computing |

We study the feasibility and cost of implementing Omega–a fundamental failure detector at the core of many algorithms–in systems with weak reliability and synchrony assumptions. Intuitively, Omega allows processes to eventually elect a common leader.

We first give an algorithm that implements Omega in a weak system S where (a) except for some unknown timely process s, all processes may be arbitrarily slow or may crash, and (b) only the output links of s are eventually timely (all other links can be arbitrarily slow and lossy). Previously known algorithms for Omega worked only in systems that are strictly stronger than S in terms of reliability or synchrony assumptions.

We next show that algorithms that implement Omega in system S are necessarily expensive in terms of communication complexity: all correct processes (except possibly one) must send messages forever; moreover, a quadratic number of links must carry messages forever. This result holds evenfor algorithms that tolerate at most one crash.

Finally, we show that with a small additional assumption to system S–the existence of some unknown correct process whose links can be arbitrarily slow and lossy but fair–there is a communication-efficient algorithm for Omega such that eventually only one process (the elected leader) sends messages.

Some recent experimental results indicate that two of the algorithms for Omega described in this paper can be used in dynamically-changing systems and work well in practice [36].