Self-Stabilizing Autonomic Recoverers

This talk introduces theoretical foundations for system architectures and algorithms for creating truly robust autonomic systems: systems that are able to recover automatically from unexpected failures. The suggested approach is generic and can be applied to various applications and areas, such as cloud computing, long-running continuously executing systems, and control systems. We consider different settings of system transparency: black box and transparent box software packages. The general assumption is that a software package fails when it encounters an unexpected environment state, that is, a state the package was not programmed to cope with. Creating a system that anticipates every possible environment state is not feasible due to the size of the environment. Thus, an autonomic system should be designed so that it can overcome an unexpected environment state either by executing a recovery action that restores a legal state or by finding a new program that respects the specifications and achieves the software package goals in the current environment.

In the first part of this talk, we consider software packages to be black boxes. We propose modeling software package flaws (bugs) by assuming eventual Byzantine behavior of the package. We propose a general, yet practical, framework and paradigm for the monitoring and recovery of systems, called the autonomic recoverer. In the second part, we consider a software package to be a transparent box and introduce the recovery oriented programming paradigm. Programs designed according to this paradigm include important safety and liveness properties, together with recovery actions, as an integral part of the program. We design a pre-compiler that produces augmented code for monitoring the properties and executing the recovery actions upon a property violation. Finally, in the third part, we consider a highly dynamic environment, which typically implies that there are no realizable specifications for the environment, i.e., there does not exist a program that respects the specifications for every given environment. We suggest searching for a program at run time by trying all possible programs on environment replicas in parallel. We design control search algorithms that exploit various environment properties.
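
As a rough illustration of the second part, the sketch below shows what code augmented with property monitoring and recovery actions might look like. It is not the pre-compiler output described in the talk: the names `Property`, `monitored_step`, `faulty_step`, and `reset_queue` are hypothetical, and only a simple safety property on a dictionary-valued state is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]  # hypothetical program state: named integer variables


@dataclass
class Property:
    """A safety property paired with the recovery action run when it is violated."""
    name: str
    holds: Callable[[State], bool]
    recover: Callable[[State], None]


def monitored_step(state: State,
                   step: Callable[[State], None],
                   properties: List[Property]) -> List[str]:
    """Run one program step, then check every property and recover on violation.

    Returns the names of the properties that were violated by this step.
    """
    step(state)
    violated = []
    for prop in properties:
        if not prop.holds(state):
            violated.append(prop.name)
            prop.recover(state)  # recovery action is expected to restore a legal state
    return violated


def faulty_step(state: State) -> None:
    """Hypothetical program step with a bug: it ignores the queue capacity."""
    state["queue_len"] += 7


def reset_queue(state: State) -> None:
    """Hypothetical recovery action: drop the queue contents."""
    state["queue_len"] = 0


bounded_queue = Property(
    name="queue_len within capacity",
    holds=lambda s: 0 <= s["queue_len"] <= s["capacity"],
    recover=reset_queue,
)

state = {"queue_len": 0, "capacity": 10}
log = [monitored_step(state, faulty_step, [bounded_queue]) for _ in range(3)]
print(log)
print(state)
```

Here the second step pushes the queue past its capacity, the monitor detects the violation, and the recovery action resets the queue before execution continues. Liveness properties would need a different kind of check (for example, a progress deadline), which this sketch does not attempt.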

Speaker Details

Olga graduated with a PhD in Computer Science from Ben-Gurion University, Beer-Sheva, Israel, in 2008, under the supervision of Prof. Shlomi Dolev. The main focus of her work was exploring formal definitions of system architectures for autonomic systems and using techniques from self-stabilization to create truly self-recovering systems. Since then, she has worked at Microsoft Israel in the PC Health group, creating monitoring and recovery scenarios for PC Advisor. Next, she spent a year at Deutsche Telekom Labs at Ben-Gurion University, working on spam mitigation techniques and security concerns in IPv6. Currently, she is with Xsignnet.com, an innovative storage systems startup.

Date:
Speakers:
Olga Brukman
Affiliation:
Computer Science Department, Ben-Gurion University, Israel and Xsignnet.com