Phoenix: Making Applications Robust
The Problem
Dealing with errors or exceptions is a very large part of getting applications right. Failures are not only an application programming problem but an operational and an availability problem as well. The Phoenix project is an effort to increase the availability of applications and, in many cases, to avoid the operational task of coping with an error.
System Crashes
Database systems recover the database to the last committed transaction. Incomplete transactions are aborted. While database state is recovered, the states of the applications using the database, and of their surrounding sessions, are "blown away" (erased). This loss of state results in longer outages. Our intent is to reduce the period of unavailability by extending database recovery to include session and application state. This will also enable stateful applications to survive failures and continue execution.
Logical Errors
Transactions abort for logical errors as well as crashes. Aborting a transaction in these cases means undoing all of its work back to the start of the transaction. In the future we would like to extend database-style recovery to support partial rollback as a result of application errors, where the rollback resets not only database state (already supported by savepoints) but also application state. This is a form of compensation, in the multi-level transaction sense, that includes application state.
The Project
In Phoenix, a project within the Microsoft Research Database Group, we have focused first on application availability and persistence.
Technology Exploration
We have explored technology that exploits database redo recovery to enable applications to persist across crashes. This permits applications to safely maintain state across multiple transactions. While forms of program persistence have been proposed before, their logging and checkpointing costs have been high. The techniques developed within Phoenix substantially reduce these costs. Further, the Phoenix techniques leverage the database's recovery mechanisms to accomplish this. While there remains an extra system cost for application persistence, Phoenix continues the trend of expending system resources so as to conserve more expensive and error-prone human resources.
Because we are exploiting the database system's recovery mechanism, our approach requires the database system to wrap an application so as to capture its interactions with other system components and log its state changes. Hence, our focus is on database applications, particularly those that are close to the database system. This permits simple, robust applications such as database stored procedures or, potentially, client/server database applications. It also enables masking of system failures involving application subcomponents from higher-level components in a more distributed environment, such as distributed transaction processing or workflow.
Prototype Systems
ODBC Persistent Sessions
Our initial system (Phoenix/ODBC) avoids the difficulty of making substantial changes to the internals of the database system by focusing on ODBC session availability. It provides persistent server sessions to ODBC clients: sessions that can survive a server crash without the ODBC client application being aware of the outage, except for timing considerations. This system has been demonstrated on multiple occasions, including at the 1999 SIGMOD Conference. Our performance studies indicate that the system overhead for Phoenix/ODBC persistent sessions is modest.
Phoenix/APP
The current focus is on recoverable middle-tier applications. The conceptual framework for this work, specified by means of interaction contracts, was published at ICDE 2002. Our prototype is built on Microsoft's .NET framework. It implements interaction contracts by exploiting the .NET interception mechanism, which captures method calls and returns among software components. These calls and returns are logged, enabling Phoenix/APP to replay the components after a system crash and recover their states, transparently to the application program itself. Work underway explores further optimizations to reduce both logging and recovery costs.
Bibliography
Some relevant references are listed below. Entries in red represent research done within this project.