Phoenix: Making Applications Robust

People

David Lomet

Roger Barga

The Problem

Dealing with errors or exceptions is a very large part of getting applications right. Failures are not only an application programming problem but an operational and an availability problem as well. The Phoenix project is an effort to increase the availability of applications and in many cases avoid the operational task of coping with an error.

System Crashes

Database systems recover the database to the last committed transaction; incomplete transactions are aborted. While database state is recovered, the states of applications using the database, and of their surrounding sessions, are "blown away" (erased). Restarting those applications and sessions lengthens the outage. Our intent is to reduce the period of unavailability by extending database recovery to include session and application state. This will also enable stateful applications to survive failures and continue execution.

Logical Errors

Transactions abort for logical errors as well as crashes. Aborting a transaction in these cases means undoing its work back to the transaction start. In the future we would like to extend database-style recovery to support partial rollback as a result of application errors, where the rollback resets not only database state (already supported by savepoints) but also application state. This is a form of compensation, in the multi-level transaction sense, that includes application state.
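The idea of a partial rollback that covers both database and application state can be illustrated with a minimal sketch. It is not the Phoenix mechanism: it assumes SQLite (via Python's standard `sqlite3` module) as the database, a plain dictionary as the application state, and hypothetical names such as `run_step_with_partial_rollback` and the `orders` table.

```python
import copy
import sqlite3


def run_step_with_partial_rollback(conn, app_state, step):
    """Run `step`; on error, undo only that step's database and application changes."""
    snapshot = copy.deepcopy(app_state)            # application-state "savepoint"
    conn.execute("SAVEPOINT phoenix_step")         # database savepoint
    try:
        step(conn, app_state)
        conn.execute("RELEASE SAVEPOINT phoenix_step")
        return True
    except Exception:
        # Reset database state to the savepoint, then application state to the snapshot.
        conn.execute("ROLLBACK TO SAVEPOINT phoenix_step")
        conn.execute("RELEASE SAVEPOINT phoenix_step")
        app_state.clear()
        app_state.update(snapshot)
        return False


def demo():
    # isolation_level=None gives manual transaction control (explicit BEGIN/COMMIT).
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE orders (item TEXT)")
    app_state = {"items_ordered": 0}

    def good_step(c, state):
        c.execute("INSERT INTO orders VALUES ('book')")
        state["items_ordered"] += 1

    def bad_step(c, state):
        c.execute("INSERT INTO orders VALUES ('pen')")
        state["items_ordered"] += 1
        raise ValueError("logical error in the application")

    conn.execute("BEGIN")
    first = run_step_with_partial_rollback(conn, app_state, good_step)
    second = run_step_with_partial_rollback(conn, app_state, bad_step)
    conn.execute("COMMIT")
    rows = [r[0] for r in conn.execute("SELECT item FROM orders")]
    return first, second, rows, app_state
```

In this sketch the failed step leaves no trace in either the table or the dictionary, while the earlier successful step in the same transaction is kept. Copying the whole application state is the simplest possible choice and would be costly for large states.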

The Project

In Phoenix, a project within the Microsoft Research Database Group, we have focused first on application availability and persistence. 

Technology Exploration

We have explored technology that exploits database redo recovery to enable applications to persist across crashes. This permits applications to safely maintain state across multiple transactions. While forms of program persistence have been proposed, the costs have been high in logging and checkpointing. The techniques developed within Phoenix substantially reduce these costs. Further, the Phoenix techniques leverage the database's recovery mechanisms to accomplish this. While there remains an extra system cost for application persistence, Phoenix continues the trend of expending system resources so as to conserve more expensive and error-prone human resources.

Because we are exploiting the database system's recovery mechanism, our approach requires the database system to wrap an application so as to capture its interactions with other system components and log its state changes. Hence, our focus is on database applications, particularly those that are close to the database system. This permits simple robust applications such as database stored procedures or, potentially, client/server database applications. It also enables masking of system failures involving application subcomponents from higher level components in a more distributed environment, such as distributed transaction processing or workflow.
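As a rough illustration of redo-style application persistence, the sketch below logs each state change before applying it and rebuilds the state after a restart by replaying the log. It is only a toy under stated assumptions: the log is a local JSON-lines file rather than the database system's recovery log, there is no checkpointing or handling of partially written records, and `LoggedCounterApp` is a hypothetical application invented for the example.

```python
import json
import os
import tempfile


class LoggedCounterApp:
    """Hypothetical application whose state changes are logged before being applied."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.totals = {}
        self._recover()

    def _apply(self, op):
        # Deterministic state change; replaying the same ops yields the same state.
        self.totals[op["key"]] = self.totals.get(op["key"], 0) + op["amount"]

    def _recover(self):
        # Redo recovery: replay every logged operation in order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as log:
            for line in log:
                self._apply(json.loads(line))

    def add(self, key, amount):
        op = {"key": key, "amount": amount}
        # Force the log record to stable storage before changing in-memory state.
        with open(self.log_path, "a") as log:
            log.write(json.dumps(op) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self._apply(op)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.log")
        app = LoggedCounterApp(path)
        app.add("alice", 5)
        app.add("alice", 2)
        app.add("bob", 1)
        del app  # simulate a crash: the in-memory state is lost
        recovered = LoggedCounterApp(path)
        return recovered.totals
```

Forcing every record to disk, as done here, is the kind of logging cost that the text says earlier persistence proposals paid heavily for; the sketch makes no attempt to reduce it.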

Prototype Systems

ODBC Persistent Sessions

Our initial system (Phoenix/ODBC) avoids the difficulty involved with making substantial changes to the internals of the database system by focusing on ODBC session availability. It provides persistent server sessions to ODBC clients, sessions that can survive a server crash without the ODBC client application being aware of the outage, except for timing considerations. This system has been demonstrated on multiple occasions, including at the 1999 SIGMOD Conference. Our performance studies indicate that the system overhead for Phoenix/ODBC persistent sessions is modest.
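The general idea of masking a server crash from a client can be sketched as a wrapper that remembers session context outside the server and reinstalls it on reconnect. This is not the Phoenix/ODBC implementation and uses no real ODBC calls: `FakeServer`, `ServerCrashed`, and `PersistentSession` are hypothetical stand-ins, and the sketch only covers idempotent session options, not in-flight statements or result sets.

```python
class ServerCrashed(Exception):
    """Raised by the fake server when a session no longer exists."""


class FakeServer:
    """Stand-in for a database server; all session state is lost on a crash."""

    def __init__(self):
        self.sessions = {}
        self.next_id = 1

    def connect(self):
        sid = self.next_id
        self.next_id += 1
        self.sessions[sid] = {}
        return sid

    def crash_and_restart(self):
        self.sessions.clear()

    def _check(self, sid):
        if sid not in self.sessions:
            raise ServerCrashed(sid)

    def set_option(self, sid, name, value):
        self._check(sid)
        self.sessions[sid][name] = value

    def get_option(self, sid, name):
        self._check(sid)
        return self.sessions[sid].get(name)


class PersistentSession:
    """Client-side wrapper that hides server crashes from the application."""

    def __init__(self, server):
        self.server = server
        self.options = {}              # session context kept outside the server
        self.sid = server.connect()

    def _reconnect(self):
        # Open a new server session and reinstall the remembered context.
        self.sid = self.server.connect()
        for name, value in self.options.items():
            self.server.set_option(self.sid, name, value)

    def _call(self, fn):
        try:
            return fn()
        except ServerCrashed:
            self._reconnect()
            return fn()                # retry once against the new session

    def set_option(self, name, value):
        self.options[name] = value
        self._call(lambda: self.server.set_option(self.sid, name, value))

    def get_option(self, name):
        return self._call(lambda: self.server.get_option(self.sid, name))


def demo():
    server = FakeServer()
    session = PersistentSession(server)
    session.set_option("language", "en")
    server.crash_and_restart()         # the application is not told about this
    return session.get_option("language"), session.sid
```

The application in `demo` keeps using the same `PersistentSession` object across the crash; only the delay of the reconnect would be visible to it, which matches the "except for timing considerations" caveat above.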

Phoenix/APP

The current focus is on recoverable middle-tier applications. The conceptual framework for this work, specified by means of interaction contracts, was published at ICDE 2002. Our prototype is built on Microsoft's .NET framework. It implements interaction contracts by exploiting the .NET interception mechanism, which captures method calls and returns among software components. These calls and returns are logged, enabling Phoenix/APP to replay the components after a system crash and recover their states, transparently to the application program itself. Work underway explores further optimizations to reduce both logging and recovery costs.
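The interception-and-replay idea can be sketched in a few lines. The example below is written in Python rather than .NET and does not use the .NET interception mechanism; `Interceptor`, `ShoppingCart`, and `replay` are hypothetical names, the log is an in-memory list rather than a stable log, and the component is assumed to be deterministic.

```python
class Interceptor:
    """Proxy that logs every method call on a component together with its return value."""

    def __init__(self, component, log):
        self._component = component
        self._log = log

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        target = getattr(self._component, name)
        if not callable(target):
            return target

        def logged_call(*args):
            result = target(*args)
            self._log.append({"method": name, "args": list(args), "result": result})
            return result

        return logged_call


class ShoppingCart:
    """Hypothetical deterministic middle-tier component."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)
        return len(self.items)


def replay(component_factory, log):
    """Rebuild a component's state by re-executing the logged calls in order."""
    component = component_factory()
    for record in log:
        result = getattr(component, record["method"])(*record["args"])
        if result != record["result"]:
            raise RuntimeError("replay diverged from the logged execution")
    return component


def demo():
    log = []
    cart = Interceptor(ShoppingCart(), log)
    cart.add("book")
    cart.add("pen")
    # Simulated crash: the original component is gone, only the log survives.
    recovered = replay(ShoppingCart, log)
    return recovered.items, len(log)
```

The component itself contains no recovery code; all logging happens in the proxy, which is the sense in which recovery is transparent to the application. Comparing replayed return values against the logged ones is an extra consistency check added for the sketch.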

Bibliography

Some relevant references are listed below.

  1. Lomet, D.: Persistent Middle Tier Components without Logging. IDEAS Conference, Montreal, Canada (July 2005) 36-47.
  2. Lomet, D.: Robust Web Services via Interaction Contracts. TES'04 Workshop (2004) 1-14.
  3. Barga, R., Chen, S., and Lomet, D.: Improving Logging and Recovery Performance in Phoenix/App. ICDE Conference, Boston, MA (March 2004) (to appear).
  4. Barga, R., Lomet, D., Shegalov, G., and Weikum, G.: Recovery Guarantees for Internet Applications. ACM Trans. on Internet Technology (2004) (to appear).
  5. Barga, R., Lomet, D., Paparizos, S., Yu, H., and Chandresekaran, S.: Persistent Applications Via Automatic Recovery. IDEAS Conference (July 2003) (to appear).
  6. Lomet, D. and Tuttle, M.: A Theory of Redo Recovery. SIGMOD Conference, San Diego, CA (June 2003) 397-406.
  7. Barga, R.: Phoenix Application Recovery Project. Data Engineering Bulletin 25, 4 (Dec. 2002) 27-31.
  8. Shegalov, G., Weikum, G., Barga, R., and Lomet, D.: EOS: Exactly-Once E-Service Middleware. VLDB Conference, Hong Kong, China (August 2002) 1043-1046.
  9. Barga, R. and Lomet, D.: Phoenix Project: Fault Tolerant Applications. SIGMOD Record 31, 2 (June 2002) 94-100.
  10. Barga, R., Lomet, D., and Weikum, G.: Recovery Guarantees for Multi-tier Applications. ICDE Conference, San Jose, CA (March 2002) 543-554.
  11. Barga, R. and Lomet, D.: Measuring and Optimizing a System for Persistent Database Sessions. ICDE Conference, Heidelberg, Germany (April 2001) 21-30.
  12. Lomet, D.: High Speed On-line Backup When Using Logical Log Operations. SIGMOD Conference, Dallas, TX (May 2000) 34-45.
  13. Barga, R., Lomet, D., Baby, T., and Agrawal, S.: Persistent Client-Server Database Sessions. EDBT Conference, Lake Constance, Germany (March 2000) 462-477.
  14. Barga, R. and Lomet, D.: Phoenix: Making Applications Robust. SIGMOD Conference, Philadelphia, PA (June 1999) 562-564.
  15. Lomet, D.: Logical Logging to Extend Recovery to New Domains. SIGMOD Conference, Philadelphia, PA (June 1999) 73-84.
  16. Lomet, D. and Weikum, G.: Efficient Transparent Application Recovery in Client-Server Information Systems. SIGMOD Conference, Seattle, WA (June 1998) 460-471 (best paper award). A technical report with appendices is also available.
  17. Lomet, D.: Persistent Applications Using Generalized Redo Recovery. ICDE Conference, Orlando, FL (Feb. 1998) 154-163.
  18. Lomet, D.: Application Recovery: Advances Toward an Elusive Goal. HPTS Workshop, Asilomar, CA (September 1997).
  19. Lomet, D. and Tuttle, M.: Redo Recovery after System Crashes. VLDB Conference, Zurich, Switzerland (Sept. 1995) 457-468.
  20. Lomet, D.: MLR: A recovery method for multi-level systems. SIGMOD Conference (June 1992) 185-194.
  21. Weikum, G.: A theoretical foundation of multi-level concurrency control. PODS Conference (March 1986) 31-42.
  22. Randell, B.: System structure for software fault tolerance. IEEE Trans. on Software Eng. SE-1, 2 (June 1975) 220-232.

©2008 Microsoft Corporation. All rights reserved.