Persistent-state Checkpoint Comparison for Troubleshooting Configuration Failures
- Yi-Min Wang
- Chad Verbowski
- Daniel R. Simon
MSR-TR-2003-28
Note: To appear in Proc. IEEE International Conference on Dependable Systems and Networks (DSN), June 2003.
Application failures characterized by the phrases “it worked yesterday, but it doesn’t work today” and “it worked on that machine, but it doesn’t work on this machine” are a major source of computer user frustration and a major component in the total cost of ownership. The typical symptom-based troubleshooting approach relies too much on creative thinking and may lead users or support technicians in directions far from the actual root cause. In this paper, we propose a state-based troubleshooting approach for configuration failures that aims at making the diagnostic process as mechanical as possible. In the narrow-down phase, we use checkpoint comparison and application tracing to determine which pieces of persistent state have changed and are affecting current application execution; ongoing self-monitoring of persistent-state changes by the machine is used to help eliminate false positives. In the solution-query phase, state-to-task mapping and searches of online databases are used to translate low-level state information into high-level user interfaces and articles. We describe the design and implementation of a troubleshooter that uses this state-based approach and present preliminary results to demonstrate its effectiveness in diagnosing several actual configuration failures.
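The narrow-down phase described in the abstract can be illustrated with a minimal sketch. It is not the troubleshooter's actual implementation. It assumes that a checkpoint can be modeled as a dictionary from state keys to values, that application tracing yields the set of keys the failing application reads, and that self-monitoring yields a set of keys known to change frequently. The function names `diff_checkpoints` and `narrow_down` and the sample keys in the usage note are hypothetical.

```python
# Illustrative sketch only: checkpoints are modeled as plain dicts
# (state key -> value). All names here are hypothetical.

MISSING = object()  # sentinel for a key absent from one checkpoint


def diff_checkpoints(good, bad):
    """Return {key: (old, new)} for keys added, removed, or modified."""
    changed = {}
    for key in good.keys() | bad.keys():
        before = good.get(key, MISSING)
        after = bad.get(key, MISSING)
        if before != after:
            changed[key] = (before, after)
    return changed


def narrow_down(good, bad, traced_keys, noisy_keys=frozenset()):
    """Keep only changes the application actually reads, minus known noise.

    traced_keys: keys observed in a trace of the failing application (assumed input).
    noisy_keys: keys that self-monitoring shows change routinely (assumed input).
    """
    changed = diff_checkpoints(good, bad)
    return {
        key: delta
        for key, delta in changed.items()
        if key in traced_keys and key not in noisy_keys
    }
```

For example, with a working checkpoint `{"app/proxy_mode": "off", "sys/counter": 1}` and a failing checkpoint `{"app/proxy_mode": "on", "sys/counter": 2, "other/new": "x"}`, calling `narrow_down` with both `app/proxy_mode` and `sys/counter` traced and `sys/counter` marked as noisy leaves only `app/proxy_mode` as a root-cause candidate: `other/new` is dropped because the application never reads it, and `sys/counter` is dropped as a likely false positive.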