Qiang FU, Jian-Guang LOU, Yi WANG, and Jiang LI
9 December 2009
Detection of execution anomalies is very important for the maintenance, development, and performance refinement of large scale distributed systems. Execution anomalies include both work flow errors and low performance problems. People often use system logs produced by distributed systems for troubleshooting and problem diagnosis. However, manually inspecting system logs to detect anomalies is unfeasible due to the increasing scale and complexity of distributed systems. Therefore, there is a great demand for automatic anomalies detection techniques based on log analysis. In this paper, we propose an unstructured log analysis technique for anomalies detection. In the technique, we propose a novel algorithm to convert free form text messages in log files to log keys without any application specific knowledge. The log keys correspond to the log-print statements in the source code which can provide cues of system execution behavior. After converting log messages to log keys, we learn a Finite State Automaton (FSA) from training log sequences to present the normal work flow for each system component. At the same time, a performance measurement model is learned to characterize the normal execution performance based on the log messages’ timing information. With these learned models, we can automatically detect anomalies in newly input log files. Experi-ments on Hadoop and SILK (a distributed computing system) show that the technique can effectively detect running anomalies.
In International conference on Data Mining (full paper)
© 2009 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. http://www.ieee.org/