Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis

Qiang FU, Jian-Guang LOU, Yi WANG, and Jiang LI

Abstract

Detection of execution anomalies is very important for the maintenance, development, and performance refinement of large-scale distributed systems. Execution anomalies include both work flow errors and low performance problems. People often use system logs produced by distributed systems for troubleshooting and problem diagnosis. However, manually inspecting system logs to detect anomalies is infeasible due to the increasing scale and complexity of distributed systems. Therefore, there is a great demand for automatic anomaly detection techniques based on log analysis. In this paper, we propose an unstructured log analysis technique for anomaly detection. As part of the technique, we propose a novel algorithm that converts free-form text messages in log files to log keys without any application-specific knowledge. The log keys correspond to the log-print statements in the source code, which can provide cues about system execution behavior. After converting log messages to log keys, we learn a Finite State Automaton (FSA) from training log sequences to represent the normal work flow of each system component. At the same time, a performance measurement model is learned to characterize normal execution performance based on the log messages’ timing information. With these learned models, we can automatically detect anomalies in newly input log files. Experiments on Hadoop and SILK (a distributed computing system) show that the technique can effectively detect running anomalies.
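
The work flow part of the pipeline described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes, purely for illustration, that any token containing a digit is a parameter, and it replaces the learned FSA with a simple set of allowed key-to-key transitions. The names `to_log_key`, `learn_transitions`, and `find_anomalies` are hypothetical, and the performance measurement model is left out.

```python
import re
from collections import defaultdict

# Assumption for illustration: a token containing a digit is a parameter.
# The paper derives log keys without such a hand-written rule.
_PARAM = re.compile(r"\S*\d\S*")


def to_log_key(message):
    """Approximate a log key by masking parameter-like tokens with '*'."""
    return " ".join("*" if _PARAM.fullmatch(tok) else tok
                    for tok in message.split())


def _key_sequence(messages):
    """Convert raw messages to log keys, framed by start and end markers."""
    return ["<start>"] + [to_log_key(m) for m in messages] + ["<end>"]


def learn_transitions(training_sequences):
    """Record every key-to-key transition seen in normal training runs.

    This is a simplified stand-in for the learned FSA: it only remembers
    which log key may directly follow which.
    """
    allowed = defaultdict(set)
    for messages in training_sequences:
        keys = _key_sequence(messages)
        for prev, nxt in zip(keys, keys[1:]):
            allowed[prev].add(nxt)
    return allowed


def find_anomalies(allowed, messages):
    """Return the transitions in a new run that never occurred in training."""
    keys = _key_sequence(messages)
    return [(prev, nxt) for prev, nxt in zip(keys, keys[1:])
            if nxt not in allowed.get(prev, set())]


# Made-up log lines, not taken from Hadoop or SILK.
training = [
    ["Task 12 started", "Task 12 read 300 bytes", "Task 12 finished"],
    ["Task 7 started", "Task 7 read 42 bytes", "Task 7 finished"],
]
model = learn_transitions(training)
suspicious = find_anomalies(model, ["Task 9 started", "Task 9 finished"])
```

Here the new run skips the read step, so the only transition reported in `suspicious` is the one from the "started" key directly to the "finished" key. A real FSA would also capture loops and branches more compactly than a flat transition set.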

Details

Publication type: Inproceedings
Published in: International Conference on Data Mining (full paper)
Publisher: IEEE