Yang Song, Anca Sailer, and Hidayatullah Shaikh
The overwhelming volume of monitoring and log data generated in multi-tier IT systems makes problem determination one of the most expensive and labor-intensive tasks in the IT services arena. In particular, the initial step of problem classification is complicated by error propagation, which causes secondary problems to surface on multiple dependent resources. In this paper, we propose to automate problem classification by leveraging machine learning. The main focus is to categorize the problem a user experiences by recognizing the specifics of the real root cause, leveraging available training data such as monitoring data and logs from across the systems. We transform the structure of the problem into a hierarchy using an existing taxonomy. We then propose an efficient hierarchical incremental learning algorithm that is capable of adjusting the parameters of its internal local classifiers in real time. Compared to traditional batch learning algorithms, this online solution decreases the computational complexity of the training process by learning from new instances in an incremental fashion. Our approach significantly reduces the memory required to store the training instances. We demonstrate the efficiency of our approach by learning hierarchical problem patterns for several issues occurring in distributed web applications. Experimental results show that our approach substantially outperforms previous methods.
|Published in|IEEE Transactions on Services Computing (TSC)|
|Publisher|IEEE Computer Society|