Peter Bodik, Moises Goldszmidt, and Armando Fox
Previous work showed that statistical analysis techniques can be used to construct compact signatures of distinct operational problems in Internet server systems. Because signatures are amenable to well-known similarity search techniques, they can serve as an index of past problems and help identify a particular operational problem as new or recurrent. In this paper, we construct signatures with a different statistical technique, logistic regression with L1 regularization, which improves on previous work in two ways. First, our new approach works when the number of features is an order of magnitude larger than the number of samples, and it also scales to problems with over 50,000 samples. Second, we obtain encouraging results on the stability of the models and the signatures by cross-validating the accuracy of models built from one section of the data center on another section. We validate our approach on data from an Internet service testbed and from a production enterprise system comprising hundreds of servers in several data centers.
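As a rough illustration of why L1-regularized logistic regression suits the case of many more features than samples, the sketch below fits such a model with proximal gradient descent on synthetic data and reads off the features with nonzero weight as a compact "signature". It is not the paper's implementation: the solver, the data, the penalty and step values, and all names (`fit_l1_logistic`, `signature`, `INFORMATIVE`) are assumptions chosen only for illustration.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; this is the proximal step for the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(rows, labels, l1_penalty=0.6, step=0.1, iterations=200):
    """Proximal gradient descent on mean log-loss plus l1_penalty * ||weights||_1."""
    n_samples, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += err
            for j, v in enumerate(x):
                grad_w[j] += err * v
        bias -= step * grad_b / n_samples  # the bias is not penalized
        weights = [
            soft_threshold(w - step * g / n_samples, step * l1_penalty)
            for w, g in zip(weights, grad_w)
        ]
    return weights, bias


def signature(weights):
    """Indices of the features the sparse model kept (nonzero weights)."""
    return [j for j, w in enumerate(weights) if w != 0.0]


# Synthetic data (an assumption): 20 samples, 200 features, and only
# feature INFORMATIVE actually separates "problem" from "normal" samples.
rng = random.Random(0)
N_SAMPLES, N_FEATURES, INFORMATIVE = 20, 200, 3
labels = [i % 2 for i in range(N_SAMPLES)]
rows = []
for y in labels:
    x = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
    x[INFORMATIVE] = (2.0 if y == 1 else -2.0) + rng.gauss(0.0, 0.1)
    rows.append(x)

weights, bias = fit_l1_logistic(rows, labels)
selected = signature(weights)
predictions = [
    1 if bias + sum(w * v for w, v in zip(weights, x)) > 0 else 0 for x in rows
]
```

The soft-thresholding step sets most weights to exactly zero, which is what keeps the resulting signature compact even when features far outnumber samples. A production system would more likely rely on an established solver than on this hand-written loop.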
In USENIX Workshop on Tackling Computer Systems Problems with Machine Learning Techniques
All copyrights reserved by USENIX 2007