Anomaly Detection in Large Networks using Approximation Techniques

Enthusiasm for amassing enormous amounts of network measurement data has spurred the development of numerous applications that incorporate data mining techniques. In this talk we question the hidden assumption in these applications that one needs to collect “all the data all the time”. We consider this question in the context of an anomaly detection application. We study the popular “subspace method” detector, which is based on principal component analysis (PCA). This method normally collects data from many parts of the network, centralizes the data, and then analyzes it to uncover anomalies. In our research, we ask whether some of the data can be thrown away: can we still detect anomalies accurately without all of it?
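As a rough illustration of the centralized approach described above, the following minimal sketch applies a PCA-based subspace detector to a synthetic measurement matrix. It is not the speaker's implementation: the function name `subspace_detector`, the synthetic data, the number of retained components `k`, and the empirical-quantile threshold are all assumptions made for the example.

```python
import numpy as np

def subspace_detector(Y, k, quantile=0.995):
    """Flag rows of Y (time x links) whose residual energy is unusually large.

    Assumption: the threshold is an empirical quantile of the residual
    energy, chosen here only to keep the sketch short.
    """
    Yc = Y - Y.mean(axis=0)                  # center each link's time series
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    P = Vt[:k].T                             # basis of the "normal" subspace
    residual = Yc - Yc @ P @ P.T             # part not explained by top-k PCs
    spe = np.sum(residual ** 2, axis=1)      # squared prediction error per row
    return spe, spe > np.quantile(spe, quantile)

# Synthetic traffic: 20 links driven by 3 latent flows plus small noise.
rng = np.random.default_rng(0)
T, m, k = 500, 20, 3
Y = rng.normal(size=(T, k)) @ rng.normal(size=(k, m))
Y += 0.05 * rng.normal(size=(T, m))
Y[250, 4] += 5.0                             # injected spike on one link

spe, flags = subspace_detector(Y, k)
print("flagged time steps:", np.flatnonzero(flags))
```

The detector needs every row of `Y` at a central location, which is the data collection cost the talk calls into question.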

To avoid backhauling large amounts of data through networks, we present a framework that couples filtering at local monitors with centralized detectors that can operate on approximate views of the global data (i.e., the network state). We show that the errors made by the central detector, which arise from its use of approximate data, can be upper bounded using matrix perturbation theory. The challenge is to design the filtering parameters; these are determined by the bound on detection errors and the criteria being tracked for detection. Our approximate anomaly detector can detect anomalies with 80 to 90% less data than the original method, and incurs less than a 1% reduction in detection accuracy. Finally, we comment on issues and future directions for data reduction in the context of anomaly detection.
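The idea of local filtering with a bounded central error can be sketched as follows. This is an illustrative assumption, not the framework presented in the talk: each monitor reports a new value only when it drifts more than a slack `delta` from the last reported value, and Weyl's inequality (a standard matrix perturbation result) bounds how far the eigenvalues of the approximate second-moment matrix can move. The names `filter_stream`, `delta`, and the synthetic random-walk data are hypothetical.

```python
import numpy as np

def filter_stream(x, delta):
    """Return the series seen by the center and the number of updates sent.

    The monitor sends a value only when it differs from the last sent
    value by more than delta; otherwise the center keeps the old value.
    """
    reported = np.empty_like(x)
    last, sent = x[0], 1                     # the first value is always sent
    for t, value in enumerate(x):
        if abs(value - last) > delta:
            last, sent = value, sent + 1
        reported[t] = last
    return reported, sent

# Synthetic slowly drifting measurements on 10 monitors.
rng = np.random.default_rng(1)
T, m, delta = 400, 10, 0.5
Y = np.cumsum(rng.normal(scale=0.1, size=(T, m)), axis=0)

results = [filter_stream(Y[:, j], delta) for j in range(m)]
Y_hat = np.column_stack([r[0] for r in results])
updates = sum(r[1] for r in results)

# Second-moment matrices built from exact and approximate data.
A, A_hat = Y.T @ Y / T, Y_hat.T @ Y_hat / T
gap = np.max(np.abs(np.linalg.eigvalsh(A) - np.linalg.eigvalsh(A_hat)))
bound = np.linalg.norm(A - A_hat, 2)         # Weyl: gap <= spectral norm

print(f"updates sent: {updates} of {T * m}")
print(f"max eigenvalue shift {gap:.4f} <= bound {bound:.4f}")
```

In this sketch a larger `delta` sends fewer updates but loosens the bound, which mirrors the trade-off between data reduction and detection error described above.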

Speaker Details

Nina Taft is currently a senior research scientist at Intel Research Berkeley. At Intel her work focuses on enterprise network traffic characterization, anomaly detection at both the enterprise and host level, and approximation techniques for data mining algorithms. Prior to joining Intel, Nina worked at Sprint for five years in the IP Group on backbone Internet measurement. She conducted research in areas such as traffic matrix estimation, traffic characterization, routing protocols, and IP-over-WDM network design problems. Prior to Sprint, Nina worked at SRI International for four years, where her work focused on congestion control and QoS routing in ATM networks. She received her PhD from the University of California, Berkeley in 1994. Nina is currently serving as an associate editor for the IEEE/ACM Transactions on Networking (ToN), was SIGCOMM 2007 PC co-chair, and is a member of the ACM Internet Measurement Conference (IMC) steering committee.

Date:
Speakers:
Nina Taft
Affiliation:
Intel Research Berkeley