Automating datacenter operations using Machine Learning

Today’s datacenters run many complex, large-scale Web applications that are difficult to manage. The main challenges are understanding user workloads and application performance, and quickly identifying and resolving performance problems. Statistical Machine Learning (SML) provides a methodology for quickly processing the large quantities of monitoring data generated by these applications, finding repeating patterns in their behavior, and building accurate models of their performance. In this talk, I will argue that SML is a necessary tool for simplifying and automating datacenter operations, and I will demonstrate the application of SML to two important problems in this area: (1) quick and accurate identification of recurring performance problems, and (2) characterization and synthesis of workload spikes.

Speaker Details

Peter Bodik is a graduate student in the EECS Department at UC Berkeley, working in the RAD Lab on applying Statistical Machine Learning techniques to problems in datacenter operations. In particular, he has been working on building accurate performance models of Web applications, using control theory for dynamic resource allocation, and characterizing and modeling workload spikes. He interned at Amazon.com and Microsoft Research, where he worked on building tools that help datacenter operators quickly identify and resolve performance problems.

Date:
Speakers:
Peter Bodik
Affiliation:
UC Berkeley