Magpie: online modelling and performance-aware systems

9th Workshop on Hot Topics in Operating Systems (HotOS-IX)

Published by USENIX


Understanding the performance of distributed systems requires correlating thousands of interactions between numerous components, a task best left to a computer. Today's systems provide voluminous traces from each component but do not synthesise the data into concise models of system performance. We argue that online performance modelling should be a ubiquitous operating system service and outline several uses, including performance debugging, capacity planning, system tuning and anomaly detection. We describe the Magpie modelling service, which collates detailed traces from multiple machines in an e-commerce site, extracts request-specific audit trails, and constructs probabilistic models of request behaviour. A feasibility study evaluates the approach using an offline demonstrator. Results show that the approach is promising, but that there are many challenges to building a truly ubiquitous, online modelling infrastructure.
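The abstract describes a three-stage pipeline: collate traces from several machines, extract a per-request audit trail, then build a probabilistic model of request behaviour. The sketch below illustrates those three stages. It is not Magpie's implementation. It assumes that every trace event already carries a hypothetical `request_id` tag, which the abstract does not say. It also substitutes a first-order Markov transition model over components for Magpie's probabilistic models, which the abstract does not specify. The names `Event`, `audit_trails` and `transition_model` are illustrative only.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float
    machine: str
    request_id: str  # hypothetical: assumes each event is already tagged with its request
    component: str


def audit_trails(events):
    """Collate events from all machines and split them into per-request
    trails, each ordered by timestamp."""
    trails = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.timestamp):
        trails[ev.request_id].append(ev.component)
    return dict(trails)


def transition_model(trails):
    """Estimate P(next component | current component) over all trails.
    A first-order Markov model is an illustrative stand-in, not Magpie's
    model; 'START' and 'END' are pseudo-states marking request boundaries."""
    counts = defaultdict(Counter)
    for path in trails.values():
        states = ["START", *path, "END"]
        for cur, nxt in zip(states, states[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for cur, c in counts.items()
    }


# Usage: two requests traced across two machines (made-up illustrative events).
events = [
    Event(0.00, "web1", "r1", "web"),
    Event(0.10, "db1", "r1", "db"),
    Event(0.20, "web1", "r1", "web"),
    Event(0.05, "web2", "r2", "web"),  # e.g. served without touching the database
]
trails = audit_trails(events)
model = transition_model(trails)
print(trails)
print(model)
```

A model of this kind summarises many traces as a small number of probabilities. For example, it gives the fraction of requests that reach the database after the web tier. A sudden shift in those probabilities could act as a simple anomaly signal.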