Magpie

Magpie extracts the resource usage and control path of individual requests in a distributed system. It currently runs on a typical e-commerce web farm setup comprising IIS, ASP.NET and SQL Server. Events produced by the live system are correlated to extract the individual requests using a temporal join technique. The requests are then clustered according to resource consumption and behaviour in order to construct concise workload models.

Overview

Automated tools for understanding system performance are essential for many management and maintenance tasks. Performance problems are hard to diagnose and subtle to debug, and constructing accurate models of a system's workload is difficult, yet critical for performance prediction and capacity planning. Magpie is a toolchain that helps to understand system behaviour by automatically extracting individual requests from a live system, and then constructing a probabilistic workload model from this data.

The toolchain relies on instrumentation in kernel, middleware and application-level components to generate events. A Magpie event consumer (the "parser") correlates these events to extract individual requests from the event stream. The parser can run online or offline, and a platform-specific schema specifies how events relate to one another for the particular type of request being tracked. The events in each request indicate its flow of control, its internal synchronization points, and its resource consumption (CPU, disk, network) at each stage.
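To make the correlation step concrete, the following is a much simplified sketch of a schema-driven join over a time-ordered event stream. It is not Magpie's actual parser: the event names, the attribute names (`request_id`, `thread_id`) and the layout of `SCHEMA` are all hypothetical, chosen only for illustration. The idea shown is that an attribute value such as a thread id identifies a request only for a limited period, so the join must respect time: once a value is released it can be reused by a later request.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float
    name: str
    attrs: dict


# Hypothetical schema (an assumption, not Magpie's real format): for each
# event name, the attributes used to join it to a request, and the attributes
# whose binding ends once the event has been processed.
SCHEMA = {
    "HttpRequestStart": {"join": ["request_id", "thread_id"], "release": []},
    "DiskRead": {"join": ["thread_id"], "release": []},
    "HttpRequestEnd": {
        "join": ["request_id", "thread_id"],
        "release": ["request_id", "thread_id"],
    },
}


def extract_requests(events, schema=SCHEMA):
    """Group events into requests by joining on currently bound attribute values."""
    bound = {}     # (attribute, value) -> index of the request it currently belongs to
    requests = []  # each request is a list of events in timestamp order
    for ev in sorted(events, key=lambda e: e.timestamp):
        rule = schema.get(ev.name)
        if rule is None:
            continue  # event type not covered by the schema
        keys = [(a, ev.attrs[a]) for a in rule["join"] if a in ev.attrs]
        # Join to the request that already owns one of this event's values.
        owner = next((bound[k] for k in keys if k in bound), None)
        if owner is None:
            owner = len(requests)  # no match: this event starts a new request
            requests.append([])
        requests[owner].append(ev)
        for k in keys:
            bound[k] = owner
        # End the validity of released values so they can be reused later.
        for a in rule["release"]:
            bound.pop((a, ev.attrs.get(a)), None)
    return requests


# Thread 7 serves request A and is then reused for request B.
events = [
    Event(1.0, "HttpRequestStart", {"request_id": "A", "thread_id": 7}),
    Event(2.0, "DiskRead", {"thread_id": 7}),
    Event(3.0, "HttpRequestEnd", {"request_id": "A", "thread_id": 7}),
    Event(4.0, "HttpRequestStart", {"request_id": "B", "thread_id": 7}),
    Event(5.0, "DiskRead", {"thread_id": 7}),
    Event(6.0, "HttpRequestEnd", {"request_id": "B", "thread_id": 7}),
]
requests_found = extract_requests(events)
```

In this example the two `DiskRead` events carry only a thread id, yet each is attributed to the correct request because the binding of thread 7 to request A ends before request B begins.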

Once the raw requests have been extracted by the parser, they are canonicalized by removing scheduling artefacts in preparation for clustering. A "string-edit-distance" comparison groups together requests with similar behaviour, from the perspective of request structure, synchronization points and resource consumption. The representative request from each cluster, together with the relative weight of the cluster, gives a concise and accurate model of the workload.
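The clustering step can be illustrated with a minimal sketch, assuming each canonicalized request has already been serialized to a string of event symbols. The serialization, the greedy assignment strategy and the `threshold` parameter are assumptions made for this example; the sketch compares request structure only and ignores resource consumption. It uses the standard Levenshtein edit distance and reports each cluster's representative with its relative weight.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def cluster_requests(requests, threshold):
    """Greedy clustering: a request joins the first cluster whose
    representative is within `threshold` edits, else it starts a new cluster.
    Returns a list of (representative, relative_weight) pairs."""
    clusters = []  # list of (representative, members)
    for req in requests:
        for rep, members in clusters:
            if edit_distance(rep, req) <= threshold:
                members.append(req)
                break
        else:
            clusters.append((req, [req]))
    total = len(requests)
    return [(rep, len(members) / total) for rep, members in clusters]


# Hypothetical symbols: S = start, C = CPU burst, D = disk read,
# N = network send, E = end.
workload = ["SCDE", "SCDDE", "SNNE", "SCDE"]
model = cluster_requests(workload, threshold=1)
```

Here `model` holds two clusters: one represented by `"SCDE"` with weight 0.75 and one represented by `"SNNE"` with weight 0.25. Taking the first member as the representative keeps the sketch short; choosing the member closest to all others in its cluster would be a natural refinement.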


Publications