Hunting for problems with Artemis

Gabriela Cretu, Mihai Budiu, and Moises Goldszmidt

Abstract

Artemis is a modular application designed for analyzing and troubleshooting the performance of large clusters running datacenter services. Artemis is composed of four modules: (1) distributed log collection and data extraction, (2) a database storing the extracted data, (3) an interactive visualization tool for exploring the data, and (4) a plug-in interface (and a set of sample plug-ins) allowing users to implement data analysis tools, including (a) the extraction and construction of new features from the basic measurements collected, and (b) the implementation and invocation of statistical and machine learning algorithms and tools. In this paper we describe each of these components and then illustrate the power of the plug-in architecture by presenting a case study that uses Artemis to analyze a Dryad application running on a 240-machine cluster.

Details

Publication type: Inproceedings
Published in: USENIX Workshop on the Analysis of System Logs (WASL)
URL: http://research.microsoft.com/users/mbudiu/wasl08.pdf
Publisher: USENIX