PAD: Performance Anomaly Detection in Multi-Server Distributed Systems

7th IEEE International Conference on Cloud Computing (IEEE Cloud 2014)

Published by IEEE (Institute of Electrical and Electronics Engineers)

Ref: CLOUD2014-2097


Multi-server distributed systems are becoming increasingly popular with the emergence of cloud computing. These systems must provide high throughput with low latency, which is difficult to achieve. Manual performance tuning and diagnosis of such systems is hard, however, because the volume of relevant diagnostic data is large. To help system developers with performance diagnosis, we have developed a tool called Performance Anomaly Detector (PAD). PAD combines user-driven navigation analysis with automatic correlation and comparative analysis techniques. Together, these techniques help developers find a wide range of performance anomalies. In our experience applying PAD to the Orleans system, PAD reduced the time and effort developers spent detecting anomalous performance cases and improved their ability to analyze such behaviors in depth.