Performance Issue Diagnosis for Online Service Systems

The 31st International Symposium on Reliable Distributed Systems (SRDS'12)

Monitoring and diagnosing performance issues in an online service system is critical to ensuring that the system performs satisfactorily. Given a detected performance issue and the system metrics collected for an online service system, engineers usually must spend considerable effort on diagnosis, first identifying performance issue beacons, which are metrics that point to the root causes. To reduce this manual effort, in this paper we propose a new approach that effectively detects performance issue beacons to help with performance issue diagnosis. Our approach includes techniques for mining system metric data that address limitations encountered when applying previous classification-based approaches. Our evaluations in both a controlled environment and a real production environment show that our approach identifies performance issue beacons from system metric data more effectively than previous approaches.