Chao Liu, Xiangyu Zhang, and Jiawei Han
Software end users are the best testers, who keep revealing bugs in software that has undergone rigorous in-house testing. In order to leverage their testing efforts, failure reporting components have been widely deployed in released software: the Microsoft Dr. Watson System and the Mozilla Quality Feedback Agent are the two most typical examples. Many uses of the collected failure data depend on an effective failure indexing technique, which, in the optimal case, would index all failures caused by the same bug together. Unfortunately, the problem of failure proximity, which underpins the effectiveness of an indexing technique, has not been systematically studied. This paper presents the first systematic study of failure proximity. A failure proximity consists of two components: a fingerprinting function that extracts signatures from failures and a distance function that calculates (from the extracted signatures) the likelihood of two failures being due to the same bug. By considering different instantiations of the two functions, we study an array of six failure proximities (two of them new) in this paper. These proximities range from the simplest approach, which checks failure points, to the most sophisticated approach, which utilizes fault localization algorithms to extract failure signatures. Besides presenting the technical details of each proximity, we also study the properties of each proximity and the trade-offs between proximities. Altogether, these deliver a systematic view of failure proximity. For fair comparison, this study proposes the first set of evaluation metrics that objectively quantify the effectiveness of different failure proximities. We carry out three case studies of the six proximities on three mid-sized programs (namely, flex, grep, and gzip) and evaluate their effectiveness using the proposed metrics. The experimental results clearly validate our identified properties and trade-offs.
In summary, this study presents a problem formulation, a systematic examination of six failure proximities, a set of evaluation metrics, and experimental results, and it should also help guide further investigation.
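To make the two-component formulation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a failure is represented only by its call stack, and it pairs two hypothetical fingerprinting functions (the failure point alone, and the set of stack frames) with Jaccard distance, which is chosen here purely for illustration. All names (`Failure`, `failure_point_fingerprint`, `call_stack_fingerprint`, `jaccard_distance`, `make_proximity`) are assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

Signature = FrozenSet[str]


@dataclass(frozen=True)
class Failure:
    """Hypothetical failure record: call stack at the crash, outermost frame first."""
    stack: Tuple[str, ...]


def failure_point_fingerprint(f: Failure) -> Signature:
    # Simplest signature: only the innermost frame, i.e. where the failure occurred.
    return frozenset(f.stack[-1:])


def call_stack_fingerprint(f: Failure) -> Signature:
    # Richer signature: every function on the stack when the failure occurred.
    return frozenset(f.stack)


def jaccard_distance(a: Signature, b: Signature) -> float:
    # 0.0 means identical signatures, 1.0 means no frame in common.
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def make_proximity(
    fingerprint: Callable[[Failure], Signature],
    distance: Callable[[Signature, Signature], float],
) -> Callable[[Failure, Failure], float]:
    # A proximity is the composition of a fingerprinting function and a distance function.
    def proximity(x: Failure, y: Failure) -> float:
        return distance(fingerprint(x), fingerprint(y))
    return proximity


f1 = Failure(("main", "parse", "scan"))
f2 = Failure(("main", "lex", "scan"))

by_failure_point = make_proximity(failure_point_fingerprint, jaccard_distance)
by_call_stack = make_proximity(call_stack_fingerprint, jaccard_distance)

print(by_failure_point(f1, f2))  # both fail in "scan", so the signatures coincide
print(by_call_stack(f1, f2))     # the stacks differ in one frame each
```

In this sketch a smaller distance is read as a higher likelihood that two failures share a bug. Swapping either function yields a different proximity, which is the sense in which different instantiations of the two components can be compared.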
|Published in||IEEE Transactions on Software Engineering|
|Address||Los Alamitos, CA, USA|
|Publisher||IEEE Computer Society|
Copyright © 2007 IEEE. Reprinted from IEEE Computer Society. This material is posted here with permission of the IEEE. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to firstname.lastname@example.org. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.