A Systematic Study of Failure Proximity

Chao Liu, Xiangyu Zhang, and Jiawei Han


Software end users are the best testers: they keep revealing bugs in software that has undergone rigorous in-house testing. To leverage their testing efforts, failure reporting components have been widely deployed in released software; the Microsoft Dr. Watson System [1] and the Mozilla Quality Feedback Agent [2] are two typical examples. Many uses of the collected failure data depend on an effective failure indexing technique, which, in the optimal case, would index all failures caused by the same bug together. Unfortunately, the problem of failure proximity, which underpins the effectiveness of an indexing technique, has not been systematically studied. This paper presents the first systematic study of failure proximity. A failure proximity consists of two components: a fingerprinting function that extracts signatures from failures, and a distance function that calculates, from the extracted signatures, the likelihood that two failures are due to the same bug. By considering different instantiations of the two functions, we study an array of six failure proximities, two of which are new. These proximities range from the simplest approach, which checks failure points, to the most sophisticated approach, which uses fault localization algorithms to extract failure signatures.
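The two-component structure described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Failure` record, the function names, the 0/1 distance for failure points, and the stack-based Jaccard variant are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple, TypeVar

S = TypeVar("S")  # type of the signature produced by a fingerprinting function


@dataclass(frozen=True)
class Failure:
    """Hypothetical failure report (fields are illustrative assumptions)."""
    failure_point: str            # location where the program failed
    call_stack: Tuple[str, ...]   # function names active at the failure


def make_proximity(
    fingerprint: Callable[[Failure], S],
    distance: Callable[[S, S], float],
) -> Callable[[Failure, Failure], float]:
    """Compose a fingerprinting function and a distance function."""
    def proximity(a: Failure, b: Failure) -> float:
        return distance(fingerprint(a), fingerprint(b))
    return proximity


# Instantiation 1: compare failure points (the simplest approach mentioned).
# The 0/1 distance is an assumption of this sketch.
def failure_point_fingerprint(f: Failure) -> str:
    return f.failure_point


def exact_match_distance(a: str, b: str) -> float:
    return 0.0 if a == b else 1.0


# Instantiation 2 (assumed for illustration): set of stack frames
# compared with Jaccard distance.
def stack_fingerprint(f: Failure) -> FrozenSet[str]:
    return frozenset(f.call_stack)


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    if not union:
        return 0.0  # two empty signatures are treated as identical
    return 1.0 - len(a & b) / len(union)


point_proximity = make_proximity(failure_point_fingerprint, exact_match_distance)
stack_proximity = make_proximity(stack_fingerprint, jaccard_distance)
```

Swapping either function yields a different proximity, which is the sense in which the study considers different instantiations; a smaller distance is read here as a higher likelihood that two failures share a bug.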

Besides presenting the technical details of each proximity, we also study the properties of each proximity and the trade-offs between proximities. Together, these deliver a systematic view of failure proximity. For fair comparison, this study proposes the first set of evaluation metrics that objectively quantify the effectiveness of different failure proximities. We carry out three case studies of the six proximities on three mid-sized programs (namely, flex, grep, and gzip) and evaluate their effectiveness using the proposed metrics. The experimental results clearly validate the identified properties and trade-offs. In summary, this study presents a systematic treatment of six failure proximities, together with the problem formulation, the proposed metrics, and the experimental results, and should also help guide future investigation.


Publication type: Article
Published in: IEEE Transactions on Software Engineering
Address: Los Alamitos, CA, USA
Publisher: IEEE Computer Society