Mark Gabel, Junfeng Yang, Yuan Yu, Moises Goldszmidt, and Zhendong Su
Software developers often duplicate source code to replicate functionality. This practice can hinder the maintenance of a software project: bugs may arise when two identical code segments are edited inconsistently. This paper presents DejaVu, a highly scalable system for detecting these general syntactic inconsistency bugs. DejaVu operates in two phases. Given a target code base, a parallel inconsistent clone analysis first enumerates all groups of source code fragments that are similar but not identical. Next, an extensible buggy change analysis framework refines these results, separating each group of inconsistent fragments into a fine-grained set of inconsistent changes and classifying each as benign or buggy. On a 75+ million line pre-production commercial code base, DejaVu executed in under five hours and produced a report of over 8,000 potential bugs. Our analysis of a sizable random sample suggests with high likelihood that this report contains at least 2,000 true bugs and 1,000 code smells. These bugs draw from a diverse class of software defects and are often simple to correct: syntactic inconsistencies both indicate problems and suggest solutions.
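To make the notion of "similar but not identical" fragments concrete, the following is a minimal sketch and not DejaVu's actual analysis. It assumes a naive pairwise token comparison using Python's standard `difflib`, which is quadratic in the number of fragments and would not scale to the code bases described above. The names `tokenize` and `inconsistent_pairs`, the regular expression, and the `0.8` threshold are all hypothetical choices made for illustration.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def tokenize(fragment):
    # Illustrative lexer (an assumption): identifiers, integers, and
    # single non-whitespace characters. Whitespace is ignored.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", fragment)


def inconsistent_pairs(fragments, threshold=0.8):
    """Return (i, j, differences) for fragment pairs that are similar
    (token similarity >= threshold) but not token-for-token identical."""
    tokens = [tokenize(f) for f in fragments]
    results = []
    for i, j in combinations(range(len(tokens)), 2):
        if tokens[i] == tokens[j]:
            continue  # exact clones carry no inconsistency
        matcher = SequenceMatcher(None, tokens[i], tokens[j], autojunk=False)
        if matcher.ratio() < threshold:
            continue  # too different to be treated as clones
        # Keep only the edit operations where the two fragments diverge.
        differences = [
            (tag, tokens[i][a1:a2], tokens[j][b1:b2])
            for tag, a1, a2, b1, b2 in matcher.get_opcodes()
            if tag != "equal"
        ]
        results.append((i, j, differences))
    return results


# Hypothetical fragments: the second copy lacks the length check.
fragments = [
    "if (buf != NULL && len > 0) copy(dst, buf, len);",
    "if (buf != NULL) copy(dst, buf, len);",
    "return 0;",
]
for i, j, differences in inconsistent_pairs(fragments):
    print(i, j, differences)
```

Each reported difference corresponds loosely to what the abstract calls an inconsistent change. Deciding whether such a change is benign or buggy is the job of the second phase and is not attempted in this sketch.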
Published in: ACM International Conference on Systems, Programming, Languages, and Applications: Software for Humanity, OOPSLA Research Papers Track (SPLASH/OOPSLA)