Laura Dietz and Valentin Dallmeier
Of all software development activities, debugging (locating the defective source code statements that cause a failure) can be by far the most time-consuming. We employ probabilistic modeling to support programmers in finding defective code. Most defects are identifiable in control flow graphs of software traces. A trace is represented by a sequence of code positions (line numbers in source filenames) that are executed when the software runs. The control flow graph represents the finite state machine of the program, in which states depict code positions and arcs indicate valid follow-up code positions. In this work, we extend this definition to an n-gram control flow graph, where a state represents a fragment of subsequent code positions, also referred to as an n-gram of code positions. We devise a probabilistic model for such graphs in order to infer code positions in which anomalous program behavior can be observed. This model is evaluated on real-world data obtained from the open source AspectJ project and compared to the well-known multinomial and multivariate Bernoulli models.
In NIPS Workshop on Analyzing Graphs: Theory and Applications
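To make the n-gram control flow graph described in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a trace is a list of (filename, line number) pairs, treats each window of n consecutive code positions as one state, and counts the transitions observed between consecutive states. The function names `ngram_states` and `build_ngram_cfg` and the example trace are hypothetical. The probabilistic model from the paper is not reproduced here.

```python
from collections import Counter, defaultdict


def ngram_states(trace, n):
    """Slide a window of length n over the trace; each window is one state."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def build_ngram_cfg(traces, n):
    """Count observed transitions between consecutive n-gram states."""
    graph = defaultdict(Counter)
    for trace in traces:
        states = ngram_states(trace, n)
        for src, dst in zip(states, states[1:]):
            graph[src][dst] += 1
    return graph


# Hypothetical trace: code positions as (filename, line number) pairs.
trace = [
    ("A.java", 1),
    ("A.java", 2),
    ("A.java", 3),
    ("A.java", 2),
    ("A.java", 3),
]

cfg = build_ngram_cfg([trace], n=2)
for src, successors in cfg.items():
    for dst, count in successors.items():
        print(src, "->", dst, "observed", count, "time(s)")
```

With n = 1 this reduces to the ordinary control flow graph over single code positions. Larger n makes each state carry more execution context, at the cost of a larger and sparser graph.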