Merlin: Improving the Quality of Security Static Analysis Tools

Merlin focuses on inferring a better specification to static analysis tools for security such as Microsoft CAT.NET. These tools are designed to look for bugs such as cross-site scripting and SQL injection. With the help of Merlin we find more and heigher quality vulnerabilities.


The last several years have seen a proliferation of static and runtime analysis tools for finding security violations that are caused by explicit information flow in programs. Much of this interest has been caused by the increase in the number of vulnerabilities such as cross-site scripting and SQL injections. Tools checking for these vulnerabilities require a specification to operate. In most cases the task of providing such a specification is delegated to the user. Moreover, the efficacy of these tools is only as good as the specification. Unfortunately, writing a comprehensive specification presents a major challenge: parts of the specification are easy to miss leading to missed vulnerabilities; similarly, incorrect specification may lead to false positives.

This paper proposes Merlin, a new algorithm for inferring explicit information flow specifications. Such specifications greatly reduce manual labor, and enhance the quality of results. Beginning with a data propagation graph, Merlin infers an information flow specification. Merlin models information flow paths in the propagation graph using probabilistic constraints. For scalability, we approximate these path constraints using constraints on chosen triples of nodes, resulting in a cubic number of constraints. We characterize this approximation as a probabilistic abstraction, using the theory of probabilistic refinement. We solve the resulting system using factor graphs, a well-known structure for probabilistic inference.

We experimentally validate the Merlin approach by applying it to 10 large business-critical Web applications that have been analyzed with CAT.NET, a state-of-the-art static analysis tool for .NET. We find a total of 167 new confirmed specifications, which result in a total of 302 additional vulnerabilities across the 10 benchmarks. More accurate specifications also reduce the false positive rate: in our experiments, Merlin-inferred specifications result in 13 false positives being removed. This constitutes a 15% reduction in the CAT.NET false positive rate on these 10 programs.