Almost any static analysis of programs written in languages that allow manipulation of pointer valued expressions requires "good" pointer information to be viable. Unfortunately, pointer analysis algorithms, which produce information about pointers, exhibit a natural tension between good (precise) information and the ability of the pointer analysis to scale to large programs. Precise algorithms are too slow, while fast algorithms are too imprecise.

My goal is to build error detection tools based on static analysis for large programs. Therefore, I have concentrated my efforts over the last two years on scalable pointer analysis. I have built an infrastructure for pointer analysis of large (> 1MLOC) programs, and I have invented several algorithms for improving the precision of pointer analysis.


So far, I have focused on flow-insensitive pointer analysis. I began by implementing Steensgaard's unification-based method. This method is extremely efficient, but produces imprecise results. The alternative would be to use Andersen's subtyping-based algorithm, which is much more precise, but expensive (cubic in the worst case, faster in practice, but still much too slow). To tackle this problem, I have designed a new algorithm called the "one level flow" algorithm, which achieves a compromise by using subtyping at only those areas of the points-to graph that are related to assignments, while using unification everywhere else. The result is an algorithm that is quadratic in the worst case, but linear in practice. On all of the integer benchmark programs from Spec95, one level flow and Andersen's algorithm produce either identical or close to identical information.

At the same time, I have explored alternative ways to boost up the precision of unification-based pointer analysis. One of these is to make the analysis polymorphic, or context-sensitive. I have done some work with Jakob Rehof and Manuel Fahndrich on polymorphic pointer analysis, and recovery of points-to sets from a polymorphic analysis using flow analysis. Some of this work is described on the home page of the Scalable Program Analysis (SPA) Project.

I have also collaborated with Ben Liblit, a student from the University of Berkeley, on an extended version of my one level flow algorithm. Our new GOLF (generalized one level flow) algorithm incorporates a limited about of context-sensitivity in addition to limited subtyping. Ben's implementation of GOLF scales easily to millions of lines of code. It is described in a paper submitted to PLDI 2001.

In the course of my research on pointer analysis, I have invented a new method for measuring the precision of pointer analysis, based on the notion of "alias frequency". Using this metric, we have shown that our flow-insensitive methods for alias analysis may be close to optimal. The results of our experiments are described in the PLDI 2001 submission.





I have made my implementation of one level flow available for use by researchers outside MSR. Currently, I only have an institutional license for use my Universities. Please send me email if you are interested in using my implementation. The distribution includes source code for my analysis, as well as the implementation on which it is based (the AST Toolkit). The AST Toolkit is based on the Visual C parser and exposes abstract syntax trees to analysis clients. You must have a Windows PC and Visual C++ to use my implementation.

I encourage you to use my implementation to your advantage. I have made it available to so that researchers interested in a scalable alias analysis for use in their work do not have to spend time reimplementing pointer analyses from scratch. At the same time, those interested in improving the state of the art in pointer analysis research can save a fair amount of start-up time.


I believe that the results of our unification-based analysis are now at the point where we can begin to think about using this pointer information in error detection tools. However, I am also certain that there remain avenues for improving the precision of one level flow in a scalable manner. This leads to a two-point research plan:

  • The first research direction is to use the results of pointer analysis in error detection tools and optimizing compilers. One application I am currently examining is error detection based on path simulation. Path simulation is an expensive but precise technique for identifying errors in programs. The basic idea is to unroll the control flow of a program, examining each path in isolation. Current methods for path simulation ignore pointer aliasing, because of the lack of good pointer information. Error detection tools have the advantage that they can be useful even if they are unsafe. Therefore, a pointer analysis used in an error detection tool can make some unsafe assumptions that improve precision considerably. I am also interested in applying pointer analysis to perform memory disambiguation in optimizing compilers. The challenge here is that there are stringent requirements on the safety of the pointer analysis.

  • The second research direction is to improve the precision of pointer analysis, so as to improve the behaviour of the client error detection tools. My experiences lead me to believe that context-sensitivity is much more useful for pointer analysis than control flow-sensitivity. I am currently working on extending one level flow to extract context-sensitive information. I also believe that context-sensitivity pointer analysis is useful only if clients of pointer analysis are willing to extract side effects from called procedures in a context-sensitive manner (or some approximation of it, for instance 1 CFA). I am also working on adding structure layouts to one level flow.
My plan is to use the same methodology that led to successful results with the one level flow algorithm: Look at large amounts of source code from the target programs, and find common cases that expose the limitations of the current algorithm. Then extend the algorithm in a principled manner, in order to account for these common cases.