Almost any static analysis of programs written in languages that allow manipulation of pointer valued expressions
requires "good" pointer information to be viable. Unfortunately, pointer analysis algorithms, which produce
information about pointers, exhibit a
natural tension between good (precise) information and the ability of the pointer analysis to scale to large
programs. Precise algorithms are too slow, while fast algorithms are too imprecise.
My goal is to build error
detection tools based on static analysis for large programs. Therefore, I have concentrated
my efforts over the last two years on scalable pointer analysis. I have built an infrastructure
for pointer analysis of large (> 1MLOC) programs, and I have invented several algorithms for improving the precision
of pointer analysis.
So far, I have focused on flow-insensitive pointer analysis. I began by implementing Steensgaard's unification-based
method. This method is extremely efficient, but produces imprecise results. The alternative would be to use Andersen's
subtyping-based algorithm, which is much more precise, but expensive (cubic in the worst case, faster in practice,
but still much too slow). To tackle this problem, I have designed a new algorithm called the "one level flow" algorithm,
which achieves a compromise by using subtyping at only those areas of the points-to graph that are related to
assignments, while using unification everywhere else. The result is an algorithm that is quadratic in the worst case,
but linear in practice. On all of the integer benchmark programs from Spec95, one level flow and Andersen's algorithm
produce either identical or close to identical information.
At the same time, I have explored alternative ways to boost up the precision of unification-based pointer analysis.
One of these is to make the analysis polymorphic, or context-sensitive. I have done some work with Jakob Rehof and
Manuel Fahndrich on polymorphic pointer analysis, and recovery of points-to sets from a polymorphic analysis using
flow analysis. Some of this work is described on the home page of the
Scalable Program Analysis (SPA) Project.
I have also collaborated with Ben Liblit, a student from the University of Berkeley, on an extended version of my
one level flow algorithm. Our new GOLF (generalized one level flow) algorithm incorporates a limited about of
context-sensitivity in addition to limited subtyping.
Ben's implementation of GOLF scales easily to millions of lines of code. It is described in a paper
submitted to PLDI 2001.
In the course of my research on pointer analysis, I have invented a new method for measuring the precision of pointer analysis,
based on the notion of "alias frequency". Using this metric, we have shown that our flow-insensitive methods for
alias analysis may be close to optimal. The results of our experiments are described in the PLDI 2001 submission.
I have made my implementation of one level flow available for use by researchers outside MSR. Currently, I only
have an institutional license for use my Universities. Please send me
email if you are interested
in using my implementation. The distribution includes source code for my analysis, as well as the implementation on
which it is based (the AST Toolkit). The AST Toolkit is based on the Visual C parser and exposes abstract syntax
trees to analysis clients. You must have a Windows PC and Visual C++ to use my implementation.
I encourage you to
use my implementation to your advantage. I have made it available to so that researchers interested in a scalable
alias analysis for use in their work do not have to spend time reimplementing pointer analyses from scratch. At the
same time, those interested in improving the state of the art in pointer analysis research can save a fair amount
of start-up time.
I believe that the results of our unification-based analysis are now at the point where we can begin to think about
using this pointer information in error detection tools. However, I am also certain that there remain avenues for
improving the precision of one level flow in a scalable manner. This leads to a two-point research plan:
-
The first research direction is to use the results of pointer analysis in error detection tools and optimizing
compilers. One application
I am currently examining is error detection based on path simulation. Path simulation is an
expensive but precise technique for identifying errors in programs. The basic idea is to unroll the control flow of
a program, examining each path in isolation. Current methods for path simulation ignore pointer aliasing, because of
the lack of good pointer information. Error detection tools have the advantage that they can be useful even
if they are unsafe. Therefore, a pointer analysis used in an error detection tool can make some unsafe assumptions
that improve precision considerably. I am also interested in applying pointer analysis to perform memory disambiguation
in optimizing compilers. The challenge here is that there are stringent requirements on the safety of the pointer
analysis.
-
The second research direction is to improve the precision of pointer analysis, so as to improve the behaviour of
the client error detection tools. My experiences lead me to believe that context-sensitivity is much more useful
for pointer analysis than control flow-sensitivity. I am currently working on extending one level flow to extract
context-sensitive information. I also believe that context-sensitivity pointer analysis is useful only if clients
of pointer analysis are willing to extract side effects from called procedures in a context-sensitive manner (or some
approximation of it, for instance 1 CFA). I am also working on adding structure layouts to one level flow.
My plan is to use the same methodology that led to successful results with the one level flow algorithm: Look at large
amounts of source code from the target programs, and find common cases that expose the limitations of the current
algorithm. Then extend the algorithm in a principled manner, in order to account for these common cases.
-
Static Analysis of Large Programs: Some Experiences
(html version)
Manuvir Das
Invited Talk, PEPM '00: 2000 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation,
Boston, MA, USA, 22-23 January 2000.
Host: Julia Lawall
-
Towards Scalable Pointer Analysis
Manuvir Das
Invited Talk, University of California-Berkeley, 14 February 2000.
Host: Alex Aiken
-
Towards Scalable Pointer Analysis
Manuvir Das
Invited Talk, University of Wisconsin-Madison, 2 March 2000.
Host: Thomas Reps
-
Towards Scalable Pointer Analysis
Manuvir Das
Invited Talk, University of Washington, 10 May 2000.
Host: Susan Eggers
-
Unification-Based Pointer Analysis with Directional Assignments
Manuvir Das
PLDI, Vancouver, 19 June 2000.
-
Estimating the Impact of Scalable Pointer Analysis on Optimization
Manuvir Das, Ben Liblit, Manuel Fahndrich, and Jakob Rehof,
SAS '01: The 8th International Static Analysis Symposium,
July 2001.
(also published as Microsoft Research Technical Report MSR-TR-2001-20)
-
Dynamic Points-to Sets: A Comparison with Static Analyses and Potential Applications in Program Understanding and Optimization
Markus Mock, Manuvir Das, Craig Chambers, and Susan Eggers,
PASTE '01: Workshop on Program Analysis for Software Tools and Engineering,
June 2001.
(also published as Microsoft Research Technical Report MSR-TR-2001-38 and University of Washington CSE Technical Report UW-CSE-01-03-01)
-
Unification-Based Pointer Analysis with Directional Assignments
Manuvir Das,
PLDI '00: Proceedings of the ACM SIGPLAN 2000 Conference on
Programming Language Design and Implementation,
Vancouver, BC, Canada, June 2000.
-
Scalable Context-Sensitive Flow Analysis Using Instantiation Constraints
Manuel Fahndrich, Jakob Rehof, and Manuvir Das,
PLDI '00: Proceedings of the ACM SIGPLAN 2000 Conference on
Programming Language Design and Implementation,
Vancouver, BC, Canada, June 2000.
-
From Polymorphic Subtyping to CFL Reachability:
Context-Sensitive Flow Analysis Using Instantiation
Constraints
Manuel Fahndrich, Jakob Rehof, and Manuvir Das,
Microsoft Technical Report MSR-TR-99-84, Microsoft Corporation,
November 1999.
|