
 |
lmost
any static analysis of programs written in languages
that allow manipulation of pointer valued
expressions requires "good" pointer
information to be viable. Unfortunately, pointer
analysis algorithms, which produce information about
pointers, exhibit a natural tension between good
(precise) information and the ability of the pointer
analysis to scale to large programs. Precise
algorithms are too slow, while fast algorithms are
too imprecise.
My goal is to build error detection tools based
on static analysis for large programs. Therefore, I
have concentrated my efforts over the last two years
on scalable pointer analysis. I have built an
infrastructure for pointer analysis of large (>
1MLOC) programs, and I have invented several
algorithms for improving the precision of pointer |

 |
o far, I have focused on
flow-insensitive pointer analysis. I began by
implementing Steensgaard's unification-based method.
This method is extremely efficient, but produces
imprecise results. The alternative would be to use Andersen's
subtyping-based algorithm, which is much
more precise, but expensive (cubic in the worst
case, faster in practice, but still much too slow).
To tackle this problem, I have designed a new
algorithm called the "one level flow"
algorithm, which achieves a compromise by using
subtyping at only those areas of the points-to graph
that are related to assignments, while using
unification everywhere else. The result is an
algorithm that is quadratic in the worst case, but
linear in practice. On all of the integer benchmark
programs from Spec95, one level flow and Andersen's
algorithm produce either identical or close to
identical information.
At the same time, I have explored alternative
ways to boost up the precision of unification-based
pointer analysis. One of these is to make the
analysis polymorphic, or context-sensitive. I have
done some work with Jakob Rehof and Manuel Fahndrich
on polymorphic pointer analysis, and recovery of
points-to sets from a polymorphic analysis using
flow analysis. Some of this work is described on the
home page of the Scalable
Program Analysis (SPA) Project.
I have also collaborated with Ben Liblit, a
student from the University of Berkeley, on an
extended version of my one level flow algorithm. Our
new GOLF (generalized one level flow) algorithm
incorporates a limited about of context-sensitivity
in addition to limited subtyping. Ben's
implementation of GOLF scales easily to millions of
lines of code. It is described in a paper submitted
to PLDI 2001.
In the course of my research on pointer analysis,
I have invented a new method for measuring the
precision of pointer analysis, based on the notion
of "alias frequency". Using this metric,
we have shown that our flow-insensitive methods for
alias analysis may be close to optimal. The results
of our experiments are described in the PLDI 2001
submission. |

 |
have made my implementation of one
level flow available for use by researchers outside
MSR. Currently, I only have an institutional license
for use my Universities. Please send me email
if you are interested in using my implementation.
The distribution includes source code for my
analysis, as well as the implementation on which it
is based (the AST Toolkit). The AST Toolkit is based
on the Visual C parser and exposes abstract syntax
trees to analysis clients. You must have a Windows
PC and Visual C++ to use my implementation.
I encourage you to use my implementation to your
advantage. I have made it available to so that
researchers interested in a scalable alias analysis
for use in their work do not have to spend time
reimplementing pointer analyses from scratch. At the
same time, those interested in improving the state
of the art in pointer analysis research can save a
fair amount of start-up time. |

 |
believe that the results of our
unification-based analysis are now at the point
where we can begin to think about using this pointer
information in error detection tools. However, I am
also certain that there remain avenues for improving
the precision of one level flow in a scalable
manner. This leads to a two-point research plan:
- The first research direction is to use the
results of pointer analysis in error detection
tools and optimizing compilers. One application
I am currently examining is error detection
based on path simulation. Path simulation is an
expensive but precise technique for identifying
errors in programs. The basic idea is to unroll
the control flow of a program, examining each
path in isolation. Current methods for path
simulation ignore pointer aliasing, because of
the lack of good pointer information. Error
detection tools have the advantage that they can
be useful even if they are unsafe. Therefore, a
pointer analysis used in an error detection tool
can make some unsafe assumptions that improve
precision considerably. I am also interested in
applying pointer analysis to perform memory
disambiguation in optimizing compilers. The
challenge here is that there are stringent
requirements on the safety of the pointer
analysis.
- The second research direction is to improve
the precision of pointer analysis, so as to
improve the behaviour of the client error
detection tools. My experiences lead me to
believe that context-sensitivity is much more
useful for pointer analysis than control
flow-sensitivity. I am currently working on
extending one level flow to extract
context-sensitive information. I also believe
that context-sensitivity pointer analysis is
useful only if clients of pointer analysis are
willing to extract side effects from called
procedures in a context-sensitive manner (or
some approximation of it, for instance 1 CFA). I
am also working on adding structure layouts to
one level flow.
My plan is to use the same methodology that led to
successful results with the one level flow
algorithm: Look at large amounts of source code from
the target programs, and find common cases that
expose the limitations of the current algorithm.
Then extend the algorithm in a principled manner, in
order to account for these common cases. |

 |
- Static
Analysis of Large Programs: Some
Experiences (html
version)
Manuvir Das
Invited Talk, PEPM '00: 2000 ACM
SIGPLAN Workshop on Partial
Evaluation and Semantics-Based
Program Manipulation, Boston,
MA, USA, 22-23 January 2000.
Host: Julia Lawall
- Towards
Scalable Pointer Analysis
Manuvir Das
Invited Talk, University of
California-Berkeley, 14 February
2000.
Host: Alex Aiken
- Towards Scalable Pointer
Analysis
Manuvir Das
Invited Talk, University of
Wisconsin-Madison, 2 March 2000.
Host: Thomas Reps
- Towards
Scalable Pointer Analysis
Manuvir Das
Invited Talk, University of
Washington, 10 May 2000.
Host: Susan Eggers
- Unification-Based
Pointer Analysis with
Directional Assignments
Manuvir Das
PLDI, Vancouver, 19 June 2000.
|
|
|

 |
- Estimating
the Impact of Scalable Pointer
Analysis on Optimization
Manuvir Das, Ben Liblit, Manuel
Fahndrich, and Jakob Rehof
SAS '01: The 8th International
Static Analysis Symposium, July
2001.
(also published as Microsoft
Research Technical Report
MSR-TR-2001-20)
- Dynamic
Points-to Sets: A Comparison
with Static Analyses and
Potential Applications in
Program Understanding and
Optimization
Markus Mock, Manuvir Das, Craig
Chambers, and Susan Eggers
PASTE '01: Workshop on Program
Analysis for Software Tools and
Engineering, June 2001.
(also published as Microsoft
Research Technical Report
MSR-TR-2001-38 and University of
Washington CSE Technical Report
UW-CSE-01-03-01)
- Unification-Based
Pointer Analysis with
Directional Assignments
Manuvir Das
PLDI '00: Proceedings of the ACM
SIGPLAN 2000 Conference on
Programming Language Design and
Implementation, Vancouver, BC,
Canada, June 2000.
- Scalable
Context-Sensitive Flow Analysis
Using Instantiation Constraints
Manuel Fahndrich, Jakob Rehof,
and Manuvir Das
PLDI '00: Proceedings of the ACM
SIGPLAN 2000 Conference on
Programming Language Design and
Implementation, Vancouver, BC,
Canada, June 2000.
- From
Polymorphic Subtyping to CFL
Reachability: Context-Sensitive
Flow Analysis Using
Instantiation Constraints
Manuel Fahndrich, Jakob Rehof,
and Manuvir Das
Microsoft Technical Report
MSR-TR-99-84, Microsoft
Corporation, November 1999.
|
|
|
|
|