March 19 2007
University of Washington, Course CSE599L
Copyright (c) 2007, Thomas Ball, Microsoft Research
Software is full of holes. So many that software producers often warn the consumer that:
THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU
This text reflects the sad fact that the software industry still struggles to deliver software that doesn’t crash. Consumers don’t tolerate automobile companies selling defective “Lemons” and have recourse to “Lemon Laws”. Someday consumers will begin to perceive that software is just as important to their lives as the cars they drive. On that day, they won’t tolerate defect-laden software anymore. We in the software industry had better be ready for that day – it may be closer than it appears.
At Microsoft, we now regularly apply a new generation of software tools that can automatically identify serious defects (the holes) in programs. These tools examine millions of lines of code every day, long before the software is released for general use. With these tools, we catch more defects earlier in the software process, enabling Microsoft to deliver more reliable systems. A number of these tools have been released for general use through Microsoft’s Visual Studio integrated development environment as well as freely available development kits.
This course is about software tools that take programs as input and inspect them for defects, without actually running the programs. Such tools are called static analysis tools because they analyze the static source code of a program. Static analysis tools analyze all the statements of a program, without regard to how frequently these statements execute in practice. These tools are automated code inspectors, tirelessly examining a program for defects.
What sort of defects can static analysis tools find? Generally, the tools uncover low-level coding defects that, nonetheless, can cause severe faults in program behavior such as program crashes. At the most basic level, there are tools that find language-level defects such as integer overflow, division by zero, dereference of a null pointer, and buffer overflow. There is ample history that demonstrates the serious consequences of such seemingly minor software defects.
At a higher level, we make progress in the software industry by building software in layers. Upper layers of the software depend on the lower layers. Linking the layers are interfaces (also called APIs) through which control passes and communication takes place. The behavior of the software “stack” depends vitally on the proper coordination between adjacent layers in the stack.
To support this basic software architecture, we now have static analysis tools parameterized by contract languages on APIs. The contract language allows an API designer to precisely state the proper way for client programs to use the API and what guarantees an implementation of the API must provide to clients. For example, APIs often have an associated protocol, which is a fancy way to say that the actions an API permits in future computations may depend on the actions that occurred in the past. A simple example is that it is not possible for a program to read data from a file unless the program first has successfully opened the file.
When a set of contracts for an API is packaged together with a static analysis tool that checks that client code against the contracts, we get an API-specific checker for finding higher-level domain-specific errors. If the API is very heavily used, such as a platform API, then the effort in developing the API contracts is leveraged across many users.
Of course, these tools are not a panacea. They cannot make up for poorly designed and fundamentally flawed software-there is no software tool for turning software lemons into software lemonade. And like any technology, the tools have a cost and they have limitations. However, over the last six years, such tools have matured to a state where they are in regular use in a most major product groups in Microsoft, having been deeply integrated into the development process. The tools find defects that programmers care about fixing, and they have quantifiable positive impact on code quality, as we will demonstrate later. Furthermore, the tools are based on a set of analysis building blocks backed up by theory and experimentation as well as infrastructure that is available to build upon and experiment with.
You can participate in forwarding the state of the art in static analysis tools! This course will help you answer the following question:
How do I design and implement
a static analysis tool chain
to help people effectively address
a software reliability problem?
Of course, the question assumes that your software reliability problem can be addressed by static analysis, which may not always be the case. We will offer rough guidelines about how to determine when static analysis is applicable. At the end of this course, you will have a much better understanding of what sort of software reliability problems static analysis currently is good at solving.
The question also emphasizes that a coordinated set of software tools (a tool chain) rather than a single tool may be needed to solve a problem. This is especially true when analyzing legacy systems with tools that require program annotation. In these cases, tools that can infer annotations with a high degree of accuracy are essential.
Finally, the question reminds us that people will have to use these tools, interpret their results and maintain them. Here are some questions that software engineers might ask you about your static analysis tool:
Flowing from such questions, we identify six major concerns regarding static analysis tools.
Startup. Some tools work “out of the box” on unmodified programs. Others require the programmer to provide program annotations that specify what the tools should check. Before applying a tool to an incomplete program, the programmer also may need to provide stubs that turn the incomplete program into a complete program. At a lower level, the tool needs to be integrated into the build process, which can be a non-trivial task. Of course, we wish to minimize the startup cost of a tool.
Performance. How does the performance of a tool depend on program size? If the tool takes a negligible percentage of compile time then the tool can be run as part of every compilation, and is suitable for deployment on the programmer’s desktop. A tool that is an order of magnitude slower than a compiler will need to be run overnight (or for several days) in a build lab by dedicated servers. Engineering the performance of static analysis tools takes a tremendous amount of time and is a critical part of tool development.
Precision. The precision of a tool is inversely proportional to how many of the error messages it reports are false alarms. A tool that overwhelms the user with a flood of false alarms may be of little use. The precision of a tool is greatly dependent on the accuracy with which the static analysis tracks facts that are relevant to the given reliability problem. Central concerns here are how to design an appropriate analysis and how to increase the precision of a tool to reduce the number of false alarms.
Comprehensive Search.* We can think of a static analysis tool as performing a search for errors over an enormous search space. What do we know if this search finds no errors? Of course, it depends on the assumptions that the search makes about the input program as well as how comprehensive the search is in the various dimensions it explores. The ultimate ideal of static analysis is to prove the absence of a particular class of errors in a comprehensive, or mathematically rigorous, manner. However, this ideal often may be sacrificed for the sake of performance and/or precision. *(Thanks to Manuvir Das for the suggestion of this term rather than the usual "soundness").
Presentation/Management. Users need to understand and manage the error messages produced by a tool. What are effective ways for a tool to explain an error to the user? Error explanations are very dependent on the type of information computed by the static analysis. Additionally, based on knowledge about the static analysis, we may have higher confidence in certain errors found by the tool and lower confidence in others. Prioritizing error messages based on such confidence can be very useful. Even then, it is critical to give the user the means to manage error messages.
Extensibility. Does the tool provide a way for people to extend its checking capability? A number of analysis tools take as input a contract in addition to a program. As mentioned before, such tools allow the designer of an API to provide a set of API contracts that client programs of the API should obey. Extensible tools allow product teams to specialize tools to check that product-specific coding disciplines are followed. Another trend is that several program analysis engines now use Datalog, a logic programming language, to script program analyses.
The design of a static analysis tool is a complex subject because, in addition to the above concerns, there is no one analysis to solve all problems (unlike in the lexing and parsing phases of compilers, where there are standard algorithms and tools that support these phases). However, there are basic techniques that have proven very useful and have shown their worth through numerous applications and refinements.
Abstract Interpretation. A key concept underlying most static analysis tools is abstraction. Given a reliability problem, we design a simplified view, or abstraction, of a program that enables us to address the problem by analyzing the abstract program in place of the original program. While an interpreter for a language is a program that can execute a program in that language on some input, an abstract interpreter is a program that executes an abstract program on abstract inputs.
Abstract interpretation is a foundational technique, developed in the 1970s, for creating abstract interpreters based on solid mathematical and logical principles. Key issues the technique addresses are how to:
More formally, an abstract interpreter computes an inductive invariant (a particular form of mathematical proof) in order to establish a safety property of the abstract program. In abstract interpretation, each step of the original program is matched (or simulated) by a step of the abstract program, but not necessarily vice versa. As the abstract program has more behaviors than the original program, if an abstract interpreter establishes a property of the abstract program then the same property holds of the original program.
Symbolic Execution and Reasoning. Another time-tested approach to statically analyzing programs is to translate a program into a logical formula that represents the semantics of the program and a query about its behavior. Questions about program behavior then translate to logical questions about the formula that can be answered by programs known as satisfiability (SAT) solvers.
Symbolic execution is powerful because, generally, it does not involve abstraction of the program under examination and has a high degree of precision. However, this power comes at a price. First, one is dependent on the search strategies of the underlying SAT solver, which may be hard to predict or guide. Second, SAT solvers generally do not have the power to compute inductive invariants, so they are limited to dealing with formulas representing loop-free programs.
Types, Modular Specification and Verification. Types allow programmers to adorn the interfaces of their components with information, or specifications, about their intended use. Type checkers then check each component to ensure its consistency with respect to the programmer-declared types. Key aspects of most type systems are: modularity: the code of each component can be type checked without knowledge of the code of other components; (2) verification: if a program passes the type checker then certain classes of errors cannot occur at run-time.
A long line of research has addressed how to power up such modular reasoning with more expressive contract languages and concomitant verification engines. Such contract languages generally involve procedure pre-conditions and post-conditions, and invariants. A procedure pre-condition is a constraint on the input to a procedure that must be satisfied before the procedure is called. It is the caller’s responsibility to satisfy the pre-condition before calling the procedure. Conversely, a procedure post-condition is a constraint that execution of the procedure body should guarantee holds on procedure return, assuming that the pre-condition holds initially. Invariants are constraints on program state that should always be true at certain points in execution, such as the entry of a loop.
Interprocedural Analysis. While modular reasoning is essential for scaling static analysis, it requires the presence of pre- and post-conditions for each procedure. Rather than depend on the programmer to provide such conditions, or to check their claims, we may need a global analysis of an entire program to compute essential facts about each procedure. For example, for most any program analysis, it is critical to know the program state that can/cannot be affected by a procedure call. Interprocedural analysis can be used to determine what memory locations a procedure may modify or what other procedures it may call, as well as many other properties of programs.
Part and parcel of interprocedural analysis is the computation and caching of procedure summaries. A procedure summary can be viewed as a pair of a pre-condition and post-condition for a procedure. The meaning of the summary is that if a procedure is called from a state satisfying the pre-condition then the execution of the procedure will result in a state satisfying the post-condition. Procedure summaries are essential for analyzing programs with recursion as well as for avoiding the reanalysis of procedure bodies.
Abstraction Refinement. What can one do when a static analysis yields too many false alarms? Typically, one manually investigates to determine the root cause of the false alarms and takes corrective action. Sound like the process of debugging a program? It is, except for two things: (1) one is debugging an abstract interpreter in the process of interpreting another program; (2) the root cause of false alarms may not be a bug in the abstract interpreter, though that certainly is possible, but a basic design decision made about the abstract domains over which the interpreter operates. Often, one will need to refine the precision of the abstraction used by the static analysis so as to reduce the number of false alarms.
Several automated techniques for reducing and coping with false alarms have had quantifiable success. First, through the use of symbolic reasoning, it often is possible to automatically determine the root cause of false alarms and a refinement to the abstract domain that will eliminate them. Second, because false alarms tend to cluster together, statistical techniques have proven useful in separating false alarms from true alarms.
Analysis Building Blocks. If we open up the hood of a static analysis tool, we will find a number of data structures and algorithms:
Compiler internals. Many static analysis tools piggy-back on existing compilers or tools for reading various executable formats (such as Java bytecode, .NET assemblies or even native x86 binaries). Not surprisingly, many data structures and algorithms useful to optimizing compilers, such as control-flow graphs, dominators, static-single assignment form, etc. also find application in static analysis tools.
Constraints over symbolic values. Static analysis must cope with variables that are used before being defined. For example, when analyzing a procedure in isolation from its callers there are no definitions of the procedure parameters. The basic solution is for static analyses to introduce symbolic values or names (such as A, B and C) for the initial values of such memory locations. As analysis proceeds, it must manage sets of constraints over these symbolic values and perform queries over these constraints. There are a number of well-known procedures for managing constraints over various data types.
Symbolic state representation. Of course, programs mutate state, which includes global variables, local variables on the stack and heap locations. A large part of a static analysis is in the representation of a program’s store, which maps memory locations to their values. The basic operations over the store are: allocation, select the value of a location, update the value of a location and deallocation.
A statement interpreter. Finally, a static analysis must (abstractly) interpret each statement type in the programming language under consideration, using the constraint modules and state representation. In general, this process is simplified because the analysis operates over a low-level intermediate representation of the program with a few simple statement types. Critical parts of an abstract interpreter that are not found in standard interpreters are:
As you might gather already, the analysis of programs is a complex topic. In this course we will focus on:
We do not provide a comprehensive treatment of all the different approaches to static analysis of programs. For example, type systems have a long history and many connections to the ideas presented here but will not be featured prominently. Those familiar with compiler optimization will note our preference for the terminology of abstract interpretation over dataflow analysis. We will briefly touch on these and other related areas at appropriate moments and provide references for further reading.
Finally, we strive to keep our discussion as simple and precise as possible. Although the subject of static analysis is complex, we believe that a handful of simple concepts underlie most successful static analysis tools. We will strive to be precise in our exposition of these concepts, with a modicum of mathematical formalism.
We assume that you have experience programming in imperative programming languages in the C family (structured control-flow, procedures, pointers, program stack) as well as data structures (lists, trees, sets, maps and graphs) and basic algorithms over them. Furthermore, you should have taken introductory undergraduate-level courses on complexity theory and compilers. If not, you should be familiar with the following concepts from such courses: