Inferable Object-Oriented Typed Assembly Language

ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation (PLDI) |

Published by Association for Computing Machinery, Inc.

A certifying compiler preserves type information through compilation to assembly language programs, producing typed assembly language (TAL) programs that can be verified for safety independently so that the compiler does not need to be trusted. There are two challenges for adopting certifying compilation in practice. First, requiring every compiler transformation and optimization to preserve types is a large burden on compilers, especially when adopting certifying compilation into existing optimizing non-certifying compilers. Second, type annotations significantly increase the size of assembly language programs.

This paper proposes an alternative to traditional certifying compilers. It presents iTalX, the first inferable TAL type system that supports existential types, arrays, interfaces, and stacks. We have proved our inference algorithm is complete, meaning if an assembly language program is typeable with iTalX then our algorithm will infer an iTalX typing for that program. Furthermore, our algorithm is guaranteed to terminate even if the assembly language program is untypeable.We demonstrate that it is practical to infer such an expressive TAL by showing a prototype implementation of type inference for code compiled by Bartok, an optimizing C# compiler. Our prototype implementation infers complete type annotations for 98% of functions in a suite of realistic C# benchmarks. The type inference time is about 8% of the compilation time. We needed to change only 2.5% of the compiler code, mostly adding new code for defining types and for writing types to object files. Most transformations are untouched. Type annotation size is only 17% of the size of pure code and data, reducing type annotations in our previous certifying compiler [4] by 60%. The compiler needs to preserve only essential type information such as method signatures, object layout information, and types for static data and external labels. Even non-certifying compilers have most of this information available.