Andrew Kennedy: Generics for .NET

Generic collection class

For four years Don Syme and I were involved in the design and implementation of generics (also known as parametric polymorphism) for the .NET Common Language Runtime. The generics feature is supported in the Whidbey (V2.0) release of Visual Studio in C#, Visual Basic and Managed C++. It first made its appearance in a modified version of the shared-source Rotor release of the Common Language Infrastructure which we called Gyro. (See An Introduction to C# Generics by Juval Lowy for a nice introduction to generics as they appear in C#).

Our design is based around support in the runtime (virtual machine) itself, extending the type system and intermediate language of .NET to support type-parameterized types and methods. The loader and just-in-time compiler generate specialized code on demand, sharing code across multiple instantiations where possible. The design is very expressive, supporting type-parameterization of classes, interfaces, structs and delegates, type-parameterization of static, instance and virtual methods, per-instantiation static fields, multiple instantiations of interfaces, type constraints on type parameters (so-called "F-bounded" polymorphism), and run-time inspection of type information. The CLR implementation pushes support for generics into almost all feature areas, including serialization, remoting, reflection, reflection emit, profiling, debugging, and pre-compilation.


Invited Talk

Parametric Polymorphism for Popular Programming Languages (ppt) (html)
Invited talk, 10th Workshop on Foundations of Object-Oriented Programming (FOOL), New Orleans, Louisiana, January 2004.

Publications and Presentations

On Decidability of Nominal Subtyping with Variance (Slides from talk: pdf pptx)
Andrew Kennedy and Benjamin Pierce. In 2007 International Workshop on Foundations and Developments of Object-Oriented Languages (FOOL/WOOD'07), Nice, France, January 2007.

We investigate the algorithmics of subtyping in the presence of nominal inheritance and variance for generic types, as found in Java 5, Scala 2.0, and the .NET 2.0 Intermediate Language. We prove that the general problem is undecidable and characterize three different decidable fragments. From the latter, we conjecture that undecidability critically depends on the combination of three features that are not found together in any of these languages: contravariant type constructors, class hierarchies in which the set of types reachable from a given type by inheritance and decomposition is not always finite, and class hierarchies in which a type may have multiple supertypes with the same head constructor. These results settle one case of practical interest: subtyping between ground types in the .NET intermediate language is decidable; we conjecture that our proof can also be extended to show full decidability of subtyping in .NET. For Java and Scala, the decidability questions remain open; however, the proofs of our preliminary results introduce a number of novel techniques that we hope may be useful in further attacks on these questions.

Variance and Generalized Constraints for C# Generics (Slides from talk: ppt)
Burak Emir, Andrew Kennedy, Claudio Russo and Dachuan Yu. In Proceedings of the 20th European Conference on Object-Oriented Programming (ECOOP), Nantes, France, July 2006.

Generic types in C# behave invariantly with respect to subtyping. We propose a system of type-safe variance for C# that supports the declaration of covariant and contravariant type parameters on generic types. To support more widespread application of variance we also generalize the existing constraint mechanism with arbitrary subtype assertions on classes and methods. This extension is useful even in the absence of variance, and subsumes equational constraints proposed for Generalized Algebraic Data Types (GADTs). We formalize the subtype relation in both declarative and syntax-directed style, and describe and prove the correctness of algorithms for constraint closure and subtyping. Finally, we formalize and prove a type safety theorem for a featherweight language with variant classes and generalized constraints.

Generalized Algebraic Data Types and Object-Oriented Programming
Andrew Kennedy and Claudio Russo. OOPSLA, October 2005, San Diego, California.

Generalized algebraic data types (GADTs) have received much attention recently in the functional programming community. They generalize the type-parameterized datatypes of ML and Haskell by permitting constructors to produce different type-instantiations of the same datatype. GADTs have a number of applications, including strongly-typed evaluators, generic pretty-printing, generic traversals and queries, and typed LR parsing. We show that existing object-oriented programming languages such as Java and C# can express GADT definitions, and a large class of GADT-manipulating programs, through the use of generics, subclassing, and virtual dispatch. However, some programs can be written only through the use of redundant run-time casts. We propose a generalization of the type constraint mechanisms of C# and Java to avoid the need for such casts, present a Visitor pattern for GADTs, and describe a switch construct as an alternative to virtual dispatch on datatypes. We formalize both extensions and prove a type soundness result.

Design and Implementation of Generics for the .NET Common Language Runtime
Andrew Kennedy and Don Syme. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Snowbird, Utah, USA. June 2001.

The Microsoft .NET Common Language Runtime provides a shared type system, intermediate language and dynamic execution environment for the implementation and inter-operation of multiple source languages. In this paper we extend it with direct support for parametric polymorphism (also known as generics), describing the design through examples written in an extended version of the C# programming language, and explaining aspects of implementation by reference to a prototype extension to the runtime.

Our design is very expressive, supporting parameterized types, polymorphic static, instance and virtual methods, "F-bounded" type parameters, instantiation at pointer and value types, polymorphic recursion, and exact run-time types. The implementation takes advantage of the dynamic nature of the runtime, performing just-in-time type specialization, representation-based code sharing and novel techniques for efficient creation and use of run-time types.

Early performance results are encouraging and suggest that programmers will not need to pay an overhead for using generics, achieving performance almost matching hand-specialized code.

Formalization of Generics for the .NET Common Language Runtime
Dachuan Yu, Andrew Kennedy, and Don Syme. In Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), Venice, Italy, January 2004. (Slides from talk: ppt)
We present a formalization of the implementation of generics in the .NET Common Language Runtime (CLR), focusing on two novel aspects of the implementation: mixed specialization and sharing, and
efficient support for run-time types. Some crucial constructs used in the implementation are dictionaries and run-time type representations. We formalize these aspects type-theoretically in a way that corresponds in spirit to the implementation techniques used in practice. Both the techniques and the formalization also help us understand the range of possible implementation techniques for other languages, e.g., ML, especially when additional source language constructs such as run-time types are supported. A useful by-product of this study is a type system for a subset of the polymorphic IL proposed for the .NET CLR.

Transposing F to C#: Expressivity of polymorphism in an object-oriented language
Andrew Kennedy and Don Syme. In Concurrency and Computation: Practice and Experience, Volume 16, Issue 7, June 2004, Wiley.
(Earlier version appeared in Proceedings of Workshop on Formal Techniques for Java-like Programs (FTfJP), Málaga, Spain, June 2002. Slides from talk: ppt)

We present a type-preserving translation of System F (the polymorphic lambda calculus) into a forthcoming revision of the C# programming language supporting parameterized classes and polymorphic
methods. The forthcoming revision of Java in JDK 1.5 also makes a suitable target. We formalize the translation using a subset of C# similar to Featherweight Java. We prove that the translation is fully type-preserving and that it preserves behaviour via the novel use of an environment-style semantics for System F. We observe that whilst parameterized classes alone are sufficient to encode the parameterized datatypes and let-polymorphism of languages such as ML and Haskell, it is the presence of dynamic dispatch for polymorphic methods that supports the encoding of the "first-class polymorphism" found in System F and recent extensions to ML and Haskell.

Combining Generics, Pre-compilation and Sharing Between Software-Based Processes
Andrew Kennedy and Don Syme. In 2nd workshop on Semantics, Program Analysis and Computing Environments for Memory Management (SPACE), January 2004, Venice, Italy.

We describe problems that have arisen when combining the proposed design for generics for the Microsoft .NET Common Language Runtime (CLR) with two resource-related features supported by the Microsoft CLR implementation: application domains and pre-compilation. Application domains are "software based processes" and the interaction between application domains and generics stems from the fact that code and descriptors are  generated on a per-generic-instantiation basis, and thus instantiations consume resources which are preferably both shareable and recoverable. Pre-compilation runs at install-time to reduce startup overheads. This interacts with application domain unloading: compilation units may contain shareable generated instantiations. The paper describes these interactions and the different approaches that can be used to avoid or ameliorate the problems.

Co-workers

Don Syme and I have been involved since the beginning. Peter Sestoft visited Microsoft for three months in the autumn of 2001, making valuable contributions to the generics design and specification for C#, and writing the first generics code in C# (still present in test suites). Dave Berry visited for six months to work on the Gyro shared source release of generics. Claudio Russo designed and implemented verification (CLR type-checking) for generics. During his internship in summer 2003, Dachuan Yu formalized many aspects of the implementation.

Then of course, there are countless folks in the Microsoft product teams (CLR, C#, and others). I'd like to pick out Jim Miller (Program Manager, now Architect, CLR team), Vance Morrison (JIT dev lead, CLR team) and Anders Hejlsberg (principal designer of C#) who were supportive from the earliest days, and Jim Hogg (Program Manager, CLR team) and Jason Zander (Product Unit Manager, CLR team) who were instrumental in pushing the "OK" button to ship the feature. Also Chris Brumme and Patrick Dussud (architects, CLR team), Peter Golde and Peter Hallam (devs, C# team), Dario Russi, Sean Trowbridge, Shri Borde, Simon Hall and Chris King (devs, CLR team), and Marcelo Birnbach (tester, CLR team).

Related work

Java now supports generics, in JDK version 1.5. The Java Generics design, based on the earlier GJ compiler of Bracha, Odersky, Stoutamire, and Wadler, takes a different approach from .NET. Rather than extend the JVM with support for generics, the feature is "compiled away" by the Java compiler. Furthermore, existing non-generic Java libraries can be "retrofitted" with generic types. These design choices make the design less flexible than ours: in particular, generic types can be instantiated only with reference types (e.g. string or object) and not with primitive types (e.g. float or int), and type information is not preserved at runtime, so objects with distinct source types such as List<string> and List<object> cannot be distinguished by run-time type tests. On the other hand, our flexible design baked into the CLR was a lot more difficult to implement, as Don and I will testify ;-)