This section discusses the motivation behind Cω and provides a fairly detailed, yet high-level, account of the main features of the language.
Important: Cω is an experimental research language. There are no plans to turn it into a commercial language supported by Microsoft. It is not supported by either the C# or the Visual Studio teams. There are no plans to integrate it into any product.
In the last decade, strongly-typed, garbage collected object-oriented languages have left the laboratory and become mainstream industrial tools. This has led to improvements in programmer productivity and software reliability, but some common tasks are still harder than they should be. Two of the most critical for the development of the next generation of loosely-coupled, networked applications and web services are concurrent programming and the processing of relational and semi-structured data.
It is a truth universally acknowledged that concurrent programming is significantly more difficult than sequential programming. It used to be the case that only highly skilled developers, working on such projects as operating system kernels or the core engines of databases, did non-trivial concurrent programming. Over time, however, more developers have had to deal with more concurrency. This began to happen with the arrival of multi-tasking GUIs and multi-user server-side applications, and the initial approach was to encapsulate the concurrency somewhere where the typical application programmer wouldn't have to deal with it. GUI frameworks are typically single-threaded despite the natural concurrency of graphical interfaces, whilst a common approach to concurrency control in server applications is to delegate it to the transactional mechanisms of a database. However, we believe the arrival of network-centric computing has exposed these palliatives as inadequate.
Even the simplest application now has to simultaneously manage both a GUI and network communications, and locking up a single threaded user interface whilst performing a lengthy remote operation is unacceptable. As applications perform more remote communications with more remote services, we are increasingly forced to replace synchronous RPC with asynchronous (one-way) messaging. But asynchronous programming is also difficult -- incoming messages arrive at unpredictable times and in unpredictable orders, and one naturally ends up needing multiple threads to handle them.
Unfortunately, support for concurrent programming, especially asynchrony, in current mainstream programming languages is weak. Shared memory concurrency on a single machine, even in modern languages like C# and Java, is handled by 1970s threads and locks model which is implemented entirely in terms of library routines. Distributed concurrency and asynchronous messaging are handled by different library routines with their own model; the .NET libraries, for example, include a fairly complex delegate-based API for asynchronous messaging that still offers little support for the hard problem of handling incoming messages.
The current situation with regard to external data is just as bad. Most web-based (and many non-web-based) applications are basically thin interface-generating layers over relational databases (the `3-tier' model). Yet dealing with relational data from within current languages is messy, error-prone and dangerous. APIs for database access typically construct SQL queries as strings and return results as untyped collection objects which are deconstructed imperatively. This makes code lengthy and unreadable, negates much of the benefit of working in a language with strong static safety guarantees and loses much of the advantage of the underlying declarative query language. Worse still, it is insecure: when query strings are derived from user input, preventing script-injection attacks requires very careful coding.
The other form of external data with which application increasingly have to deal is tree-shaped semistructured data, such as XML documents. Here again, support for processing such data in current languages is weak. One either uses an untyped DOM-like representation, giving up static guarantees about conformance to a particular DTD or Schema, or uses an external tool to produce a more strongly typed mapping into the language's type system, with an accompanying loss of fidelity, efficiency and genericity in queries. Even in the DOM case, in which one can more easily support a reusable library of query methods, constructing queries is still significantly more complex than it is in a more domain-specific language such as XQuery. On the other hand, interfaces from general purpose languages to external XML processing languages are often string-based, with all the same impedance mismatches and security holes as in the SQL case.
From a data perspective, our design goal was to evolve C# to provide an integration of the object, relational and semi-structured data models. (A similar process is happening with database systems; nearly all now offer some form of integration of the relational and XML data model, and of SQL and XQuery.) Diagrammatically we have aimed for the following.

More technically, we have
sought to integrate these models by generalization, rather than
by ad-hoc specializations (e.g. by adding Table<R> or
XML<S> classes where R is some relational type and
S is some schema).
Cω offers an integration of the data models by extending with C# type system with some new types, and by offering some new expressions -- primarily by generalizing the familiar member access operator. We shall consider these new types in turn, and for each consider the appropriate new expressions.
The first structural type we add is a stream type; streams represent
ordered homogeneous collections of zero or more values. For example,
int* is the type for homogeneous sequences of
integers. Streams in Cω are closely related to
IEnumerable<>, the generic iterator interface that will appear
in the next release of C#.
Cω streams are typically generated using iterators, which are
blocks that contain yield statements. For example, the method
produces a single element stream.
virtual string* Foo(){
yield return "Hello world!";
}
Importantly,
it should be noted that, just as for C#, invoking such a method
body does not
immediately execute the iterator code, but rather
immediately returns a closure. (Thus Cω streams are essentially lazy
lists, in the Haskell style.) This closure is consumed by the foreach
statement, For example, given a stream zones of type
int*, the following statement prints each
element in that stream.
foreach(int zone in zones) Console.WriteLine(zone);
Here is a method that generates a finite stream of integers:
virtual int* FromTo(int b, int e){
for (i = b; i <= e; i++) yield return i;
}
Here's an example that iterates over the elements of the stream:
foreach(int i in FromTo(0,10)) {
Console.WriteLine(i);
}
Streams are constructed on demand, and can be infinite:
virtual int* From(int n){
for(;;)
yield return n++;
}
A vital aspect of Cω streams is that they are always flattened; there are no nested streams of streams. Cω streams are thus aligned with XPath/XQuery sequence which are also flattened.
A key programming feature of Cω is generalized member access: the familiar 'dot' operator is now much more
powerful. Thus if the receiver is a stream, then the member access is
mapped over the elements, e.g. zones.ToString() implicitly
maps the method call over the elements of the stream zones and
returns a value of type string*. This feature significantly
reduces the burden on the programmer. Moreover, member access has been
generalized so it behaves like a path expression. For example,
zones.ToString().PadLeft(10) converts all the elements of the stream
zone to a string, and then pads each string, returning a stream of
these padded strings. As we shall see, the combination of generalized
member access and new datatypes, enables the Cω programmer to write
XPath-like queries directly.
Sometimes one wishes to map more than a simple member access over the
elements of a stream. Cω offers a convenient shorthand called an
apply-to-all expression, written e.{s;...s;}, which applies the
block {s;...;s;} to each element in the stream
e. The block may contain the variable it which plays
a similar
role as the implicit receiver argument this in method bodies and is
bound to each successive element of the iterated stream. For instance,
the apply-to-all expression below will convert the stream of natural
numbers from 0 into the stream of even numbers, converts each of
these into a string, and then prints them all.
From(0).{return 2*it;}.ToString().{Console.WriteLine(it);};
Cω also provides a lightweight way to define finite streams without the need of defining a method, called block expressions. For instance, the following block expression generates a stream of two strings.
string* greeting = {yield return "Hello"; yield return "World!";};
The second family of structural types we add are anonymous structs, which
describe ordered collections of heterogeneous values. An anonymous
struct is like a tuple in ML or Haskell and is written, for example
struct{int i; Button;}. A value of this type contains a
labelled member i
of type int and an unlabelled member of type Button. We can
construct a value of this type with the following expression:
new(i=42,new Button())
To access components of anonymous structs we (again) generalize the
notion of member access. Thus assuming a value x of the previous
type, we write x.i to access the integer value. Unlabelled members
are accessed by their (constant) position; for example x[1] returns the
Button member. As for streams, member access is lifted over
unlabelled members of anonymous structs. To access the BackColor
property of the Button component in variable x we can just write
x.BackColor, which is equivalent to x[1].BackColor.
At this point we can reveal even more of the power of Cω's
generalized member access. Given a stream friends of type
struct{string name;int age;}*, the expression friends.age
returns a stream of integers. The member access has been lifted over
both structural types. To print the names of one's friends is the
query-like statement:
friends.name.{ ConsoleWriteLine(it);};
Interestingly, Cω also allows repeated occurrences of the same member
name within an anonymous struct type, even at different types. For
example, assume the following declaration:
struct{int i; Button; float i;} z;
Then z.i projects the two i members of z into a new anonymous
struct that is equivalent to
new(z[0],z[2]) and of type struct{int;float;}.
Cω provides a limited form of covariance for anonymous structs, just
as for streams. Thus struct{int;Button;} is a subtype of
struct{int; Control;} but is not a subtype of
struct{object; Control;}.
Cω does not support width subtyping for anonymous structs.
The third structural type we add is a particular form of discriminated
union type, which we call an choice type. This is written, for
example, choice{int; bool;}. As the name suggests, a value of this
type is either an integer or a boolean, and may hold either at any one
time. Unlike unions in
C/C++ and variant records in Pascal where users have to keep track of
which type is present, values of an discriminated unions in Cω are
implicitly tagged with the static type of the chosen alternative, much
like unions in Algol68. In other words, discriminated union values are
essentially a pair of a value and its static type.
There is no syntax for creating choice values; this is handled by the compiler.
choice{int;Button;} x = 3;
choice{int;Button;} y = new Button();
Cω provides a test, e is T, on choice values to test
the value's static type. Thus x is int would return
true, whereas y is int would return false.
This is a simple extension of C#'s is type test.
Assuming that an expression e is of type
choice{T1;...Tn;},
the expression e is T is true for exactly one T in
T1;...Tn;. This invariant is maintained by the type system. The
only slight complication arises from subtyping, e.g.
As Button is a subtype of both Control and object, which type
tag is generated by the compiler? The answer should be obvious to the
experienced C# programmer: an choice type can be thought of as
providing a number of overloaded constructor methods, one for each
component type. Just as for standard object creation, the best
constructor method is chosen. In the example above, clearly
Control is better than object.
Thus z is Control returns true.
The notion of 'best' for Cω is the obvious extension of that for
C#.
As the reader may have guessed, member access has also been generalized
over discriminated unions (Cω's choice types). Here the behaviour of member access is less
obvious, and has been designed to be aligned with XPath. Consider a
value w of type choice{int; Button;}. The member access w.GetType()
clearly succeeds irrespective of whether the value is an integer or a
Button object. In this case the type of the expression w.GetType()
is Type.
However clearly the member may not be supported by all the possible
component types, e.g. w.BackColor. Classic treatments of union types
would probably consider this to be type incorrect. However, Cω's
choice types
follow the semantics of XPath where, for example, the query foo/bar
returns the bar nodes under the foo node if any exist, and
the empty sequence if not. Thus in Cω, the expression
w.BackColor is well-typed, and will return a value of type
Color?, which is either a Color object or
null. This new type, Color?, can be thought of a
singleton stream containing the Color value (when w
contains a Button), or the empty stream (which is equated
with null). Again, we emphasize that this behaviour precisely
matches that of XPath.
Cω follows the design of C# in allowing all values to
be boxed and hence all value types are a subtype of the supertype
object. Thus both anonymous structs and choice types are considered
to be subtypes of the class object.
To allow close integration with XSD and other XML schema languages, we have included the notion of a content class in Cω. A content class is a normal class that has a single unlabelled type that describes the content of that class, as opposed to the more familiar (named) fields. The following is a simple example of a content class.
class friend{
struct{string name;
int age;
};
void incAge(){...}
}
Again we have generalized member access over content classes. Thus the
expression Bill.age returns an integer, where Bill
is a value of type friend.
From an XSD perspective, classes correspond to global element declarations, while the content type of classes correspond to complex types. Some further comparisons with the XML data model are immediately below, but a more comprehensive study can be found elsewhere.
C# provides the ability to query members of an object using the "." operator and members of an array using array indexers, "[i]". The intention of Cω's generalization of these operators is to leverage in XQuery/XPath-like programming in an intuitive way. It's certainly the case that XQuery/XPath code can be almost trivially rewritten in Cω.
As an example consider the following XQuery code, that is taken from the W3C's Use cases document. It operates on a bibliogrpahy, that contains a number of books. For each book in the bibliography, it lists the title and authors, grouped inside a 'result' element.
for $b in $bs/book
return
<result>
{$b/title}
{$b/author}
</result>
In Cω the code is almost identical (by design!).
foreach (b in bs.book)
{
yield return <result>
{b.title}
{b.author}
</result>;
}
Cω offers some other XPath-like expressivity:
friends[ it.age < 18] therefore
filters the stream friends of type
friend*, returning those objects who denote
minors.....friends.string::* returns all direct members of
type string, regardless of their member names.Cω provides a convenient way to construct object graphs using an XML literal expression. The following is from the XSVG sample provided with the compiler installation. It builds a an SVG graphic object using an XML literal expression:
<svg width=1000 height=1000 viewBox="0 0 1000 1000"> </rect x=0 y=0 width=1000 fill="gray"/> </g fill="black" stroke="black"> {MoirePattern(1000, 1000, 80)} </g> </svg>;
This is analogous to the XML element construction that is possible in XSLT and XQuery. However, Cω views this as just convenient shorthand for the appropriate data constructors. Notice that, just as in XSLT and XQuery, the XML elements can contain embedded code so that dynamic object graphics can be constructed easily. XML literal expressions are just another way of constructing Cω values and can be used anywhere that an expression is allowed in the language.
This is a considerable improvement on the current support for XML
programming in C#. The following shows a function GetTotalByZip
written in C# that takes an XML document as input, then proceeds to
process selected nodes in qualified orders. The output of the program
is the total for all items that were ordered in a particular zip
code:
decimal GetTotalByZip(XmlDocument doc, int zip) { // For each order in 98052 zip code, do price * quantity // of item and add it to the running total. decimal total = 0; string xpathQuery = "orders/order[Zip='"+zip+"']/item"; foreach (XmlElement item in doc.SelectNodes(xpathQuery)) { XmlNode price = item.SelectSingleNode("price"); XmlNode qty = item.SelectSingleNode("quantity"); decimal p = Decimal.Parse(price.InnerText); decimal q = Decimal.Parse(qty.InnerText); total += p * q; } // Return the total. return total; }
Notice that XmlDocument programming involves a lot of
expensive, untyped string-to-type coercion. Here is how that same
program can be written using the Cω language:
public decimal GetTotalByZip(Orders orders, int zip) { float total = 0; foreach (Item item in orders.order[Zip == zip].item) { total += item.price * item.quantity; } order.total = total; }
Here you can see how XPath-like query expressions lead to succinct Cω programs.
Just as Cω includes XPath-like queries to process semi-structured data, we should like to offer similar functionality for processing relational data. Thus Cω includes a simple fragment of SQL, including selection with projection, filtering, ordering, grouping, and joins. The advantage of including these features in the language mean that we can synactically check the queries, and also use compile-time type checking.
Below is a fragment of a Cω program that uses a select expression to return all the unitprice/quantity pairs from some orders with matching zip codes. We highlight various parts of the select expression and discuss them below in further detail.

Some points about this expression are as follows:
struct{string ProductID; decimal Total; }
from clause specifies the data source being queried. This can be either an object in a SQL database, or any in-memory object. In Cω, methods can be defined as either synchronous or
asynchronous. When a synchronous method is called, the caller
is blocked until the method returns, as is normal in C#. However,
when an asynchronous method is called, there is no result and the
caller proceeds immediately without being blocked. Thus from the
caller's point of view, an asynchronous method is like a
void one, but
with the useful extra guarantee of returning immediately. We often
refer to asynchronous methods as messages, as they are a
one-way communication from caller to receiver (think of posting a
letter rather as opposed to asking a question and waiting for an
answer during a face-to-face conversation).
By themselves, asynchronous method declarations are not particularly novel. Indeed, .NET already has a widely-used set of library classes which allow any method to be invoked asynchronously (though note that in this standard pattern it is the caller who decides to invoke a method asynchronously, whereas in Cω it is the callee (defining) side which declares a particular method to be asynchronous). The significant innovation in Cω is the way in which method bodies are defined.
In most languages, including C#, methods in the signature of a class are in bijective correspondence with the code of their implementations -- for each method which is declared, there is a single, distinct definition of what happens when that method is called. In Cω, however, a body may be associated with a set of (synchronous and/or asynchronous) methods. We call such a definition a chord, and a particular method may appear in the header of several chords. The body of a chord can only execute once all the methods in its header have been called. Thus, when a method is called there may be zero, one, or more chords which are enabled:
Here is the simplest interesting example of a Cω class that uses chords:
public class Buffer {
public async Put(string s);
public string Get() & Put(string s) { return s; }
}
This class contains two methods: a synchronous one, Get(), which takes no arguments and returns a
string, and an asynchronous one, Put(), which
takes a string argument and (like all asynchronous methods) returns no
result. The class definition contains two things: a declaration (with
no body) for the asynchronous method, and a chord. The chord declares
the synchronous method and defines a body (the return statement) which can run when both the
Get() and Put()
methods have been called.
Now assume that producer and consumer threads wish to communicate
via an instance b of the class Buffer. Producers make calls to b.Put(), which, since the method is asynchronous,
never block. Consumers make calls to b.Get(),
which, since the method is synchronous, will block until or unless
there is a matching call to Put(). Once b has received both a Put()
and a Get(), the body runs and the argument to
the Put() is returned as the result of the
call to Get(). Multiple calls to Get() may be pending before a Put() is received to reawaken one of them, and
multiple calls to Put() may be made before
their arguments are consumed by subsequent Get()s. Note that:
Get(). Hence no new threads are spawned in
this example.Put() cannot be delivered to two distinct
calls to Get(). Furthermore (though it
makes little difference in this small example), the locking is
fine-grained and brief - polyphonic methods to not lock the
whole object and are not executed with 'monitor
semantics'.In general, the definition of a synchronous method in Cω
consists of more than one chord, each of which defines a body which
can run when the method has been called and a particular set of
asynchronous messages are present. For example, we could modify the
example above to allow Get() to synchronize
with calls to either of two different Put()
methods:
public class Buffer {
public async Put(string s);
public async Put(int n);
public string Get()
& Put(string s) { return s; }
& Put(int n) { return n.ToString(); }
}
Now we have two asynchronous methods (which happen to be distinguished by type rather than name) and a synchronous method which can synchronize with either one, with a different body in each case.
The previous example showed how to define a buffer of unbounded size:
any number of calls to Put()
could be queued up before matching a Get(). We
now define a variant in which only a single data value may held in the buffer
at any one time.
public class OnePlaceBuffer {
private async empty();
private async contains(string s);
public OnePlaceBuffer() { empty(); }
public void Put(string s) & empty() {
contains(s);
return;
}
public string Get() & contains(string s) {
empty();
return s;
}
}
The implementation of OnePlaceBuffer makes use of two
private asynchronous messages: empty() and
contains(). These are used to carry the state of the buffer and
illustrate a very common programming pattern in Cω: note that we have made
no use of fields. The way in which this works can be best understood by reading
the constructor and the two chords in a simple declarative manner:
empty().
Put(s) on an empty()
buffer then it subsequently contains(s) and the call to
Put()
returns.
Get() on a buffer which contains(s)
then the buffer is subsequently empty() and
s is returned to the caller of Get().
Put() and
Get() block.
It is easy to see that the constructor establishes, and both the chords
maintain, the invariant that there is always exactly one empty()
or contains(s) message pending on the buffer. Thus the
two chords can be read as a fairly direct specification of a finite state
machine.
Controlling concurrent access to a shared, mutable resource is a classic
problem. Clients request, and later release, either read-only or read-write
access to the resource. To preserve consistency but minimize waiting, one
usually wishes to allow either any number of readers or a single writer, but
not both, to have access at any one time. The code below implements a
reader-writer lock in Cω. Writers call Exclusive(),
do some writing and then call ReleaseExclusive().
Readers call Shared(), do some reading and then call
ReleaseShared():
public class ReaderWriter {
private async idle();
private async s(int n);
public ReaderWriter() {
idle();
}
public void Exclusive()
& idle() {
}
public void ReleaseExclusive() {
idle();
}
public void Shared()
& idle() {
s(1);
}
& s(int n) {
s(n+1);
}
public void ReleaseShared()
& s(int n) {
if (n == 1)
idle();
else
s(n-1);
}
}
Again, we use private messages, idle() and s(n),
to carry the state. And once more, there is a simple declarative reading of the
constructor and chords which allows one to understand the implementation:
idle().
Exclusive() access and the lock is
idle() then he may proceed (but the lock is no longer idle()).
ReleaseExclusive()
then the lock is idle()
again.
Shared() access and the lock is
idle()then the lock moves to the state s(1),
meaning there is now one reader, and he may proceed.
Shared() access and there are
currently n readers, then there are now n+1
readers, and the new one may proceed.
ReleaseShared()
and there were previously n readers, then if n was 1 then lock is
idle() again, otherwise there are now n-1 shared readers. In
either case, the reader who has just relinquished access may proceed with
whatever else he has to do.Assuming that all the clients obey the request-release protocol, the invariant is that there is at most one pending private async message representing the state:
none <—> idle() <—>
s(1) <—> s(2) <—> s(3)
<—> …
Operationally, it may help to think about what does not appear in the
code. There is, for example, no chord that is applicable if a client calls
Exclusive() when there is a pending s(n) message.
In such a case, the client will block until all the readers have released their
access and an idle() message has been sent (There is a
question about fairness here - see the Polyphonic C# paper for a modification
to this example which enforces a simple form of fairness between readers and
writers.)
Cω is an experimental programming language intended to make it easier to write the data-intensive distributed applications being written today. To do so we need to provide application writers with better support for data and control. Cω contains elegant primitives for asynchronous communication, and offers a strongly-typed integration of the object, relational and semi-structured data models.