This document was made by OCR from a scan of the technical report. It has not been edited or proofread and is not meant for human consumption, but only for search engines. To see the scanned original, replace OCR.htm with Abstract.htm or Abstract.html in the URL that got you here.

A Description of the Cedar Language

A Cedar Language Reference Manual

Butler W. Lampson

CSL-83.15         December 1983 (Printed November 1986)    [P83-00016]

© Copyright 1983, 1986 by Xerox Corporation. All rights reserved.

Abstract: The Cedar Language is a programming language derived from Mesa, which in turn is derived from Pascal. It is meant to be used for a wide variety of programming tasks, ranging from low-level system software to large applications. In addition to the sequential control constructs, static type checking and structured types of Pascal, and the modules, exception handling, and concurrency control constructs of Mesa, Cedar also has garbage collection, dynamic types, and a limited form of type parameterization.

This report describes the Cedar language. Except for chapter 2, it is written strictly in the style of a reference manual, not a tutorial. Furthermore, it describes the entire language, including a number of obsolete constructs and historical accidents. Hence it tells much more than you probably want to know. A summary of the safe langauge and comments throughout the manual suggest which constructs should be preferred for new programs.

CR Categories and Subject Descriptors: D.3.2 [Programming Languages]: Language Classifications - Cedar, extensible languages; D.3.1 [Programming Languages]: Formal Definitions and Theory - semantics

Additional Keywords and Phrases: kernel language, polymorphism, data types

The work described here was completed in late 1983 but not published at that time. It attempts to provide a reasonably formal and precise definition of the Cedar programming language. The then current version of the language was perceived as an inadequate base for a number of planned extensions to the language and supporting environment: on the other hand, there was already a large body of Cedar code that could not simply be abandoned. These problems are dealt with by defining a small but powerful kernel language plus a mapping of existing Cedar constructs into that kernel. The kernel language introduces value spaces and operations over them that go well beyond what has been available in any implemented version of the Cedar language; it was to provide the basis for extension and simplification. The mapping from existing Cedar into the kernel provides not only a migration path for existing code but also a definitional method.

This report should be of interest to students of programming languages and their definitions. Most of the interesting ideas of the Cedar language appear in the kernel, which is described in Chapter 2. Such readers should note that the formalism used to describe the kernel has several known shortcomings. Its treatment of so-called dependent types is somewhat cavalier. A subsequent report by Burstall and Lampson ("A Kernel Language for Modules and Abstract Data Types," Digital Systems Research Center, September 1984) includes a more careful treatment of such types in a language very similar to the kernel. The present treatment also glosses over most of the definitional problems raised by the possibility of concurrent evaluation.

The report should also be of interest to Cedar programmers. Chapters 3 and 4 constitute the most complete. precise and accurate definition of the implemented Cedar langauge that has appeared to date. For a reader willing to make the effort to assimilate the concepts introduced in Chapter 2, this report can serve as an interim reference manual. The later chapters are painfully honest and complete; as the abstract notes, they say much more than anyone probably wants to know. As of March 1986, the only known differences between the description and implemention, other than minor bugs in each, are the following:

· The improved syntax for ENTRY and INTERNAL has not been implemented; these attributes must still precede the type in a procedure declaration (Section 3.5).

· Sections 3.3.4 and 4.3.4 document an improved design for opaque types that was never implemented. In current Cedar, opaque types behave as they do in Mesa.

· According to Section 4.14, if P is a procedure taking one argument, its application to x using dot notation is written without brackets, as x.P. In current Cedar, the alternative form x.P[] is also accepted.

Both classes of readers should note that many parts of the kernel language have never been implemented in their full generality. Some of the current developers and users of the Cedar language would not even agree that the directions of evolution suggested by the kernel language are desirable or feasible. The claims about long-term goals and promised improvements in this report should therefore be taken as the personal opinions of the author.

Ed Satterthwaite, March 1986

Chapter 1. Introduction

The Cedar language is a programming language derived from Mesa, which in turn is derived from Pascal. It is meant to be used for a wide variety of programming tasks, ranging from low-level system software to large applications. In addition to the sequential control constructs, static type checking and structured types of Pascal, and the modules, exception handling, and concurrency control constructs of Mesa, Cedar also has garbage collection, dynamic types, and a limited form of type parameterization.

This manual describes the Cedar language. Except for the overview material in § 2.1 and the discussion of concepts in §§2.3-2.7. it is written strictly as a reference manual, not a tutorial. Furthermore, it describes the entire language, including a number of obsolete constructs and historical accidents. Hence it tells much more than you probably want to know. A summary of the safe language and comments throughout the manual, suggest which constructs should be preferred for new programs.

The manual is organized into three major parts:

Chapter 2: A description of a much simpler kernel language, in terms of which the current Cedar language is explained. This description includes:

An overview or glossary, in which the major technical terms used in the kernel are briefly defined (§2.1).

An informal explanation of the ideas of the kernel and the restrictions imposed by current Cedar (§§ 2.3-2.9)

A precise definition of the kernel (§2.2). Most readers will probably find this rather hard going.

Chapter 3: The syntax and semantics of the current Cedar language. The semantics is given precisely by a desugaring into the kernel. It is also given more informally by English text. This chapter also contains a number of examples to illustrate the syntax.

Chapter 4: The primitive types and procedures of Cedar. For each one, its type is given as well as an English definition of its meaning. This chapter is organized according to the class hierarchy of the primitive types (§4.1).

In addition, there is a one-page grammar for the full language, a shorter grammar for the safe language, and a two-page language summary which includes the grammar, the desugaring, and the examples from § 3. The tables in §§4.1-2 summarize the types and primitives.

To find your way around:

First read chapter 2, except for § 2.2.

Then consult the table of contents, or the index, for the topics of interest to you. The full grammar (at the end) and the class hierarchy (Table 4-1) may also be useful as starting points.

The manual is extensively cross-referenced. Section titles and numbers appear at the top of each page. The summaries and tables also point to the section in which each construct is defined.

Acknowledgements: Rod Burstall and Ed Satterthwaite helped me greatly in clarifying the ideas presented in § 2. Ed was also indispensable in getting an accurate description of the current Cedar language. Bill McKeeman's work on an earlier Cedar language description was the starting point for this manual. Will Crowther, Jim Horning and Lyle Ramshaw read part or all of the manual carefully, and made many helpful comments. Several other Cedar programmers have pointed out errors or omissions. Of course, I am responsible for the errors that remain.

Chapter 2. The kernel language

This document describes the Cedar language in terms of a much smaller language, which we will usually call the kernel or the Cedar kernel. Cedar differs from the kernel in two ways:

·      It has a more elaborate syntax (§ 3). The meaning of each construct in Cedar is explained by giving an equivalent kernel program.

Often the kernel program is longer or less readable: the Cedar construct can be thought of as an idiom which conveniently expresses a common operation. Sometimes the Cedar construct has no real advantage, and the difference is the result of backward compatibility with the ten-year history of Mesa and Cedar.

·      It has a large number of built-in or primitive types and procedures (§4). In the kernel language all of these could in principle be programmed by the user, though in fact most are provided by special code in the Cedar compiler. In general, you can view these built-in facilities much like a library, selecting the ones most useful for your work and ignoring the others.

Unfortunately, the current Cedar language is not a superset of the kernel language. Many important objects (notably types, declarations and bindings) which are ordinary values in the kernel that can be freely passed as arguments or bound to variables, are subject to various restrictions in Cedar: they can only be written in literal form, cannot be arguments or results of procedures, or whatever. The long-term goal for evolution of the Cedar language is to make it a superset of the kernel defined here. In the meantime, however. you should view the kernel as a concise and hopefully clear way of describing the meaning of Cedar programs.

To help in keeping the kernel and current Cedar separate, reserved words and primitives of the kernel which are not available in current Cedar are written in SANS-SERIF SMALL CAPITALS, rather than the SERIF SMALL CAPITALS used for those symbols in current Cedar. Operator symbols of the kernel which are not in current Cedar are not on the keyboard.

The kernel is a distillation of the essential properties of the Cedar language, not an entirely separate invention. Most Cedar constructs have simple translations into the kernel. Those which do not (e.g., some of the features of OPEN) are considered to be mistakes, and should be avoided in new programs.


§ 2.1 gives a brief summary of each major idea in the kernel, which may be helpful as an introduction and reminder. Most of the chapter (§§2.3-2.8) is an informal explanation of the concepts behind the kernel. Usually. terms are defined and explained before they are used, but some circularity seems to be unavoidable. Both this and the explanations in §§2.3-2.7 are given under five major headings, as follows:

Values and computations The type system




There is also a sketch of the restrictions imposed by the current Cedar language on the generality of the kernel: for more on this subject, see § 3. The meaning of the various built-in primitives is given in § 4. The incompatibilities between the kernel language and current Cedar are described in § 2.9. i.e., the constructs in Cedar which would have a different meaning in a kernel program. For the most part, these are bits of syntax which do not have consistent meanings in current Cedar: future evolution of the language will replace them with their kernel equivalents.

§ 2.2 precisely defines the syntax and semantics of the Cedar kernel language, the former with a grammar, and the latter by explaining how to take a program and deduce the function it computes and the state changes it causes. The kernel definition follows the ordering of the kernel grammar. This section is rather difficult to read, and you may prefer to skip it.

2.1 Overview

This section gives a brief summary of the essential concepts on which the Cedar language is based. The explanations are informal and incomplete. For more precise but more formal definitions, see § 2.2: for more explanation, see § 2.3-§ 2.8.

2.1.1 Values and computations

Application: The basic mechanism for computing in Cedar is applying a procedure (proc for short) to arguments. When the proc is finished, it returns some results, which can be discarded or passed as arguments to other procs. The application may also change the values of some variables. In the program an application is denoted by (the denotation of) the proc followed by square brackets enclosing (the denotation of) the arguments: f [first-3, last– x+ 1]: here the – symbol binds the value of the expression on the right to the name on the left. There are special ways of writing many kinds of application: x+ 1, person.salary. IF x<3 THEN red ELSE green, x<-7 .

Value: An entity which takes part in the computation (i.e., acts as a proc, argument or result) is called a value. Values are immutable: they are not changed by the computation. Examples: 3, TRUE, "Hello". X [x: !NT] IN x+ 3: actually these are all expressions which denote values in an obvious way. The X-expression denotes a proc value P: the name x is called a parameter. When P is applied to an argument, the parameter x is bound to the argument.

Variable: Certain values. called variables, can contain other values. The value contained by a variable v (usually called the value of v) is returned by v.VALUEOF, and can change when a new value is assigned to v. In addition to its results, a proc may have side-effects by changing the values of variables. Nearly every non-variable type T has a corresponding variable type VAR T; values of type VAR T contain values of type T. Every VAR type has a NEW proc which creates a variable of the type. A variable is usually represented by a single block of storage: the bits in this block hold the representation of its value. A variable may be local to a proc. or it may be created by an explicit call of NEW, and referred to by a REF or pointer value.

Group: A group is an ordered set of values, often denoted by a constructor like this: [3, x+ 1, "Hello"]. Like everything else. a group is itself a value.

Binding: A binding is an ordered set of [name. value] pairs, often denoted by a constructor like this: [x: INT-3, y: BOOLTRUE] (or simply [x-3, yTRUE], in which the types of the names are the syntactic types of the expressions). If b is a binding. b.n denotes the value of the name n in b. Note the difference between binding and assignment: one introduces a new name with a fixed value: the other changes the value of a variable.

Argument: A binding constructor written explicitly after an expression (e.g., Copy{from– x, to–y]) denotes application of the value P denoted by the expression to the value a denoted by the constructor, called the argument. P is usually a proc, and a is a binding, which is bound to P's domain declaration D to get the argument which is passed. In making this binding a is coerced, if necessary, to match the declaration:

If a name in D is missing from a. a default value is supplied.

If a value in a doesn't have the type required by D, it is coerced (if possible) into another value which does.

The constructor can also be for a group, in which case the names from D are attached to its elements to turn it into a binding.

2.1.2 The type system

Type: A type defines a set of values by specifying certain properties of each value in the set (e.g., integer between 0 and 10); these properties are so simple that the compiler can make sure that proc arguments have the specified properties. A value may have many types; i.e., it may be in many of these sets. A type also collects together some procs for computing with the value (e.g., add and multiply).

More precisely, a type is a value which is a binding with two items:

Its predicate, a function from values to the distinguished type BOOL. A value has type T if T's predicate returns TRUE when applied to the value.

Its cluster, a binding in which each value is usually a proc taking one argument of the type. For any expression e, the expression e.f denotes the result of looking up fin the cluster of is syntactic type V e, and applying the resulting proc to the value of e.

A proc's type depends on the types of its domain and range; a proc with domain (argument type) D and range (result type) R has the type D— )R. Every expression e has a syntactic type denoted V e. e.g., the range declared for its outermost proc; in general this may depend on the arguments. The value of e always has this type (satisfies this predicate); of course it may have other types as well.

Mark: Every value carries a set of marks (e.g., INT or ARRAY: think of them as little flags stuck on top of the value). The predicate HASMARK tests for a mark on a value; it is normally used to write type predicates. The set of all possible marks is partially ordered.

The set of marks carried by a value must have a largest member m. and it must include every mark smaller than m. Hence all the marks on a value can be represented by the single mark m: we can say that m is the mark on the value. This does not imply a total ordering on the marks.

Type-checking: The purpose of type-checking is to ensure that the arguments of a proc satisfy the predicate of the domain type; this is a special kind of pre-condition for executing the proc. The proc body can then rely on the fact that the arguments satisfy their type predicates. It must establish that the results satisfy the predicate of the range type; this is a special kind of post-condition which holds after executing the proc. Finally, the caller can rely on the fact that the results satisfy their type predicate. In summary:

Caller—                   establish pre-condition: arguments have the domain type;

rely on post-condition: results have the range type.

Body—                     rely on pre-condition: parameters have the domain type;

establish post-condition: returns have the range type.

Declaration: A declaration is an ordered set of [name, type] pairs. often denoted like this: [x: INT, y: BOOL]. If d is a declaration, a binding b has type d if it has the same set of names, and for each name n the value b.n has the type dn. A binding b matches d if the values of b can be coerced to yield a binding b' which has type d.

A declaration can be instantiated (e.g., on block entry) to produce a binding in which each name is bound to a variable of the proper type; instantiating the previous example yields


Class: A class is a declaration for the cluster of a type. For instance, the class Ordered is [T: TYPE. LESS: PROC[T, T]--0.[Boot1, . . 1. C is a subclass of D if (loosely) C includes at least all the [name. type] pairs in D.

2.1.3 Programs

Name: A name (sometimes called an identifier) appearing in a program denotes the value bound to the name in the scope that the name appears in (unless the name is in a pattern before a colon (declaration or binding) or tilde (binding), or after a dot or $). An atom is a value that can be used to refer to a name; a literal atom is written like this: $alpha.

Expression: In a program a value is denoted by an expression, which is one of: a literal value-3 or "Hello":

a name– x or salary:

an application of a proc value to a group or binding value– GetProperties[directory. input]:

a A-expression. which yields a proc value– A [x: INT]=>[INT] IN (IF x<0 THEN –x ELSE x);

a constructor for a declaration or binding–[x: INT-3, y REAL-3.14D.

If a value is given for each free name in an expression, then it can be evaluated to produce a value. Thus an expression is a rule for computing a value. The entire program is a single expression. made up of sub-expressions according to the five constructs above.

Scope: A scope is a region of the program in which the value bound to a name does not change (although the value might be a variable, whose contents can change). For each scope there is a binding called ENV (for environment) which determines these values. A new scope is introduced (in the kernel) by IN (after LET or A) or by a REC [...] constructor for a declaration or binding; e.g..

LET x"3 IN x-l- 5:

LET REC Fact–A[n: INT]=>[r: INT] IN (IF n THEN 1 ELSE rt*Fact[n-1]) IN Fact[4].

The first expression evaluates to 8. the second to 24.

Constructors: Brackets delimit explicit constructors for group. declaration or binding values. They all have the form [x1, x2....], and are distinguished by the form of the x.:

an expression for a group; n: e for a declaration:

n–e or n: e1–e2 for a binding.

Recursion: When names are introduced in a constructor in Cedar, this is done recursively:

If v is bound to n in a binding constructor, then in expressions in the constructor n has the value v. rather than its value in the enclosing scope. Exception: argument bindings are non-recursive.

If n is declared in a declaration constructor, then it may not be used in the constructor, unless there is an ordering of the declarations in the constructor such that a name is used only by later declarations. Exception: declared names may be used in the bodies of A-expressions in the constructor (see § 3.3.4).

In the kernel, however, constructors are non-recursive unless preceded by REC.

Dot notation: The form an looks up n in some binding associated with e. and does something with the result. There are three cases:

if e is a binding, an is just the value paired with n in e.

If e is a type. e.n is e.Cluster.n.

Otherwise. e.n is (V e.n)[e], and e.n[more args] is usually (V e.n)[e, more args]. Recall that V e is the syntactic type of e.

In all cases you are supposed to think of n as some property or behavior associated with e: an denotes that property or evokes that behavior.

2.1.4 Conveniences

Coercion: Each type cluster may contain To and From procs for converting between values of the type and values of other types (e.g.. Float: PROC[INT]-'[REAL]: this would be a To proc in REAL and a From proc in 4\T). One of these procs is applied automatically if necessary to convert or coerce an argument value to the domain type of a proc: this application is a coercion. Each coercion has an associated atom called its tag (e.g.. $widen for INTREAL or $output for INT—ROPE): several coercions may be composed into a single one if they have the same tag. The tags thus serve to prevent unexpected composition of coercions: all are NIL currently. however.

Exception: There is a set of exception values. An expression e denotes a value which is either of type Ce or is an exception. Whenever an exception value turns up in evaluating an expression e1. it

immediately becomes the value of e1. unless (in the kernel) e1 has the form e, BuT {...}. The {...}

tests for exception values and can supply an ordinary value, or another exception. as the value of the BuT expression. An exception value may contain an ordinary value, called the argument of the exception. so that arbitrary information can be passed along with an exception.

Finalization: When a variable is no longer accessible, the storage it occupies is freed (automatically in the safe language). Before this is done. a finalization proc in the cluster of the variable's type is called to do any other appropriate resource deallocation. Finalization is done by separate processes. and hence must be explicitly synchronized with the rest of the program. The local variables of a proc or other scope may also be finalized (using UNwiND): this is done synchronously (§ 3.4.3A).

Safe: The safety invariant says that all references are legal, i.e., each REF T value is NIL or refers to a variable of type T. A proc is safe if it maintains the safety invariant whenever it is applied to arguments of the proper types. If a proc body (A-expression) is

checked, the compiler guarantees that the proc value is safe:

trusted. the programmer asserts that it is safe (the compiler makes no checks): the proc value is safe: unchecked, the compiler makes no checks and the proc value is unsafe.

It is best to write checked code whenever possible. However, checked code cannot call unsafe procs (since the compiler then cannot guarantee safety).

Process: Concurrency is obtained by creating a number of processes. Each process executes a single sequential computation. one step at a time. They all share the same address space. Shared data (touched by more than one process) can be protected by a monitor: only one process can execute within the procs of the monitor at a time. So that each process can know what to rely on. there must be an invariant for the monitored data which is established whenever a monitor proc returns or waits. A process can wait on a condition variable within a monitor: other processes can then enter the monitor. The waiting process runs again when the condition is notified. or after a timeout.

2.1.5 Miscellaneous

Allocation: Cedar has standard facilities for allocating new variables of any type (the NEw primitive): related variables can be allocated in the same zone. Normally. variables are deallocated automatically by the garbage collector when they can no longer be referenced: such variables can only be referred to by REFS. Variables can also be deallocated explicitly by FREE. but this is unsafe.

Static: An expression whose value is computed without executing the program is called static. Literals are static. as are names bound to literals. and any expression with static operands. Proc bodies are never static unless they are inline, and often not then.

Pragma: Some language constructs do not affect the meaning of the program (except possibly to make a legal program illegal). but only its time and space costs: these are called pragmas. Examples are INLINE for proc bodies and PACKED for arrays.

2.2 Kernel definition

This section gives the syntax and semantics of the Cedar kernel language. Motivation, and an explanation of the relation between the kernel and the current Cedar language, can be found in §§ 2.3-2.8. Since this section is rather formal, you are advised to read the rest of the chapter first, and then return here if you want a more precise definition.

The kernel is subdivided into

A rather austere core; anything can be desugared into this, but not very readably (§2.2.1). A set of conveniences: with these, readable programs can be written (§ 2.2.2). Imperative constructs: statements and loops (§ 2.2.3).

Exception handling (§ 2.2.4).

The format of this section interleaves grammar rules which give the syntax of the language with text which gives the meaning. The meaning of the core is given in English. For other parts of the kernel, it is given by desugaring rules which show how to rewrite each construct in terms of others: if rewriting is done repeatedly. the result is a core program, which may invoke some primitives. The meaning of these is also given in English. There is also some English explanation of the desugaring. but this is only a commentary and does not have the force of law.

See § 3.1 for the notation used in the grammar and desugaring.

2.2.1 The core

The Cedar core is a minimal subset of the kernel, barely adequate as a base into which the rest of the kernal can be desugared. In the core, there is syntax only for names. literals. application, X-expressions. a basic and a recursive binding construction, and syntactic type: everything else is done with primitives. We never write anything in the core, however, except to show the desugaring of a kernel construct. Thus the reader need not struggle with programs in the ugly core syntax.

Many readers may be happy with the kernel definition given in the other sub-sections of § 2.2, and may wish to avoid the formalism of this section.

Table 2-1 gives the core syntax (in the first column). together with a comment suggesting the meaning of each construct (in the last column). The meaning is given in detail in § 2.2.1A-G. The middle column gives the syntactic type of each construct. For readability, this is written in the full kernel language. with a few conventions:

a * in front of the syntactic type indicates that it gives less information that one would like. For instance. DDOTP has type DECL-4TYPE. which says nothing about the fact that the type is a cross type whose structure matches the structure of the decl.

A parameter to a primitive declared with :: is the type of some other argument: the argument for this type parameter may be omitted in an application of the primitive, in which case it is supplied as the syntactic type of the other argument. For instance, p: [r: TYPE. x: t]*[...] can be applied with p[x-3], which is short for ,[r—INT. x-3].

A bold name is a reference to another parameter, e.g.. t in the previous example.

In the kernel, a core primitive named xooly is in the cluster of the type of its argument under the name y. Thus DDOTP is in the cluster of DECL under the name P. so that d.P=DDOTP[d] if d is a decl.

Text Box: Syntax	Syntactic type	MeaningText Box: expression ::=Text Box: n[	Vn	ENV.nText Box: literal I	V literalText Box: el n e2 I	(Vel.RANGE)ted	-- Standard application. I
A di = > d2 IN e I	dI—od2	-- Standard proc constructor. I
A d1= > d2 IN e	dl—>d2	-- Unchecked standard proc constructor. I
[(n—e), !..]	[(n: Ve), ...]	-- Vanilla binding constructor. I
FIX d e	d	-- Recursive binding constructor. I
De	TYPE	-- Syntactic type.
type ::= e	Ve	-- A type is syntactically just an expression.
deci ::= e	-- A decl is syntactically just an expression.
name ::=

letter (letter I digit)... literal ::=


primitive ::=













(VENv).n                                   -- Appears as an e or in a pattern.

A-10M                               .;TOM literal. I
V primitive

[d: DECL, p: (d—>DECL)]—qa: --arrow--fl PE] --arrow--TYPE]—,[r: TYPE]

[ri :: TYPE, firs:: t1, t,:: TYPE. rest: t_]—*[v: t1Xt2]

[ri:      PE]—[t: TYPE]      --iTYPE

[g: GROUP[TYPE]])[c: --cross--TYPE]

*Et: --cross--TYPE]+[g: GROUP[TYPE]]

Ed: DECL, v: d.-rho[b: d]

[b: BINDING]-0[d: DECL] [d:: DECL, b: [1]-4[v: d.T

[p: PATTERN, I:: TYPE, v: t]-3,[b: MKDECL[p. t]] =MKBINDD[d—MKDECap, t]. v— v] Ed:: DECL, b: d. n: ATOM]+[v: oToB[d].n]

DECL, b1d1' d2' • : DECL, b,: dd,]-->[v:d,]-->[v:d1 THEND d, ]



*[p: PATTERN, t: TYPE]— [d: DECL]

*[d: DECL]*[p: PATTERN]

*[d: DECL]>[l: TYPE]

*[d: DECL])[b: BINDING] --=MKBINDP[pd.P, vdT.G]]

*[b: BINDING]*[d: DECL] --=MKDECL[pb.D.P, tMKCROSS[b.V]] DECL, d2: DECL]—.[v: DECL] --=BTOD[DTOB[di] THEN ErOB[di]l

[t:: TYPE. v: t]—qh: HEX] TY 13F

TRUE I FALSE             BOOL




ANY for any type T

-- See § 2.2.4 -- See § 2.2.4

Table 2— I: The core language

A name not in a literal (or pattern. in the kernel) denotes the value to which it is bound in the current environment ENV (A below). An ATOM literal is a value which stands for a name in the primitives which deal with declarations and bindings.

A literal denotes a value according to a rule which depends on its syntax. The core has only numeric and ATOM literals. and the primitives enumerated above.

An expression denotes a value according to a rule which depends on its syntax. If the expression is a name or literal, the value is the value of the name or literal. The remaining cases are discussed in the following sub-sections. Most of these cases define the value of the expression in terms of the value of its sub-expressions. The sub-expressions may be evaluated in any order.

A.   The current environment ENV

The current environment ENV is a binding. The value of the expression n is ENV.n. ENV for a sub-expression is the same as ENV for its containing expression, except that:

For the b of a closure being applied, ENV is computed according to B below.

For the e of a FIX. ENV is computed according to E below.

Thus applying a closure and evaluating a FIX are the only ways to change ENV.

B.    Application

The value of a standard application is obtained by evaluating el and e2 to obtain v1 and v,, and applying v1 to v2. There are two cases for application:

v1 is a primitive. The value of the application is a function of v2 given in the definition of

the primitive. The core primitives are defined throughout § 2.2.1. the Cedar primitives in §4.

v is a closure c (C below), with domain declaration d, body b and environment E. The value of the application is the value of the expression b in the environment


(E below). Note that if the closure was made with A, the body must be type-checked when it is applied: a closure made with A was type-checked when it was made (C below).

Vet must be an arrow type. An application type-checks if Ve, implies VerDOMAIN (G below). The type of the application is obtained by applying Ve1.RANGE to v2. In simple cases. Ve1.RANGE is a

constant. For instance. NOT: BOOL>BOOL has RANGE=A BOOL=>TYPE IN BOOL. However, the result type may depend on the argument value. Thus

VMKBINDD.RANGE= A [d: DECL, v: d.T] = >TYPE IN [b: d]

so that MKBINDD[[i: INT]. 3] has type [b: [1: INT]] to go with its value [b–P-31].

C. Lambda

The value of a A-expression is a closure, which has three parts:

A domain declaration d. equal to the value of d1.

A body b. which is the expression e (not the value of e).

An environment E. equal to the current environment ENV when the A is evaluated. A A-expression type-checks if

d1 evaluates to a declaration d.

For any x of type d.T. Ve implies d2.T in the environment MKBINDD[d. x] THEN E.

A A-expression type-checks if d1 evaluates to a declaration: type-checking of the body is deferred until the closure is applied.

D.    Pairs, groups and cross types

A pair is the basic structuring mechanism. MKPAIR[x, y] yields the pair <x. y>. Bigger structures are made, as in Lisp. by making pairs of pairs. When we are interested in the leaves of such a structure. we call it a group and call the leaves its elements. A group has type GRouP[7] if all its elements have type T or are NIL. A flat group is a pair in which first is not a group, and rest is a flat group or


The type of a pair is a cross type: MKPAIR[x, y] has type TX U iff x has type T and y has type U. Cross types are made with MKCROSS, which turns a GROUP[TYPE] (i.e., a group whose elements are types) into a cross type in the obvious way:


MKCROSS[T] = T if T is a type.


Note that MKCROSS of a flat group is flat. CDOTG goes the other way, turning a cross type into a GROUP[TYPE] in which no element is a cross type. Thus MKCROSS is the inverse of CDOTG, but not necessarily the other way around.

E.  Bindings

A binding is either NIL, or an <atom, value> tuple, or a <binding, binding> tuple. The primitive MKBINDD constructs a binding from a declaration d and a matching value v, i.e. (as the type of MKBINDD indicates), one with the type d.T. The resulting binding has type a and consists of the names from d paired with the corresponding values from v. Example:

MKBINDD[ [x: INT, b: BOOL]. [3, TRUE]] = [x-3, b–TRUE]

= < <$X, 3>, «$b, TRUE>, NIL > >

In this example. d-r is INTXBOOL.

The declaration and group in this example is written using the syntax of §2.2.2: in the core they would be MKDECL[p[Sx. Sb], r—imcRossllivr. BooLl] ] and kiKpain[firsi-3. restmKPAIRUirsiTRUE. restNIL]] (where we have written the arguments of these primitives in the kernel syntax).

The primitives BTOD and BTOV return the arguments of the MKBINDD primitive that made the binding. MKBINDP is redundant: it is like MKBINDD, but takes a pattern instead of a declaration, and hence accepts any v with the right structure, regardless of the component types.

LOOKUP returns the value of the name n in the binding. THEN combines two bindings, giving priority to the first one in case of duplicate names. It works only for flat bindings, in which the first element of each <binding, binding> tuple is an <atom, value> tuple, and the second element is another <binding. binding> tuple or NIL. The value of b1 THEN b2 is another flat binding, obtained

by first replacing any tuple <<a, v>, b> in b2 where a is equal to an atom in b1 by b. and then using this binding to replace the final NIL in b1.

The binding constructor [(ne),            has the value MKBINDP[pIn, ...], v[e, ...] ].

FIX makes a recursive binding: the value of FIX d1–e is MKBINDD[d. v], where d is the value of d1 in

ENV and v is the value of e in the environment (LET FIX d–e IN d–e) THEN ENV. Of course in general this computation may not terminate: normally the names in d occur in e only in the bodies of A-expressions, and in this case it does terminate. The FIX typechecks if De in the latter environment implies DTOT[d].

F.  Declarations

A declaration is either NIL, or an <atom, type> tuple. or a <declaration, declaration> tuple. The primitive MKDECL constructs a decl from a pattern p and a value 1 of type GROUP[TYPE]. A pattern is a GROUP[ATOM]. i.e., either NIL, or an atom, or a pair of patterns: the ATOM elements must all be different. An application of MKDECL typechecks if t matches p. i.e., if

both p and t are NIL, or

p is an atom and 1 has type TYPE, or

p is a pair [p1, p2] and t is a cross type 11Xt, and pi matches ti and p2 matches t2.

The resulting declaration consists of the names from p paired with matching type values from I.

The primitives DDOTP and DDOTT return the arguments of the MKDECL primitive that made the declaration. Thus


DDOTT[<$n, 7)] = T:

DDOTT[<dt. d2>]=DDOTT[d1]XDDOTT[d2]

DTOB is redundant: it converts a declaration to a binding in which each name has the corresponding type as its value. Thus DTOB[[x: INT, y: REAL]]=[x-INT, jr-REAL]. The inverse is BTOD. also redundant: it is defined only if all the values in the binding are types. THEND combines two declarations just as THEN combines two bindings: V(b1 THEN b2)= Vbi THEND Vb2

G.    Types and type-checking

A type is a value consisting of a pair:

the predicate, a function from values to BOOL. the cluster, a binding.

A value v has type T if Ts predicate applied to v is TRUE. T implies U iff (V x) T.Predicate[x]U.Predicate[x].

Typechecking consists of ensuring that the argument of an application has the type specified by the domain of the proc (B above). The body of a A-expression can then be type-checked (or the implementation of a primitive constructed) independently, assuming that the parameter satisfies the domain predicate. Symmetrically, the result of an application can be assumed to have the type specified by the range of the proc.

To complete the induction. it is also necessary to check that the value of the body of a A-expression has the range type (C above).

The primitive types in the kernel are:

BOOL, with two values TRUE and FALSE.

ATOM, with values denoted by literals of the form $n. TYPE, a predicate satisfied by any type value. ANY, a predicate satisfied by any value.

DECL. the type of a declaration (F above).

BINDING, the type of any binding.

Arrow types, the types of procs (C above). An arrow type has a domain type and a range type.

Cross types, the types of pairs (D above).

GROUP[7], the type of any pair in which all the elements have type T. Declarations, the types of bindings (E and F above).

There are no non-trivial implications among any of these types, except as follows:


TANY for any type T.

T1XT2= U IX U2 iff Ti=Ui and T2= U2.

GROUP[T]=.GROUP[U] if           U.

T1-+T2U1-12                                T1 and (Vx: U1) (A T1 IN T2)[xl(X U1 IN L12)[x]. Note the
reversal of the domains.

d1 'd2for declarations iff di.P=d2.P and oToB[d1].nDTOB[d2J.n for each n in d1.P.

2.2.2 Conveniences

Table 2-2 gives the syntax and semantics for kernel expressions. Most of this is straightforward sugar. LET adds the binding el to ENV in evaluating e2. The separate case for b, ... simply allows the

H which normally enclose a binding constructor to be omitted in this case; see below. IF wraps e2 and e3 in A's so that they don't get evaluated: the iFPROC primitive chooses the one to evaluate and applies it.

The dot notation has three cases.

For a binding it just looks up n in the binding.

For a type it looks up n in the type's cluster.

For anything else, it looks up n in the cluster of V' e and applies the result to e. The special LOOPUPC primitive does something special if it finds a proc which takes more than one argument: it splits the proc into one which takes the first argument and returns a proc taking the remaining arguments. This ensures that if V e.n is such a proc P, the expression en[a b] will desugar into something equivalent to P[e a b].

The usual syntax for application is a proc el followed by an explicit binding constructor. The kind of application may depend on the type of el, via the APPLY element of its type; for a proc applied by the standard apply operator 4, APPLY is the identity. If e1 is followed by an group rather than a binding constructor, the argument is obtained by binding the group to the declaration which is ei's domain.

Infix operators desugar straightforwardly into application: note that the choice of proc is determined by the type of the first operand only. AND and OR are not ordinary infix operators. since they evaluate no more than necessary; this is expressed by the desugaring into IF.

The remaining expression syntax is various constructors, described below, and the imperative and exception features described in the next two sections.

expression = coreExpression d1 —> d2


LET el IN e2 I


IF el THEN e2 ELSE e3

e n


el infixOp e2 I

e1 AND e2 I el OR e2 I

PATT p I [ b, !.. 1 I

REc[(p : t"'e)....]I

[ d. !.. ] X X

statements I simpleLoop but

infixOp ::=



literal :: = coreLiteral digit digit ... I

declaration ::= p: t 1

[(p:                      I

binding :: =

d e I
pattern ::=

n I

[pi. -1

primitive :: = corePrimitive



ARROW 4 [di. A di = >DECL IN d2]

-- The domain defaults to [], the range to Vet (A Vei IN e, ) 4 ery ei a binding I

LET [b....] IN e

iFPROC[Ve2. ei, A IN e2. A IN e311 []


ELSE IF VeTYPE THEN LOOKUP I [V e.cluster, $n]

ELSE ( LOOKUPC 4 [V e.cluster, $n] ) 4 [e]

el . APPLY 4 [b, ...] I el . APPLY I MKBINDD[Vei.DOMAIN, [e,....]

ei . infix0a[e2]


NIL I mkPAIR[el, [ (I e2, )]-- Group constructor. I -- Pattern constructor: see the rule for p below. I


FIX [p, ...] : MKCROSS[[t, ...11-[e, ...]


xxxxxx I --Also recursive d maps into this? -- See § 2.2.3

-- See § 2.2.4.


INT                -- Numeric literal, giving the decimal representa
-- A d is not an e: a d must be before - or after LET or DECL. MKDECL[ PATT p, t]

[p, ...]: MKCROSS[[t ...]]   -- to separate names and types
-- Only the [...] form is an e: a b must be written after LET. I


-- Note: a pattern is not an e: it can appear only before - or or after PATT in the kernel.

PATT n=$n

PATT [pi, ...]=[PATT pi, ...]

-- Fill in types

The precedence of operators in e is: (highest) a        infixOps (all the same), BUT. IN (lowest). All are
left associative.

Table 2 — 2: Kernel expression syntax and semantics


A bracketted sequence of expressions (e.g., [1, 2. 3]) denotes a flat group with its elements in the same order (e.g.. mKPAIR[1, MKPAIR[2. mkPAIR[3, Nit]]]. Thus a group constructor is just like the LIST function in Lisp. A pattern is a similar construct, except that it contains names which stand for the corresponding ATOM literals; PATT yields the group obtained by replacing each name n by the literal $n. After desugaring a pattern always appears after PATT and hence is always desugared into an atom or a GROUP[ATOM].

Brackets are also used to delimit binding and declaration constructors. They are distinguished from each other, and from group constructors, by the presence of – in each element of a binding constructor, and : in each element of a declaration constructor. The elements of a binding or declaration constructor are sugar for applications of the MKDECL. MKBINDP and MKBINDD primitives. The constructor itself strings the resulting declarations into a big one using the PLUS operator. which is just like THEN except that it does not allow duplicate atoms; the motivation for this is to allow the names and corresponding types or values to be written together, instead of factored as the primitives require. As a result, values made from constructors are always flat.

Note that these constructors do not nest, so they can only be used to build flat values. The only exception is that a d can be [(p: ... ]. This is intended for the d–e form of binding; e.g., if DivRem returns two INTS, you can write [d: INT, r. INT]-'DivRem[...] instead of [d, r]: INTXINT–DivRem[...].

The REC binding constructor is sugar for FIX which exactly parallels the non-recursive one.

2.2.3 Imperatives

These constructs are generally used together with non-functional procs.

statements ::= e; }                                                                   IF (IsvoiD[e]) AND ... THEN [] ELSE ERROR

-- Ordering by non-prompt evaluation.

simpleLoop ::= SIMPLELOOP statements                                 LET REC [loop'–(X IN { statements: loop[] })] IN loop[]

-- Only an exception (such as EXIT) will terminate the loop.

Each e in the statements must evaluate to VOID, which is a distinguished null value; this is to catch mistakes like writing x+1 as a statement. The definition of AND ensures that the is are evaluated left-to-right.

The simpleLoop is the standard way to express a loop in terms of recursion. You are supposed to use an exception to get out of this loop; Cedar provides a number of convenient ways to do this, such as EXIT and RETURN.

2.2.4 Exceptions

An exception is treated as a special value returned from an application. The exception value contains an exception code and an args value which may be of any type. When an application sees an exception value, it immediately abandons the application and returns the exception value; thus application is strict. There has to be some way to stop this, or the first exception would be the value of the program. The HIDE primitive takes any value and returns a variant record of type HEX. It turns:

a normal value into the normal variant, with the value in its v field;

an exception into the exception variant, with the code in its code field and the arguments in its args field.

UNHIDE takes a HEX value and returns the original unhidden value.

An exception code has the type EXCEPTION[7], where T is a declaration which is the type of the args: it is the domain of the exception, and (VEXCEPTION[T]).DOMAIN= T. An exception value is constructed by the primitive

RAISE: [T:: TYPE, code: EXCEPTION[T], args: T]

Thus the args always has the type demanded by the code.

This is dressed up with the following syntax.

but ::= e BUT { butChoice; ...

butChoice :: = el => e2
e        e1.'... => e21

ANY => e2

LET v"HIDE[e] IN (


ELSE IF ISTYPE[v', HEx.exception] THEN

LET h"—NARRow[v% HEx.exception ] IN

LET selector"--11`.code IN butChoice ELSE ... ELSE UNHIDE[v] ELSE ERROR

IF selector' = ei THEN LET MKBINDD[Ve1.DOMAIN, h'.args] IN e2

IF (selector =e1) OR ... THEN e2 IF TRUE THEN e2

A BUT expression evaluates e. If it is a normal value, that is the value of the BUT. If it is an exception, each butChoice in turn gets a look at it. If one of them likes it, then it supplies the value of the BUT: otherwise the exception is the value.

The el in a butChoice must evaluate to an exception code. If there is just one, and it matches code in the exception. then args in the exception is bound to the domain of the code, and e2 is evaluated in that environment. If there is more than one. then e2 is just evaluated in the current environment. An ANY butChoice matches any exception, but of course doesn't bind the arguments.

2.3 Values and computations

A computation in Cedar is the evaluation of an expression in some environment. This section describes the kinds of values which can be computed by Cedar programs, and the basic mechanisms for doing computations.

2.3.1 Application

The basic mechanism for computing in Cedar is applying a proc to argument values. A proc is a mapping

from argument values and the state of the computation. to result values, and a new state of the computation. The state is the values of all the variables.

A proc is implemented in one of two ways:

By a primitive supplied as part of the language (whose inner workings are not open to inspection, but which is defined in § 4).

By a closure, which is the value of a A-expression whose body in turn consists of an expression, which may contain further applications of procs to arguments. e.g., A [x: INT] IN x+ 3. When a closure is applied, the parameters declared after the A are bound to the arguments, and then the body after IN is evaluated in the new environment thus obtained.

In Cedar, each parameter value thus obtained is used to initialize a variable, which is the object named by the parameter in the body. Thus the body can assign to the parameters. Use of this feature is not recommended.

Note that when a A-expression is evaluated to obtain a closure its body is not evaluated, but is saved in the closure, to be evaluated when the closure is applied. Some constructs (IF, SELECT, AND, OR) are defined (see § 2.2.2 and § 3.8) by wrapping A-expressions around some arguments, and then applying them only when certain conditions hold; e.g., IF b THEN f[x] ELSE g[y] evaluates f[x] iff b iS TRUE and g[y] iff b iS FALSE.

Application is denoted in programs by expressions of the form fiarg, arg. mi. If the value off is a closure, this expression is evaluated by evaluating f and all the arg's, and then evaluating the body of the closure with the formal parameters bound to the arguments (unless an exception value turns up; see § 2.6.2). Thus to evaluate (A [x: INT] IN x+3)[4]:

evaluate the A-expression to obtain a closure;

evaluate the argument 4 to obtain the number 4; evaluate x+ 3 with x bound to 4 to obtain the number 7.

The first two evaluations can be done in either order (with different results in general, though not in this case).

To evaluate a primitive application such as x+3, evaluate the arguments, and then invoke the primitive on those arguments to obtain the result and any state change. With a few exceptions (e.g., assignment and dereferencing or following references), primitives are functions and can be thought of as tables which enumerate a result value for each possible combination of arguments. Invoking a primitive can therefore be viewed as a simple table lookup using the arguments as the table index.

Actually there may be one more step in an application. If an argument doesn't have the type expected by the proc, the argument is coerced to the proc's domain type if possible. If no coercion can be found, there is a type error. Coercion is discussed further in § 2.6.1 and § 4.13.

Most procs take a binding as argument, in which the various parts of the argument are named. E.g., OpenFile: PROC[name: ROPE, mode: Files.Mode] takes a binding with two values named name and mode. It might be applied like this: OpenFile[name–"Budget.memo", mode–$read]. If the names are missing, there is a positional coercion which supplies them left-to-right, see § 2.3.6. There is also a defaulting coercion that supplies missing parts of the binding: see § 4.11.

If f is neither a primitive nor a closure, the meaning of applying it is defined by the APPLY proc for its type: this case is discussed further in § 4.4.

There are many ways of writing applications other than f[x]. In fact, many Cedar primitives cannot be the values of expressions. and can only be applied by writing some other construct. The desugaring rules show how large parts of the Cedar syntax denote various special kinds of application. In each case. the meaning is defined by the standard meaning of application and the specific meaning of the primitives involved: see § 4.1.

This is partly because of history. and partly because specialized syntax makes the program more readable. Future evolution of the language will improve the situation.

Functions and order of evaluation

An expression is functional if

its value does not depend on the state, but only on the values bound to its free names, and evaluating it does not change the state.

As a consequence of this definition,

Two identical functional expressions in the same scope will always have the same value.

Note that a functional expression must not depend on values contained in variables bound to its free names. Thus. v.VALUEOF is not functional.

A proc is a function if every application of it is functional. It doesn't matter when or how many times a function is applied: the order of evaluation doesn't matter for functions. Thus Cedar functions can be thought of as mathematical functions for many purposes. Note that a constant can be regarded as an application of a function of no arguments.

Non-functional procs, on the other hand, are more complicated objects. Cedar makes no formal distinction, either in syntax or in the type system, between functions and procs. However, it does not define the order of evaluation in an expression, except that:

all arguments are evaluated before a proc is applied:

because of the desugaring of IF, SELECT, AND and OR into A-expression. the order of evaluation for these expressions is determined by the first rule:

statements separated by semi-colons are evaluated in the order they are written.

As a consequence, two applications of non-functions should not be written in the same statement unless they don't affect each other: if this is done the effect of the program is unpredictable.

An expression is guaranteed to be functional if it only applies functions: thus if f is a function, p a non-functional proc, and x a variable, f[3] is functional and p[3] and p[ x] may not be. Furthermore. f[x] may not be functional, because it is sugar for f[x.vALuEoF], and VALUEOF is not a function. The value of a A-expression is a function if its body is functional. There are more complicated ways of guaranteeing that an expression is functional, just as for any other interesting property.

Because the values of variables constitute the state. it is only the existence of variables that allows non-functional procs to exist. In particular. the VALUEOF proc which returns the value of a variable is non-functional (because its result depends on the state), and the ASSIGN proc which changes the value of a variable is non-functional (because it changes the state).

2.3.2 Values

A Cedar program manipulates values. Anything which can be denoted by a name or expression in the program is a value. Thus numbers, arrays, variables, procedures, interfaces, and types are all values. In the kernel language, all values are treated uniformly. in the sense that each can be:

passed as an argument, bound to a name, or returned as a result.

These operations must work on all values so that application can be used as the basis for computation and A-expressions as the basis for program structure. In addition, each particular kind or type of value has its own primitive operations. Some of these (like assignment and equality) are defined for most types. Others (like addition or subscripting) exist only for certain specific types (numbers or arrays). None of these operations. however, is fundamental to the language. Formally. assignment or equality has the same status as any operation on an abstract type supplied by its implementor: thus INTEGER.ASSIGN has the same status as 10.GetInt. In practice, of course, special syntax is usually used to invoke these operations, and the implementations are not Cedar programs open to inspection by the editor or debugger. A complete description of the primitives supplied by the language can be found in Chapter 4, organized by the type of the main operand. Table 4-5 is an alphabetized index of these descriptions.

Restrictions on types, declarations, bindings and unions: In current Cedar, however, there are restrictions on values which are types, declarations or bindings: they can only be arguments or results of modules, and hence are first-class values only in the modelling language, and not within a module. Also, declarations and bindings cannot be constructed or bound to identifiers within a module. Unions are also restricted: they can only appear inside records. Nonetheless, it is simplest to emphasize the uniform treatment of all values, and consider separately the restrictions on types, declarations, bindings and unions. Future evolution will improve this situation.

Restriction on dot notation: In current Cedar you can only use dot notation for some operations of built-in clusters: the procs which access record fields, and others as noted in Table 4-5. As a substitute, there are various syntactic forms which are sugar for dot notation: infix, prefix and postfix operators, built-in functions, and funny applications. These desugarings are given in rules 20-24 of the Cedar grammar in § 3.

2.3.3 Variables

Certain values, called variables, can contain other values. A variable containing a value of type T has type VAR T. If the variable doesn't allow the value to be changed. the type is READONLY T: this is not the same as T, because there may be a VAR T value which is the same container. The value contained by a variable (usually called the value of the variable) can be changed by assigning a new value to the variable. The set of all variables accessible from the process array constitutes the state of the computation: these are all the variables which can be reached from any process. and a variable which cannot be reached cannot affect the computation. Note that a variable value is a container, which like all values is immutable; it may help to think of it as (the address of) a block of storage. The contents of a variable can be changed by assignment. Thus the value of a variable can change. even though the value that is the variable is immutable.

A suitable abstract representation for a VAR T is a value of type [Get: []*T. Set: T>0]. This representation is not used in Cedar, but it clarifies the way in which variables fit into the type system: VAR TVAR U only if T and U have the same predicate, because the Get proc requires TAU and the Set proc requires UST. READONLY T corresponds to [Get: 0–>T] and a write-only variable type would be [Set: T– U].

There is a coercion (an automatically applied conversion: see § 2.6.1) from VAR T to T. so that a variable can be passed without fuss as an argument to a proc which expects a value.

Restriction on variables: In current Cedar, variables generally cannot be passed as arguments or results. The only exception is that an interface can declare a variable (called an exported variable) for which an implementation supplies a value; this is normally written x: VAR INT in the interface, but for historical reasons it is also possible to write just x: INT. Certain primitives (e.g.. dereferencing a REF or POINTER) return variables. a variable can (indeed, must) be passed as the first argument to ASSIGN, and a variable can be bound to a name by a declaration in a LET or block (LET x I NT.NEW IN ... binds a VAR INT value to x). For the most part, however, a program which wants to handle variables must do so at one remove, through procs or REFS (or, unsafely,


A variable is often represented by a block of storage; the bits in this block hold the representation of its value. All the built-in VAR types are represented in this way. A variable u overlaps another variable v if assigning to u can change the value of v. The primitive ASSIGN procs have the property that

if r and s are REFs. then rr overlaps st iff r= s.

For any variables u and v with the same VAR type, u overlaps v iff u= v, provided that no unchecked program has given overlapping blocks of storage to the two variables (if u and v have different types, one might be contained in the other).

The role of variables in non-functional expressions is discussed in § 2.3.1.

2.3.4 Groups

There is a basic mechanism for making a composite value out of several simpler ones. Such a composite value is called a group. and the simpler values are its components or elements. Thus [3, x+ 1, "Hello"] denotes a group, with components 3, the value of x+ 1, and "Hello". The main use of explicit groups is for passing arguments to procs without naming them (these are sometimes called positional arguments). This is done by binding the group to the declaration which is the domain type of the proc; the result is a binding which is the argument the proc expects. Thus, with P: [x: INT, y: R EA L]--q. . .1, the application P[2. 3.14] is sugar for P[ [x: INT, y REAL]-'[2, 3.14] ], which is equivalent to P[x-'2. y-3.14].

A group has a type which is the cross type of its component types: if x has type T and y has type U, then [x y] has type TX U. Thus for syntactic types, V[ei, e2.                                                 ]=Ve/XVe2X ... The X type

constructor is associative, and type implication (§2.4.2) extends to cross types elementwise. If the T

are types, there is a coercion called MKCROSS from [T1. T2. ...] to T1X T2X ...; because of this, the

explicit cross type is usually not needed.

Restriction on cross types: Current Cedar provides no way of making cross types except as domain and range types of a proc type (or other transfer type); e.g.. PROC [INT, REAL]+[BOOL, ATOM]. There are no procs taking groups except the group-to-binding coercions. Hence the only thing to do with a group is pass it to one of the built-in coercion procs by writing it as a proc argument or to a record or array constructor as described in the next section. Current Cedar does not have X, but it does have the MKCROSS group to cross type coercion described in the last paragraph and illustrated in the example.

2.3.5 Bindings

A binding is a group in which each component has a name. Thus, it is an ordered set of [name, value] pairs. There are three main uses for a binding:

• As an argument in an application. Thus, if P is a proc with type PROC[i: INT, b: BooL], its argument must be a binding such as [1-3, bTRUE]. The application then looks like this: P[i-3, bTRUE]. A binding argument is sometimes called a keyword argument list. See the next section for details.

·    In a LET expression, to give names to values in the scope of the LET. Thus,

LET 1-3. bTRUE IN (IF b THEN 1+5 ELSE 0)

has the value 8. Current Cedar doesn't have LET expressions. but a binding at the beginning of a block has the same effect. See § 2.5.4 on scopes for details.

·      As a way of collecting and naming a set of related values. A value can be extracted from the set using dot notation. Thus if b is the binding [1-3, bTRUE], the value of b.i is 3. In current Cedar this only works for interfaces; see § 3.3.4 and § 4.14 for details.

A binding is usually denoted by a constructor, which takes the form

[1-3, bTRUE]

or redundantly (if there are no coercions)

[i: INT-3, b: BOOLTRUE]

in which the types are specified explicitly (but you can't write the second form as the argument of an application). See § 2.5.5 on constructors for details.

2.3.6 Arguments

When a group or binding is bound to a declaration (d–v), there are various conversions called coercions which may be applied to the values. This usually happens when the arguments of a proc application are bound to the parameter declaration.

First, if v is a group rather than a binding, it is coerced to a binding by attaching the names from d to the elements of v in order. Thus in

[a: [NT, b: REAL]–[2, 3.14]

the group constructor is coerced to [a-2, b-3.14].

Next, if v is shorter than d, elements of the form nOMITTED are appended. where n is the corresponding name from the declaration. Thus in

[a: [NT, b: REAL][2]

the group constructor is coerced to [a-2, bOMITTED].

Now the items of the binding are matched by name with the items of the declaration. There is an error unless the names match exactly. The remaining coercions are done on individual items. n: from the declaration and the corresponding n–v from the binding. If v has a type implying 1. all is well. Otherwise, if there is a sequence of coercions from the type of v to 1. these are applied to v. If no such sequence exists, there is an error. In particular, there is a coercion from OMITTED to the default value for :, if any. Thus in

[a: INT -0, b: REAL4-1.1][b-3.14]

the group constructor is coerced to [a-0, b-3.14], and in

[a: INTO-0, b: REAL4-1.1]–I]

it is coerced to [a-0,                                   Coercions are discussed in § 2.6.1 and § 4.13, defaulting in § 4.14.

An important special case is constructors for record and array values. A record type has a construction proc; e.g.,


has a proc R.CONS of type PROC[a: INT, b: REAL€0.]—)[R]. Thus R.CONS[a-2, b-3.1416] constructs a record value. There is also a coercion from BINDING to the particular declaration RB which is the domain type of R.CONS, so that

rl: R4-[a-2, b-3.1416]

is short for

rl: R4- R.coNs[a-2, b-3.1416].

Composing the positional coercion from GROUP to RB with R.CONS makes

rl: R<-[2.3.1416]

also short for the previous line.

The same scheme works for arrays. but only an array indexed by an enumeration has a corresponding binding which can be written; the elements of an array indexed by numbers don't have names which can be written in a binding. However, the group constructor still works.

2.4 The type system

This section describes the way in which types can be used to make assertions about the program which the compiler can verify. It also discusses the role of types in organizing the names of the program.

2.4.1 Types

Types serve two independent but related functions in Cedar:

·      A type contains an assertion about some property of a value, e.g.. that it is a whole number between 0 and 10 represented in a single machine word. A value which has the property is said to be of that type, or to have that type.

The assertion part of a type is called its predicate. It is a function which accepts a single value (of any type) and returns TRUE iff the value satisfies the assertion. In principle the predicate can be applied to any value at runtime, but in practice a lot of optimization is done by the compiler.

·      A type contains a collection of named procs (and perhaps other values) related in some useful way. Most often, the procs of type T take a value of type T as their first argument. For example, INT has PLUS, TIMES and MINUS procs (usually written as infix or prefix operators) which can be applied to INTS. The dot notation (see § 2.4.4) makes it easy to refer to the procs in a type's collection.

The collection part of a type is called its cluster. It is simply a binding. No rules are enforced about what kind of values are in the binding. However, the idea is that the cluster is an interface for manipulating values of the type (perhaps the main or even only interface). As with any interface, a tasteful choice of names and values is important.

The predicate and the cluster serve rather different purposes:

The predicate provides the basis for type-checking (§ 2.4.2). The most important function of type-checking is to guarantee the integrity of abstract data types: this is done with basic predicates called marks (§ 2.4.3).

The cluster provides the basis for convenient naming of a large collection of procs and other values (§2.4.4). Clusters are organized into a hierarchy of classes (§ 2.4.5).

Like everything else which can be named, a type is a value. Hence there is nothing special about binding a type value to a name. If T is a type expression, the binding


binds Ts value to U. In the scope of U, T and U are completely interchangeable (provided T is not rebound). Furthermore, with two exceptions, all type expressions are functional: identical type expressions in the same scope denote the same type value. The exceptions are the record and enumeration type constructors, which make a distinct type each time they are used (by constructing a new mark: see § 2.4.3).

Restriction on uses of types: Current Cedar has a number of restrictions on the use of TYPE values, given in §4.8.

2.4.2 Type predicates and type-checking

Type predicates provide a way of making assertions in the program which can be checked mechanically. These assertions take the form of declarations for the formal parameters of procs. In general the checking must be done during execution. Thus, if the program says

a: ARRAY [0..10] OF INT4-ALL[0]:

is INT4- s.Readl ni:

s.PutF[ a[i]]:

there must be a check that i>0 and 1<10 just before the expression a[i] is evaluated. This is called a bounds check: if it fails there is an exception called Runtime.BoundsFault. Where did this check

come from? Note that aid is short for Va.APPLY[a. d. and Va.APPLY is SUBSCRIPT, the subscript procedure for ARRAY [0..10] OF INT. The type of SUBSCRIPT iS PROC[array: VAR ARRAY [0..10] OF INT, index: [0..10]1—qvAn INT]. So when i is passed as the index argument. the declaration of SUBSCRIPT says it must have the type [0..10]. The predicate for this type is

[x: ANY] IN HASMARK[x, INT] AND LET yNARROW[x. INT] IN y>= 0 AND ye= 10.

Leaving the HASMARK term for later discussion. we see that the rest of the predicate is the same as the bounds check.

The type system is designed, however, so that most assertions can be checked statically (i.e., proved), by examining the text of the program without running it. Static checking has three obvious advantages:

It reports any errors after a single examination of the program, leaving none (of this kind) to be discovered later in Peoria.

It introduces no cost in time or space for run-time checking.

The compiler can take advantage of the assertions to generate better code.

Of course, there is a corresponding drawback: the assertions made by parameter declarations must be simple enough that the compiler can reliably prove or disprove them.

The proofs done for type-checking have exactly the same form as program correctness proofs based on preconditions and postconditions. Consider a proc whose value is the A-expression


The domain declaration [x: 7] is a precondition for the body e. This means that any application of the proc must satisfy this condition. As a consequence, the body e can be analysed on the assumption that the precondition holds, i.e., that x has type T. Similarly, the range declaration [y: U] is a postcondition for the body. This means that given the precondition, any evaluation of e must produce a value y which has type U. In summary, for the body we assume the precondition and must establish the postcondition.

To make this hang together, each application must establish the precondition; this means that the argument must have the domain type. In return, the application can assume the postcondition; this means that the result of the application has the range type. Thus we have a linkage:

argument domain range result

The result in turn will be the argument of another application. In this way the proof is extended to larger and larger expressions, and finally to the whole program. In summary:

Application —                  establish pre-condition: arguments have the domain type;

rely on post-condition: results have the range type.

Body —                              rely on pre-condition: parameters have the domain type;

establish post-condition: returns have the range type.

These proofs require showing that an expression always has a particular type T. This is done by observing that every expression has a unique syntactic type U, which is the type of every evaluation of that expression; e.g.. an application always has the range type of its proc (see below for a more detailed discussion of syntactic type). If every value of type U has type T. we are done. Hence the usefulness of type implication. One type implies another. iff (V x) T[x] U[x]; sometimes we say that T is a sub-type of U. If two types are equal, each implies the other. However, there are many other useful cases of implication. For instance. VAR INT implies READONLY INT. The type implications in current Cedar are given in § 4.12.

Of course, not all arguments are applications. The kernel grammar gives the other possible forms of argument expressions, and we enumerate the proof rules for each:

A literal is like a zero-argument proc: it has a known range (e.g.. 3 has type INT, 'A has type CHAR).

A name has the type specified in its declaration or binding.

If there is only a declaration n: T (e.g., x: INT), it must be the domain declaration

of a A-expression, and we have already seen how to ensure that the n's value has type T when the resulting proc is applied.

If there is a binding n: T—e for the name (e.g.. x: INT-3), we must check that e has type T.

A A-expression A [x: 7]=>[y: U] IN e has the type [x:                 U]. This works for the reason
discussed in the next paragraph.

A binding constructor [x—e. y—A has the type of the corresponding declaration. [x: V e. y: VA.

There is one more link in the chain. An application f[x] has an arbitrary expression for f not necessarily a A-expression. The requirement is that f must have a proc type. say D—4R: D is the domain type and R the range type. Since the type of A D=>R IN e is D-.R, satisfying the precondition D for the application is the same as satisfying the precondition D for the A-expression, and similarly in reverse for the postcondition. The value of f may be a primitive rather than a closure obtained from a A-expression. In this case, the implementation of the primitive can still depend on the precondition and must still establish the postcondition, but since the implementation cannot be examined (within the framework of Cedar) we can say nothing about how this is accomplished. Example: INT.PLUS, which is implemented by the machine's 32-bit add instruction.

In a proc type D—*R, D and R may be declarations which provide names for the arguments and results. In general. the expression R may include names declared in D. The range type of an application then depends on the argument values.

Restriction on dependent proc types: In current Cedar only a module has a type whose range depends on its argument values; the type returned by an interface, or the interfaces exported by an implementation, may depend on the interface and implementation parameters.

As a by-product of the type-checking proof rules just given, a syntactic type is derived for every expression e in the program. It is denoted by De. and computed as follows:

for a name, the declared type; for a literal, its type;

for an application, the range type (which may depend on the argument);

for a A the obvious proc type;

for a binding constructor. the declaration obtained by pairing the names with the syntactic types of the value expressions.

Typechecking ensures that whenever e is evaluated, the resulting value will have type De (though it may have other types as well, i.e., it may satisfy other predicates). The main use of syntactic types is in connection with dot notation (see § 2.4.4).

In order to carry out the proofs described above, the compiler must either compute the values of all types, including those denoted by complex expressions such as ARRAY [i...j] OF INT, or it must be able to prove the equality of unevaluated type expressions. For the most part. current Cedar requires the former approach: hence a type expression must have value which the compiler can compute. Such a value is called static: the rules for static values are given in § 3.9.1.

2.4.3 Marks

By this point you may have thought of asking why the assertions provided by type predicates are worth all this fuss. The reason is simple: they are the basis for authenticating values of an abstract type. so the implementation can be sure that it is working on properly formed values. Suppose you are the implementer of an abstraction. e.g.. Table. You provide operations to Lookup a key in the table, to Insert a [key. value] pair, and to Enumerate the items in the table. A Table is implemented as a REF to a record containing a sorted array a of items and an INT n which gives the number of

items. Lookup is implemented by binary search. All three operations are programmed on the assumption that elements 0 through n-1 of a are sorted, and that n is smaller than the size of the array. They will not work properly if these assumptions are not satisfied, and indeed they may try to subscript the array with an out-of-bounds index or to violate other requirements of the abstractions they depend on.

Here is a lower level, but perhaps more dramatic example. The dereferencing operation t for a REF REAL returns a VAR REAL, which can, for instance, be assigned to, as in the program fragment


rt. <- 3.14159

A REF REAL is represented by the address of a four-byte block of storage which holds a REAL, and the assignment to rt stores the four bytes which represent 3.14159 into that block. If somehow a REF BOOL finds its way into r. the assignment will still store four bytes, since it doesn't know any better. But the REF BOOL points to a two-byte block: the other two bytes that will be modified belong to some unrelated variable, which will be clobbered without warning.

The second example is scarier because the consequences of the bug seem more unpredictable. In both cases, however, the fundamental problem is the same: even if the implementation is correct, the wrong thing happens because it is given an improper value to work on. Or to make the same point in different words, the implementation cannot be held responsible for bad results from one of its operations, if it has no control over the validity of the arguments it receives.

So that the implementation of an abstraction can take responsibility for correct operation. there must be a way to authenticate a value of the abstract type. In Cedar this is done by placing a mark on the value: think of it as a little flag stuck into the value. The mark uniquely identifies the abstract type, and authority to affix it is under the control of the implementation. A correct implementation will mark only values which have the properties needed for a representation of an abstract value, and if no one else can affix the mark, the implementation can be sure that every value with the mark has the desired properties.

A mark can be thought of as an abbreviation for an assertion or type invariant which characterizes a proper abstract value, such as Table or REF REAL. Such an assertion can be quite complex. In the Table example, it would say that the representation is a record of the proper form, that n is less than the array size, and that the first n array elements are sorted. In the REF REAL example, it would say that the address points to a block of storage such that at least the first four bytes don't overlap any other blocks. Such assertions are not easy to write down formally, and proving them is certainly beyond the power of any existing program. So the abbreviations are not a mere convenience, but a necessity.

A new mark can be created on demand by the primitive


The primitive HASMARK tests a value for the presence of a mark, so HASMARK[x. m] tests x for the presence of the mark tn. Affix adds the mark to a Rep value.

Restriction on marks: MARK, UNIQUEID, CREATEMARK, HASMARK and TYPEFROMMARK are not accessible in current Cedar. Record and enumeration type constructors provide some access to CREATEMARK, as described below. The 1STYPE primitive, also described below, is closely related to


With these facilities, it is easy to create a new abstract type. Choose its representation type, and obtain a new mark m. TYPEFROMMARK[m] with an appropriate cluster added is the new abstract type. The implementation must use Affix to mark only values which satisfy the properties it demands.

The type returned by TYPEFROMMARK[m] has the predicate

A [x: ANY] = >[BOOL] IN HASMARK[x. m]

and an empty cluster. Except for subranges and bound unions, all types in current Cedar have a predicate of this form. The built-in types (INT. BOOL etc.) come with such predicates, and the built‑

in type constructor procs (ARRAY, RECORD etc.) obtain a mark from CREATEMARK. So that two invocations of ARRAY [0..10] OF INT will produce the same type, ARRAY and most of the other constructors use a canonical encoding of the constructor and its arguments for the UNIQUEID, and hence are functional. RECORD and ENUMERATION produce a different type each time they are invoked, so they obtain fresh unique identifiers. Since the program cannot invoke CREATEMARK directly. we need not explain how to prevent forgery of UNIQUEIDS. Future versions of Cedar will address

this problem.

In current Cedar you make a new abstract type by declaring it as an opaque type in an interface:


This generates a new mark, and declares T to be a type which has that mark. You get such a type by explicitly painting some other type, normally in an implementation which exports T to the interface which declared it:


See § 4.3.4 for more details.

The implementation actually stores a mark with each variable allocated by NEW. Such a variable can be referenced by a REF. and in particular by a REF ANY value. The type of a REF ANY value can be tested at runtime using the primitive


If V e is REF ANY and RT= REF T, then the value of ISTYPE[e, is TRUE iff the predicate for T just tests for mark m. and xt has the mark VAR m. ISTYPE is described in detail in § 4.3.1. along with the WITH ... SELECT construct and the NARROW primitive, which are more powerful operations built up from ISTYPE.

For other values, there is no mark actually stored: instead, types must be computable statically using the methods described in the last section. The AMTypes interface, however, gives a way to refer to any value in a uniform way, and to test its type at runtime.

There is only room for one mark on a variable, and this must encode all the marks that the value actually carries. We arrange for this by imposing a partial order on the marks, and requiring that:

The set of marks on a value must have a maximal element.

Every mark smaller than the maximal one must be on the value.

With these rules, a single mark stored on the value is enough to code all the others. In current Cedar. a value actually has only one mark, since:

The only way to create a new mark is with the record or enumeration type constructors, or by declaring an opaque type.

When you paint a type T with the mark of an opaque type, T must be a record or enumeration type, and the opaque type mark replaces the mark it had before.

Note that VAR T, READONLY T and T are different types with different marks, although VAR TREADONLY T, and there is a coercion VALUEOF from either one to T.

2.4.4 Clusters and dot notation

It is convenient to associate with a type the procs supplied by its implementor for dealing with values of the type. This is done by putting these procs into the type's cluster. The cluster is simply a binding which is part of the type value (the predicate is the other part). There are no rules enforced about what goes into the cluster. However, there is a special dot notation which makes it desirable to populate Ts cluster with procs which take a T as their first argument. The usual effect is like this: Ln is sugar for Vt.n[1], and t.n[other args] is sugar for V t.n[t, other args].

For example, if t has type T, and a proc [T, INT]-0[Boot.] is in Ts cluster under the name P, then

the proc can be applied by an expression like t.P[3], which is sugar for V' t.P[r, 3]. The name P is looked up only in Ts cluster, not in the current scope. If Q: [T]–[INT] is also in the cluster, it can be applied with t.Q. which is sugar for V t.Q[d.

The general rule that makes this work is the following: t.n is sugar for LooKuPc[Vt, $n][t]. LOOKUPC[Vf. $n] is just V t.n. except that if Vr.n is a proc that takes several arguments. it is split into a proc that takes the first argument and returns a proc taking the remaining ones. Thus LookuPc[Vt. $ n][t] will be a proc taking the remaining arguments. and tifi[other args] = LOOKUPC[V r. $n][t][other args] will be the same as V t.n[t, other args].

Dot notation can also be used to obtain values from a binding or from the cluster of a type without any application: T.P would be the proc named P in the previous example. The possible cases of dot notation in current Cedar are described in detail in § 4.14.

Restriction on constructing clusters: There is currently no way to explicitly construct clusters. The built-in types and type constructors have clusters: they are described in detail in § 4. In addition, there is a clumsy way to provide a cluster for an opaque or record type in an interface: every proc name in the interface is put into the type's cluster. For a record. the procedures supplied by the record constructor are also in the cluster. and they win if there are name conflicts. There is one of these clusters for each type in each imported interface value: if a module imports more than one value of the same interface, however, there are severe restrictions (see § 3.3.3).

2.4.5 Declarations

A declaration is the type of a binding. Thus, the binding [x-3, y-3.14] has the type [x: INT. y: REAL]. All the relationships among types, and between types and values, are carried over elementwise to decls and bindings: the elements are matched up by name rather than by position. A decl itself simply has the type DECL.

A decl is made up of two parts: the names or pattern, and the types. The basic operation for making decls. MKDECL, takes a pattern and a type. Thus MKDECL[ PATT[x. y], INTXREAL]= [x: INT, y: REAL]. In general, a pattern is one of NIL, a simple name, or a pair of patterns. just like a Lisp S-expression. Similarly, a type argument to MKDECL is one of NIL, a type, or a cross type. The type must decompose in a way which matches the pattern. Normally. as in Lisp, we deal only in flat patterns. where the first element of a pattern is always a name. Such flat patterns are conveniently denoted by constructors of the form [x, y. ...]. The reason for defining things in terms of pairs is that it makes it much simple to write down precise rules for the semantics, using structural induction on the values.

The main use of a decl is to type-check a binding. The basic binding constructor is MKBINDD[d, e], where d is a decl and e is matching group or binding. If e is a-binding. then its structure and names must match the structure and names of d, and each element of e must have the type demanded by the corresponding component of d, after a possible coercion. Thus MKBINDD[[x: INT, y: REAL]. [x-3. y-3.14]1= [x-3, y-3.14]. This may seem pointless, but it has two important uses:

Such a binding is used to bind the argument of a proc to the domain declaration. Even though the resulting binding is the same as the argument, the type-checking is essential.

There may be coercions involved, so that the resulting binding is not the same. Coercions on the component values are discussed in § 2.6.1. There are also coercions on the binding itself, which can default missing elements: these are discussed in § 2.3.6.

If e is a group, it is first coerced to a binding by attaching the names from the decl, as discussed in § 2.3.6. Thus in MKBINDD[[x: INT, y: REAL], [3. 3.14]] the second argument is coerced to [x-3. y-3.14], and things then proceed as before.

Bindings may also be used in LET expressions. Here the types are often redundant, and it is better to use the MKBINDP primitive to bind the value directly to a pattern. The syntactic type of the result is the decl whose type is the syntactic type of the value. Thus [x-3. y-3.14] is short for

MKBINDP[PATT[x.         [3. 3.14]]: its syntactic type is MKDECL[[x. y]. V[3, 3.14]]= MKDECL[[x. y].

A decl D in a block is interpreted somewhat differently. It becomes the argument of the NEWFRAME primitive. which turns the type of the decl D.T into the corresponding vAR type VT= D.T.MKVARO, allocates a new value v of type VT, and makes the binding MKBINDP[D.P, v] over the scope of the block. Thus

lx: INT: y: REAL: S}



Here D=[x: INT: y: REAL], VT=[VAR INT, VAR REAL], and v= [VAR INT, VAR REAL].NEW. Note that

the types might have defaults, which are used to initialize the variables as part of the NEW operation.

Actually this is a bit oversimplified, since NEWFRAME has to separate the bindings in the block from the decls. construct the variable binding just described from the decl. and then combine it with the binding from the block. Thus

ix: INT: y: REAL; z: BOOLTRUE: S}



or more readably


Anomaly about uninitialized names or variables: In Cedar the names in a block are introduced recursively, so that the d's and b's can refer to each other. It is possible for a binding or type to refer to a value which has not yet been initialized, with undefined results. See § 3.4.1 for a further discussion of this point.

2.4.6 Classes

Another important use of a declaration is to characterize the cluster of a type. Since the cluster is just a binding, it is characterized by its type, which is a decl. When used for this purpose, a decl is called a class. See § 4.1 for further discussion of classes, and an enumeration of the primitive classes of Cedar.

2.5 Programs

This section describes how meaning is assigned to kernel programs.

2.5.1 Structure of programs

A kernel program is an expression, which is either atomic (a name or literal), or is an application which involves sub-expressions: the proc being applied, and the arguments. The concrete syntax treats certain kinds of expressions specially: modules, blocks (which introduce new variables and return no value), and statements (which return no value). All desugar into simple expressions, however, and are treated identically in the kernel.

2.5.2 Names

A name is a part of a program which usually serves to denote a value. There are two contexts in which the occurrence of a name n denotes a value:

It may occur as an expression. Then n denotes the value bound to it in the scope in which the expression appears (see § 2.5.4 for details).

It may occur after a dot. as in en. Then the expression e.n denotes the binding for n supplied by e (see § 2.4.4 and § 4.14 for details):

the value bound to n in e, if e is a binding:

the value bound to i? in the cluster of e. if e is a TYPE; roughly (V e). n[e] otherwise.

There are also two defining contexts for a name n (see § 2.5.5 for details):

It may occur before a – in a binding constructor, as in n--e. The value of e is the value bound to n in the binding denoted by the constructor (see § 2.3.5 for details).

It may occur before a : in a declaration constructor. as in n: t. The value oft is the type of n in the declaration denoted by the constructor (see § 2.4.5 for details).

These constructors are usually recursive in Cedar: that is. the expression n elsewhere in the constructor denotes the value bound to n in that constructor: see § 2.5.6 for details. In the kernel they are non-recursive unless preceded by REC.

A name is not a value, but there are values of type ATOM which are related to names. An atom has a print name which is a rope (an immutable sequence of CHARS). A name following a $ is an atom literal: $n denotes the atom with print name n. Other properties of atoms are described in § 4.5.1A.

Caution on names: Current Cedar has several complications in its treatment of names:

•In an argBinding27, n: e may be written instead of n- e. The syntactic context distinguishes this from a declaration, but this usage is not recommended.

An argBinding is not recursive: in {a-1; f[a-3. b–a+ 1]} b is bound to 2. not to 4.

The declaration in an import list is non-recursive: IMPORT M is short for IMPORT M: M. and the second M denotes its binding in the surrounding scope (i.e.. the binding supplied by the DIRECTORY). Inside the body of the module, of course, M denotes the imported parameter.

Names which appear in an enumerationTC54 are treated specially: see § 4.7.1A for details.

2.5.3 Scope

A scope is a region of the program in which all names retain the same meanings (note that many names denote variables, which can change their values in the same scope. but each name continues to denote the same variable). In the kernel there are only three constructs which introduce a new scope. X, LET and REC. In current Cedar, these are sugared in a variety of ways: modules, import lists, proc bindings, blocks, exit labels, open. iterators, safeSelects and withSelects. All have straightforward desugarings, however.

2.5.4 Constructors

The kernel has constructors, denoted [...]. to make expressions which denote group, decl and binding values more readable. There is one flavor of constructor for each class:

A binding constructor is a list of binding elements (b in the kernel syntax) of the form I,– e or el– e. The presence of the – distinguishes it from the others. Here d is a decl element (not a declaration), and p is a pattern, in which the names are being defined rather than evaluated.

A decl constructor is a list of decl elements (d in the syntax) of the form p: t. The presence of the : without any – distinguishes it from the others. Again. p is a pattern.

A group constructor is a list of expressions. Note that decl and binding elements are not expressions, although constructors are expressions.

Constructors are useful for making decls and bindings where the names are literal. This is the normal case, and in fact the only case in current Cedar. If you want to make them out of other decls, for instance to bind an expression to a decl which is the value of a name dn, you cannot use a constructor: [dn-'e] would bind the value of e to the name dn, not to the decl which is its value. You have to write the decl-constructing primitive directly: MKDECL[d, e].

The only kinds of constructor you can write in current Cedar are:

Decl constructors for proc domains and ranges, and for records and unions (fields43 in the syntax).

Binding constructors for arguments in an application, or as an expression alone if a record or array value is needed (argBinding27 in the syntax).

2.5.5 Recursion

In the kernel, you get recursive definition of names only if you write PEG (or the unsugared form Fix) explicitly. In Cedar, on the other hand, decls and bindings are normally recursive, except for argBindings and import lists.

The recursion is legal in a block or interface body (although anomalies are possible in some cases when names are used before they are defined: see § 3.4.1). In fields it is illegal.

2.6 Conveniences

The facilities described here are not fundamental, but they are of great practical importance.

2.6.1 Coercion

A coercion is a proc which is automatically applied under some circumstances to map a value of one type T (called the source) to a value of another type U (called the dest), e.g. from [O..5) to INT. Coercions are obtained from the clusters of the types involved. The coercion mechanism adds no new functionality, since the programmer could always write the applications himself, but it is important in concealing some of the distinctions made by the type system when they are distracting rather than helpful.

There is exactly one (desugared) context in which a coercion is applied: when an expression e of syntactic type T appears as an argument in an application which expects a value of type U: this means that there is a binding n: U"e. Since nearly all Cedar constructs are desugared to application, coercions are widely applicable. The only (desugared) context in which there is no coercion is for the first operand of dot, since in that case the cluster of the operand is used to interpret the name which is the second operand. Thus in the expression e.n, it is always V e. the syntactic type of e, that is used to look up n, regardless of the fact that this expression may appear as an argument to a parameter of type U. If e is not a type or binding, however, then e.n desugars to P[e], where P= LOOKUPC[V e.Cluster, $n], and in the application of P, e does appear as an argument and can be coerced. Usually the cluster for T is set up with procs which take an argument of type T, so the domain of P is De and no coercion happens. This isn't always true, though: a subrange T of INT inherits the arithmetic procs of INT, for example, and there is a coercion from T to INT when PLUS is applied.

If it is sometimes natural to think in terms of a coercion from T to U that is implemented by the identity function. In fact, implication is stronger than that, since it propagates through many type constructors, including PROC, while coercion does not. Implication is discussed in § 2.4.2 and § 4.12.

There is a rather general rule for finding coercions from the clusters of types, though it is not of much practical importance in current Cedar, since there is no way for the user to define coercions. The rule goes like this. Each cluster may have a From item and a To item. T.From should consist of pairs with type [tag: ATOM, proc: T-+ U1, and T.To of pairs with type [tag: ATOM, proc: U-+71. Ignore the tags for the moment. Consider the binding n: U-'e. where V e= T, and TA U is false. For each proc P in T.From or U.To we try n: U- P[e.].

If P: T-> V is in T.From, it maps e to a value of type V, and we have to bind n: U- P[e]. If we are done: otherwise we can recurse on this sub-problem.

If P: V-0 LI is in U.To, we have to bind m:             If        V we are done: otherwise we can
recurse on this sub-problem.

The whole process fails if no path of coercion procs takes us from T to U. The search can terminate when all paths have been explored, and a particular path can be abandoned when a type appears on it for the second time. Since the search is done statically (by the compiler), and since the results of an attempt to coerce T to U can be cached, the time required for the search is not a problem.

There are two obvious difficulties with this scheme. First. it may transform erroneous applications into legal ones, by coercing an argument in ways not intended by the programmer. Second, more than one path of coercion procs may exist, and different paths may give different results. The second difficulty can be avoided, and the first minimized, if every coercion proc P is chosen so that it has a (partial) inverse, and P-1[P[x]]= x for all x in P.DOMAIN. This says that a coercion does not lose information. and that different paths give the same answer. Sometimes this is not feasible. e.g. for the narrowing coercion from INT to [0..5). The following rule gives the builder of clusters control over proliferating coercions:

If two procs on a coercion path have non-Nit. tags, they must have the same tag.

In general, coercions that don't lose information can have NIL tags, and others should have different tags.

The coercions in current Cedar are described in § 4.13. All have NIL tags. and none loses information except the subrange narrowing. Note that coercions extend componentwise to groups and bindings.

2.6.2 Exceptions

The basic idea behind exceptions is to extend the value space, so that it includes not only ordinary values, but also a set of exception values. An exception value has the special property that whenever it appears in an application, it becomes the value of the application, so that it propagates up through the control stack of the program until it finally becomes the value of the whole program. Of course this isn't always what is wanted, so there is a special HIDE construct which is not an ordinary application, but takes its argument value, ordinary or exception. and bundles it in a variant record which is a normal value. Then ordinary code can be used to test for the exception and take appropriate action. This construct is sugared to give distinctive ways of catching an exception: in the kernel with BUT (§ 2.2.4), and in Cedar with ENABLE, EXITS and REPEAT (§ 3.4.3). Cedar has two kinds of exception: GOTO labels and ERRORS, which must be raised and caught separately. and have slightly different semantics.

The main point of this treatment is that it does not require continuations or any other baroque explanation of how control is transferred to catch an exception. The view is that exceptions are simply a convenience feature: the same job could be done by returning a slightly larger result from each proc, with an appropriate status code.

An exception consists of a code and an optional argument value. The type of the code is ERROR T, where T is the type of the argument which goes with it. GOTO labels always have empty arguments. The argument is a way of passing some information along in addition to the identity of the exception.

A proper treatment of exceptions in the type system would require that each proc the exceptions that can emerge from an application of the proc. In fact, this is not possible in current Cedar.

Cedar also has signals, which historically were viewed as a kind of exception but different interpretation. as a way of obtaining dynamic rather than static scoping are discussed in § 3.4.3A.

range include all required or even

now have a very for names. They


2.6.3 Finalization

This subject is discussed in § 3.4.3A.

2.6.4 Concurrency

This subject is discussed in § 4.10. where the Cedar facilities for writing concurrent programs are given. Writing good concurrent programs, or even correct ones, is another matter, which is beyond the scope of this manual to more than hint at. Unfortunately, an adequate reference is lacking.

2.7 Miscellaneous

The different kinds of allocation are discussed in § 4.5. Static values are defined in § 3.9.1.

2.7.1 Pragmas

A pragma is a construct that does not change the meaning of the program, except perhaps to make something illegal which was legal without the pragma. Its purpose is to affect the implementation, generally by requesting optimization to favor one criterion over others. The pragmas in current Cedar are:

INLINE, which causes a proc body to be expanded inline when it is applied. See § 3.5.1 for details.

PACKED, which causes array components that fit in 8 or fewer bits to be packed, at the expense of more expensive code to access them (§ 4.4.2).

CHECKED, which forbids application of unsafe procs in a block, and adds runtime checking for some primitive procs which are otherwise unsafe—in particular, narrowing to a subrange, and assigning a proc (§ 3.4.4).

PRIVATE, which forbids access to items in an interface or instance except to modules which EXPORT (or SHARE) it (§ 3.3.6).

MACHINE DEPENDENT, which allows positions of record fields (§ 4.6.1) and representation values for enumeration elements (§ 4.7.1A) to be specified (strictly, it is the absence of MACHINE DEPENDENT that is the pragma, since the positions or representation values are legal only when it is present.)

2.8 Relations among groups, types, bindings and declarations

Cedar has are four closely related basic ways of building product values from simple values (all are given precise meanings in § 2.2.1 and § 2.2.2):

a group is simply an n-tuple of values (see § 2.3.4):

a X-type is the type of a group (if x: T and y: U then [x, y]: TX U) (see § 2.4.5): a binding is an n-tuple of [name, value] pairs (see § 2.3.5):

a declaration is the type of a binding, an n-tuple of [name, type] pairs (see § 2.4.5).

Figure 2-1 illustrates the relations among these kinds of objects. In current Cedar most of these objects can be constructed and manipulated only as interfaces and instances. In the kernel and the

modeller, all of them are first-class citizens. The primitives which go between them are defined in § 2.2.

[a: T a ea, b: Tbeb]

binding<nAKBINDD (instance)




[ea, eb]


[a: Ta. b: T b]                          [a: TYPE Ta. b: TYPE ^

Text Box: with namesBDOTD> decl           Dios      .BTOD.              binding

(interface)                                           -t.

MKDECL           t \NI                                                                         MKBINDP

1     4. DDOTP

[a. b] pattern

DDOTT                                                       4,   BDOTV

Text Box: without names>X-type      MKCROSSCTOG> group

TaXT b                                                                  [T a, T b]

types                                      types as values

Figure 2— 1: Relations among groups, types, bindings and decls

2.9 Incompatibilities with current Cedar

Most of the syntax is current Cedar is an extension (or sometimes a restriction) of kernel syntax. There are a few things that have different meanings in the kernel, however, and these are potential sources of confusion:

Type expressions in Cedar do not have the same syntax as ordinary expressions and cannot appear in the same contexts, for the following reasons:

The use of <- for specifying a default value for a type vs its use for assignment. The use of {} for enumeration types vs its use for a block.

The use of parentheses and brackets to specify subranges

·The use of adjectives for variants (red Node).

Target type overloading for union constructors ([rator-'$plus, rands—binary[...]]), and

·enumeration literals (red instead of or $red) is incompatible with the kernel's simple rules for the meaning of names.

·In addition to writing n: t—e or ne for a binding, you can also write n: t= e (in a module header or block) and n: e (in an argBinding). The most unfortunate consequence is that a Cedar argBinding can look like a kernel decl constructor!

It is now possible to avoid all the conflicting constructs except the relatively harmless ones: <- for defaults, {} for enumeration, and union constructors.

Chapter 3. Syntax and semantics

This chapter gives the concrete syntax for the current Cedar language. together with an informal explanation of the meaning of each construct, and a precise desugaring of each construct into the kernel language defined in § 2. The desugaring, together with the definitions of the kernel primitives used in it, are the authority for the meaning; the informal explanation is just for your reading pleasure. However, paragraphs beginning Anomaly or Restriction document properties of Cedar not captured in the desugaring. The primitive procs and types of Cedar are specified in § 4.

In addition to the grammar rules and desugaring, there are examples for each construct. These are intended to illustrate the constructs and do not form a meaningful program. The Cedar Manual has longer examples which do something interesting, and also illustrate the use of the standard Cedar packages.

There are several summaries which may be useful as references:

A two-page summary of all the syntax. desugaring and examples in this chapter (

A one-page summary of the full syntax (

A shorter and less cluttered summary of the syntax for the safe language; it also omits a number of constructs which are obsolete or intended only for efficiency hacking (CLRMSafeGrampress).

The chapter begins with a description of the notation (§ 3.1) The next sections deal systematically with the rules of the grammar, explaining peculiarities of the syntax and giving the semantics:

§ 3.2, rules 56-61: § 3.3, rules 1-5: § 3.4, rules 6-10: § 3.5, rules 11-13: § 3.6, rules 14-18: § 3.7, rules 19-27: § 3.8, rules 28-35:

The lexical structure of programs. Modules.


Declarations and bindings. Statements.


Conditional constructs: IF and SELECT.

§ 3.9 treats various miscellaneous topics. §4 deals with the syntax and semantics of types.

The order of the grammar rules is:

module, block, declaration, statement, expression, conditional


name, literal

and top-down within these.

3.1 Notation

This section describes the notation used in the grammar, desugaring, and commentary of this chapter.

3.1.1 Notation for the grammar

The grammar is written in a variant of BNF:

Bold parentheses are for grouping: ( interface I implementation). Item item means choose one.

?item means zero or one occurrences of item.

item; ... means zero or more occurrences of item separated by ":". The separator may also be ",",

ELSE, IN. or OR, or it may be absent. If the separator is ":", a trailing 11;1' is optional. item; !.. is just like item; ... but there is at least one occurrence.

A terminal is a punctuation character other than bold ()?1, or any character underlined, or a word in
SMALL CAPS. Note that a and {} are terminals, and do not denote optional occurrence and repetition as they do in many other variants of BNF.

The rules are numbered sequentially.

Special symbols mark constructs with special properties: t = unsafe:

· = obsolete;


*= efficiency hack.

The grammar is written so that a non-terminal never expands to the empty string. When an element of a rule is optional, that is always indicated explicitly by "?" or "..." .

The following non-terminals are so basic to the language and so frequently used, that they are represented in the grammar by abbreviations:

b= binding13

d = declaration"

e = expression 19

n = name56 (identifier)

s = statement14

t= type36

I'm afraid this means that you must learn the meaning of these six abbreviations in order to read the grammar.

With the exception of these abbreviated non-terminals, each use of a non-terminal is cross-referenced with a small superscript number59, unless the non-terminal is defined in one of the next few rules. If a non-terminal (other than e, t or n) is used in more than one rule, then all the rules that use it are listed in a comment after its definition.

Except for the entries in Table 3— 1, a terminal symbol appears in only one rule. These duplications do not lead to syntactic ambiguity. In most cases they are harmless, since the symbol has essentially the same meaning in each case, and the rules are separate only for greater readability, to highlight an unusual use of a construct, or for historical reasons. In some cases, however, the symbol has quite different meanings in different rules. These are marked on the left as follows:

·         In the rules whose numbers are marked with * the symbol has a different meaning than in the others, and confusion is quite possible. The programmer should beware.

m    In the rules whose numbers are marked with * the symbol has a different meaning than in the others, but the context is sufficiently clear that confusion is unlikely.

·    The rules whose numbers are marked with • are obsolete and should be avoided. A superscriptxn indicates that the terminal is repeated n times in that rule.






m         Text Box: expr, subrange, *position in record or enumeration constructor/built-in/funnyAppl, subrange,
application, typeName, fields, mdFields
interface body, block, enable, machine code. *enumerationTC
See note in § 3.2.


m       :


o        *

=> 4-


m    ANY


m       ERROR




m    USING

m    WITH

19. 25. *51.1. *54

19, 25, 26, 37, 43, 51

2, 6, 8, 13, *54

2.3, 6, 7, 9, 17, 27. 29,30, 32, 34, 35, 43, 51, 52

6. 8.10, 17. 27.1. 30, 33, 35

1, 2, 3, 5, •7, 11, 13,

18,     •27, 33, •34. 51. *51.1. 53

19,  37 25x4. *51.1

21, *53 21, 58

20, 21, 58

·13, 22

6, 9, 17. 31, 33, 35, 52 14, 16, 18, 21, *55

2, 3, 13, 20, *22, *27

7, 34

*9. 40, 43

*13, 23 31, 52

*19, *24. 41.1

18, 22

38x2, 45.1, 48

20. 22

14, •27, •52, •55

44, 45

29, 32, 34, 52

2, 3

*24, 41.1

27x2, 55x2

6, 13 1, *5 *32, 34

See note in § 3.2.

introducing names with types, except *51.1= position,

·7 = open, 027= argBinding 034= withSelect

dot notation for e is repeated for types

subrange, *position infixOp, *tag

infixOp, exponent

prefix0p, infixOp, exponent

·binding, infixOp

exits, enable, repeat, select choicesx4, unionTC

s, e'-STATE, iterator, e, *defaultIC interface.implementation,b,argBinding,*unary0p,*relOp open, withSelect

*enable, variableTC, fields *new exception. convert t to e

select endChoice, unionTC *expression, *funnyAppl, transferTC

iterator, relOp

cardinal/unspecified, pointer, descriptor

prefixOp, relOp

statement, •argBinding, •unionTC, •defaultTC

array, sequence

select, safeSelect, withSelect, unionTC

interface and implementation

*funnyAppl, transferTC argBinding, defaultTC block, machine code directory, *locks

*safeSelect, withSelect

Table 3- 1: Terminal symbols appearing in more than one rule 3.1.2 Notation for desugaring

The right-hand column is desugaring into the Cedar kernel language, or in a few cases into comments describing the meaning in English. This is a purely textual transformation: i.e., it is done on the text of the program, not on the values. The rewriting is done one rule at a time: a single step of rewriting involves elements from exactly one rule. The desugaring is specified by slightly informal but straightforward rewriting rules, in which:

An occurrence of a non-terminal (written in bold) denotes the text produced by that non-terminal in the grammar rule.

A I reflects a corresponding alternation in the grammar rule. ? reflects a corresponding optional item in the grammar rule, and (bold parentheses) are for grouping as in a grammar rule. As in grammar rules, literal parentheses are underlined.

Everything else is taken literally.

An underlined non-terminal in the right column means that the desugaring specified for that non-terminal must be done in order to obtain a legal program. Otherwise the transformations can be done in any order, yielding a legal program at each step.

Every occurrence of e (expression) and t (type) in the desugaring is implicitly parenthesized, so that the desugared program parses as the rewriting rule indicates. To reduce clutter, these parentheses are not written in the desugaring rules.

For type options like PACKED, the desugaring of the construct in which they appear is a call on a built-in type constructor which takes a corresponding BOOL argument defaulting to FALSE: if the attribute is present, the argument is supplied with the value TRUE.

Examples: the following rule for subranges:

Text Box: subrange :: =
( [ el '. e21 I
(( el • e2 I
Text Box: ( typeName I)
Text Box: generates these desugarings
Index [ 10 .. 20 ]
Index [ 10 .. 20 ) ( 1 .. 100 )
(typeName f INT).MKSUBRANGE ([el. e2 I e2.PRED )1 )

[e1.succ, ( e2 I e2.PRED )] )

Index.MKSUBRANGE[10. 20]


Names introduced in the desugaring are written with one or more trailing prime ("'") characters. Such names cannot be written in a Cedar program, and hence they are safe from name conflicts. The desugaring is constructed so that the Cedar scope rules prevent multiple uses of these names from being confused.

3.1.3 Notation for the commentary

Each section of the commentary begins with grammar rules, desugaring and examples for part of the language. It continues with text which explains the meaning of the constructs. Generally the meaning is fairly clear from the desugaring, and this text is short. For blocks and especially for modules, however, there are many non-obvious implications of the desugaring. and a number of restrictions: these constructs have a lot of explanatory text.

Some kinds of information are put into specially marked paragraphs, which begin with one of the following italicized words:

Anomaly: the meaning of this Cedar construct is not explained by desugaring into the kernel, but by the special rule given here.

Caution: here is an implication of the definition which might surprise you. Performance: facts about the time or space required by some construct.

Representation: the values of a data type are represented in terms of other types like this.

Restriction: a construct is not fully general. and will cause a static error unless the additional conditions stated here are satisfied.

Style: advice about good Cedar style.

Symbols written in SANS-SERIF SMALL CAPITALS are in the kernel but not in current Cedar. The superscript notation used to cross-reference non-terminals in the grammar is also used in the examples, usually to point to a rule whose example introduces a name.

3.2 Lexical structure

Text Box: -- But not one of the reserved words in Table 3 — 2.
-- INT literal, decimal if radix omitted or D, octal if B. I -- INT literal in hex; must start with digit. I
-- REAL as a scaled decimal fraction; note no trailing dot. -- With an exponent, the decimal point may be omitted. I -- CHAR literal; the C form specifies the code in octal. I
[ ('extendedChar I ' )....] Rope.ROPE, TEXT, or STRING. I -- ATOM literal.
-- Optionally signed decimal exponent.
56 name :: = letter (letter I digit)...

57 literal ::= num ?( ( Did I 13111 ) ?num ) I digit (digit IAIDICIDIEIE) ( Lib ) ?num I ?num . num ?exponent I

num exponent I

' ( extendedChar I ' I " ) I • digit !.. (CIO I " ( extendedChar I ' ) " MUDI

$ n

58 exponent ::= (Ele) ?(+ I —) num

59 num :: = digit !..

60 extendedChar :: = space I \ extension I anyCharNot"'Or\

61 extension :: = digits digit, digit3 i                            -- The character with code digit, digit2 digit.; B. I

(nIN I 1111) I (lID I (121M I               -- CR, \ 015 I TAB, '\011 BACKSPACE, '\010

OD I (111--) I' I " I \                FORMFEED, '\014 I LINEFEED, '\012 " I \


m, xl, x59y, longNameWithSeveralWords: INT: n: iNT-1 + 12D+ 2B3 +2000B

+ 1H +OFFH;

rl: REAL-0.1+.1+1.0E-1


al: ARRAY [0..3] OF CHAR'['x, '\N,                                      '\141];
r2: ROPE"Hello.\N...\NGoodbye\F": a2: ATOM '$NamelnAnAtomLiteral;

= 1+12+1024+1024 -- +1+255

= 0.1+0.1+0.1

-- +0.1


The main body of the grammar (rules 1-55) treats a program as a sequence of tokens; these are the terminal symbols of the grammar. Rules 56-61 give the syntax of most tokens. A token is:

-       A litera157. More information about literals of type T is in the section of § 4 devoted to T.

-       A name56, not one of the reserved words in Table 3-2. Note that case matters in names.

-       A reserved word, which is a string of uppercase letters that appears in Table 3-2. A reserved word may not be used as a name, except in an ATOM literal.

-       A punctuation symbol: any printing character not a letter or digit, and not part of one of the two-character sequences below. The legal punctuation symbols in programs are:

! @ # $     * — + = I( ) }                            +- t : • " . > /

The following ASCII characters are not legal punctuation symbols (and must not appear in a program except in an extendedChar6o):

% & \ ?

-   Text Box:  One of the following two-character symbols (used in the grammar rules indicated): not equal22

less than or equal22

not less than22

greater than or equal22

not greater than22

Text Box: 8. 17. 30. 31. 33. 35. 52chooses

subrange constructor25.51.I bind by name6.34

Note that Cedar uses a variant of ASCII which includes the characters                    (instead of the underbar          and t (instead of

the circumflex               ). Also, the character written — here is the ASCII minus, code 55B. and not any of the various dash or
typographer's minus characters with other codes, which are not in the standard Ascii set.




































































































































Table 3 — 2: Reserved words and predefined names

The program is parsed into tokens by starting at the beginning and successively taking from the front the longest sequence of characters which forms a token according to the rules above, after first discarding any number of initial whitespace characters or comments.

The whitespace characters are space, tab, and carriage return. A Tioga node boundary is also treated as a whitespace character.

A comment is one of:

A sequence of characters beginning with --, not containing -- or a carriage return. and ending either with -- or with a carriage return.

A Tioga node with the comment property.

Note that whitespace and comments are not tokens, but may appear before or after any token: they are token delimiters, and hence cannot appear in the middle of a token. Whitespace and comments thus do not affect the meaning of the program except:

When they delimit a token.

Within a CHAR literal or a ROPE literal, where they are taken literally. Thus • is equal to '\040, and "I

am --not--" is equal to "I\Nam --not--" and different from "I\Nam ".

Both reserved words (Table 3-2) and most names with predefined meanings (Table 4-5) are made up entirely of upper case letters. All are at least three characters long except the following:


Caution on use of reserved words and predefined names: They should not be rebound by the program: in some but not all cases the compiler forbids their rebinding.

A note on lists of items and their separators. In general, semicolons are used to separate statements. or slightly larger constructs that contain statements. Commas are used to separate the items in all other kinds of lists. Precisely:

Semi-colons are used to separate declarations, bindings and statements in a bodylO, and to separate choices in a select statement

29.32.M or in an exits6. 17 or enable& 27.1.

Commas are used to separate declarations in fields43.51 (i.e., in a proc domain or range, a recordTC or a unionTC), bindings in an application27 or an open7, choices in a select expression29.32.34 or in a unionTC52. expressions in a choice6.9.17. 30.35, 52, items in imports. exports or shares lists2.3.

In general these lists may be empty, and an extra separator at the end is harmless when there is some kind of closing bracket, except when the sequence is bracketed with U.

The braces { } which delimit a block6, interface body2, choices in an enable, or MACHINE CODE body13 may be replaced by BEGIN and END reserved words. BEGIN replaces "{" and END replaces "}". If one brace is replaced, its matching partner must also be replaced. The braces delimiting an enumTC54 may not be replaced by BEGIN and END.

3.3 Modules

module ::= DIRECTORY (nd (: TYPE (n, 1 ) 1 )

?(USING [ nu. ... ) ),

( interface I implementation )

2 interface :: =                : ?CEDAR DEFINITIONS

?locks (imports I) ?.(SHARES ne ...) ?eaccess12 { ?open7 (d I b):         .

3 implementation ::= rim : ?CEDAR

?safety ( PROGRAM ?drType42

MONITOR ?drType42 ( I locks)) (imports I )

?(EXPORTS ne, ...)

?.(SHARES ns,

?*access12 block .

3.iimports ::= IMPORTS ( (fly : I ) n, ), ... --In 2, 3.

4 safety ::= SAFE I UNSAFE --In 3.41. Stocks ::= LOCKS e?( USING nu: t)


A [ (nd : ( (TYPE nt I TYPE Ild) I TYPE nd)),            ] IN

LET (nd•RESTRICT[n [$n       ] )

IN ( interface I implementation )

LET r'       [ nm: INTERFACETYPC $rim, ...]] ] IN (imports I X = >r) IN

-- SHARES allows access to PRIVATE names in ne LET REC nm–open [ ?(1'–locks, ) (d I b),             IN tin,

LET     [(ne: ne) , , FRAME: TYPE nni


IN (imports I X = >1.1) IN



[ (ne--.BINDDFROM[ne, b' PLUS nmb`.nm] ), ,


nm–b' , CONTROL–b%n in] where the block body is desugared: [ (d I b),        rim: PROGRAM drType–{s:

[(ni:nr), ...]= >r' IN LET [((nv I ni)– (n, PLUS nf.BINDING)     ]

X ?( [nu t] )  e



Rope: TYPE USING [ROPE, Compare],

CIFS: TYPE USING [OpenFile,Error,Open,read]. 10: TYPE 10Stream,

Buffer: TYPE;

-- For Bufferlmpl below.

-- There should always be a USING clause -- unless most of the interface is used -- or it is a standard one like Rope or 10. -- or it is exported.


Handle: TYPE-REF BufferObject: BufferObject: TYPE = Rope.RoPE New: PROC RETURNS[h: Handle];

Get: PROC[h: Handle] RETURNS[BufferObject]; Put: PROC[h: Handle. o: BufferObject] }:

Bufferlmpl: MONITOR [f: CIFS.OpenFile] LOCKS Buffer.GetLock[h]t

USING h: Buffer.Handle

IMPORTS Files: CIFS, 10, Rope


— module body -- 1 .

-- Implementations can have arguments. -- LOCKS only in MONITOR, to specify -- a non-standard lock.

-- Note the absence of semicolons.


-- Note the final dot.


Modules serve a number of functions (which might perhaps better be disentangled, but are not): A file of source text (BufferImpLmesa), or its translation into object code (BufferlmpLbcd).

The unit handled by the editor, named in DF files and models, and accepted by the compiler, the binder, and the loader.

A set of related structures (types, procedures, variables) which are freely accessible to each other, hiding secrets or irrelevant information from other modules.

A procedure which can accept interface types and bindings as arguments, and returns interface instances as results.

The procedures of a monitor, perhaps with its protected data.

The first two uses are not relevant to the language definition, and are not discussed further here. The others are the subject of this section.

There are two kinds of modules: interface modules (written with DEFINITIONS) and implementation modules (written with PROGRAM or MONITOR). They have the same header (except that interfaces have no EXPORTS list); it defines the parameters and results of the module viewed as a proc (§ 3.3.1) and specifies the name nm of the module. The bodies (following the —) are different. Table 3 —3

summarizes the structure of modules and their types; it omits a number of details which are given in rules 1-3 and explained in the text.



Module type


Result type



[Rope: TYPE Rope, 10: TYPE 10]


TYPE Match

Match: DEFINITIONS-{...}


-0[TYPE Match]



DIRECTORY Match, Rope. 10; Matchlmpl: PROGRAM

Implementation module

[Match: TYPE Match,

Rope: TYPE Rope, 10: TYPE 10.

Exported instance


IMPORTS R: Rope. 1: 10


R: Rope. 1: 10]—).[Match]



EXPORTS Match-1...1






Table 3-3: Interface and implementation modules

The ensuing sub-sections deal in turn with:

§ 3.3.1: Modules as procedures. and the interface or instance values they return. § 3.3.2: How modules are applied.

§ 3.3.3: Module parameters: the DIRECTORY and IMPORTS lists: USING clauses.

§ 3.3.4: Interface module bodies and interfaces.

§ 3.3.5: Implementation module bodies: the EXPORTS list,

§ 3.3.6: SHARES and access12.

The meanings of the other parts of a module header are discussed elsewhere: CEDAR in § 3.4.4.

MONITOR and LOCKS in § 4.10.

3.3.1 Modules and instances

A module is a proc which takes two kinds of arguments:

Interfaces, declared in the DIRECTORY list. These arguments are supplied by the model (or on the compiler's command line), and used during compilation.

Instances of interfaces, declared in the IMPORTS list. These arguments are also supplied by the model (or in a config file passed to the binder, or implicitly by the loader), and used during loading.

§ 3.3.3 discusses the types of these arguments and how they are declared. In addition, an implementation may take PROGRAM arguments declared in the drType following PROGRAM or MONITOR. These are ordinary values: they are discussed in § 3.3.2A.

When a module is applied to its arguments, the resulting value is

For an interface module, an interface.

For an implementation module, a binding whose values are instances:

one interface instance for each interface it exports:

one for the program instance, also called a global frame;

one for the program proc derived from the module body (§3.3.2A), called


This application cannot be written in the program, only in the model: it is described in § 3.3.2.

An interface (sometimes called an interface type) is a type, as the latter name suggests. This type is a declaration (obtained from the declarations which constitute the module body), with an extended cluster that includes all the bindings in the module body that don't use declared names (§ 3.3.4). In the example, the Buffer interface (obtained by applying the Buffer module to the arguments declared in its DIRECTORY) has declarations for New, Get, and Put, and its cluster includes values for Handle and BufferObject.

An interface instance is a value whose type is an interface: such values are the results of instantiating implementation modules. In the example, Bufferlmpl returns (exports) an instance of Buffer.

A program instance or a global frame is a frame. as the latter name suggests, i.e., a binding obtained from the bindings and declarations of an implementation (PROGRAM or MONITOR) module body, just like any proc frame (§3.3.5). Normally code outside the module does not deal with the instance directly, but only with the exported interface instances. In the example, Bufferlmpl exports a program instance for the module and a CONTROL proc.

In most cases, there is:

Exactly one application of each module, and hence exactly one interface or one instance. Only one module which exports an interface.

Only one interface exported by a module.

Only one argument of the proper type for each module parameter (§3.3.3); hence it is redundant to write the arguments explicitly.

When these conditions hold, there is a close correspondence among the following four objects: an interface module;

the interface it returns (since its arguments need not be written explicitly);

the implementation module which exports the interface;

its instance (again, since its arguments need not be written explicitly).

The distinctions made earlier in this section then seem needless; it is sufficient to simply consider the interface and implementation modules, and identify them with the files which hold their text. In more complicated situations, however, it is necessary to know what is really going on.

In the example at the start of this section, BufferImpl is an implementation module with seven parameters:

Four interface parameters, declared in the DIRECTORY: Rope, CIFS, 10 and Buffer.

Three instance parameters, declared in the IMPORTS: Files (of type CIFS), 10 (of type 10), and Rope (of type Rope). Since the instance parameters are declared in an inner scope, the instance Rope is the one visible in the module body; the interface Rope is visible only in the header. The same is true for 10, but both the interface CIFS and the instance Files are visible in the body.

When BufferImpl is compiled, the four interface parameters must be supplied, in the form of (compiled) interface modules named Rope, CIFS, 10 and Buffer. When BufferImpl is instantiated (normally by loading it), the three instance parameters must be supplied, i.e. there must be other instantiated implementation modules which export the Rope, CIFS, and 10 interfaces. Normally there will be one of each, and the entire program will consist of eight modules:

the interface modules Rope, C1FS, 10 and Buffer,

implementation modules normally named Ropelmpl, CIFSImpl, 101mpl and Buffer1mpl, each exporting an instance of the corresponding interface

The instantiated Bufferlmpl exports an instance of Buffer, which can thus be used as a parameter by some other module.

3.3.2 Applying modules

A module is not applied to all its arguments at once. Instead, the arguments are supplied in two stages:

A module is applied to its interface (DIRECTORY) arguments by compiling it; the result is a BCD (represented by a .bcd file). The bcd is still a proc, with instance parameters. Like any proc, a module can be applied to different arguments (i.e., different interfaces) to yield different results (BCDs).

A BCD is applied to its instance (IMPORT) arguments by loading (or binding) it; the result is a program instance, together with any interface instances exported by the module. Again, the BCD can be applied to different arguments (i.e., different interface instances) to yield different instances. Indeed, because an instance may include variables, even two applications to the same arguments will yield different results (instances).

These two stages are separated for several reasons:

All the type-checking of a module can be (and is) done in the first stage, by the compiler. The only type error possible in the second stage is supplying an unsuitable argument.

Compiling is much slower than loading, and a module needs to be recompiled only when its interface arguments change, not when the interface instances change. The latter are changes in the implementations of the interfaces, and are much more common.

When there are multiple instances of the same module with the same interface parameters, they automatically get the same code.

We've always done it that way.

3.3.2A Initializing a program instance

The statements in the body of an implementation module form the body of a proc called the program procedure. The function of this proc is to initialize an instance of the module. When program instance PI is made, no code in the module is executed: hence PI may be uninitialized. It is the job of the program proc PP' to initialize PI, perhaps using the PROGRAM arguments if there are any. Until PP' has been called, PI is not in a good state. It would be better to supply the PROGRAM arguments along with the imported instances, and call PP' as part of making P1, so that PI is never accessible in its uninitialized state. But it isn't done that way: hence the programmer must ensure that PP is called before any use is made of P1. The preferred way to get hold of PP is from an interface to which it is exported: see § 3.3.5.

To confuse things. PP is not an ordinary procedure but a PROGRAM, and it must be called using the START construct (see § 4.4.1). Note that in addition to the statements of the module body, PP also contains the type-specific initialization code for any variables or non-static values in the instance: e.g., if x: INTO-3, the value of x will not be 3 until after PP' has been called.

There is some error detection associated with this kludge. If a proc in the instance is called before the instance has been initialized by START, a start trap occurs. At this point, if PP' takes no arguments it is called automatically, and the original call then proceeds normally: if PP does take arguments, there is a Runtime.StartFault ERROR.

Caution on initializing monitors: If the module is a monitor, PP' runs without the monitor lock: if another process calls into the module while PP' is running, it will not wait, but will run concurrently with PP. This is unlikely to be right. It is unwise to rely on a start trap to initialize a monitor module: call PP explicitly with START.

Caution on referencing module variables before initialization: If a variable in the instance is referenced before the instance has been initialized, no error is detected, and the uninitialized value will be obtained. PP' can still be called to initialize the instance, and may still be called automatically by a start trap.

The program proc is bound to the name CONTROL in the result of an implementation module if its type is PROGRAM[] RETURNS (otherwise the proc RuntimeReportStartFault is bound to CONTROL). This allows the modeller (and binder) to get access to PP so as to control the order in which modules are started.

3.3.3 Parameters to modules: DIRECTORY and IMPORTS

The interface parameters of a module are declared in the DIRECTORY. An interface / has type TYPE n, where n is any one of the names given before DEFINITIONS in the header of the interface module that produced I. The INTERFACETYPE primitive in the desugaring takes a list of atoms and returns a type which implies TYPE n for each $n in the list. The reason for allowing several names is to aid conversion of an interface from one name to another: both names can continue in use for a while.

The use of these names provides a clumsy check that the proper interface is supplied as an argument. DIRECTORY n: TYPE and DIRECTORY n are both short for DIRECTORY n: TYPE n.

The compiler must be able to find the interface arguments. which in general are stored as files. When the modeller is used, it supplies these arguments from the specifications in the model. Otherwise, they may be specified explicitly on the compiler's command line, or failing that. the compiler gets the interface I from the file I.bcd.

An interface is a type which can only be used:

Before a dot (§4.14). to obtain a value from its cluster, which simply consists of the bindings in the interface module body (§3.3.4).

In an IMPORTS list as the type of an instance parameter to a module. After POINTER TO FRAME (§ 4.5.3)

The USING clause in the DIRECTORY, if present. restricts the cluster of the interface to contain only items with the names nu, ... Thus in the example, only ROPE and Compare are in the cluster of Rope

in the BufferImpl module. This means that Rope.ROPE and Rope.Compare are legal, but Rope.n for any other n will be an error. Note that USING affects only the cluster of the parameter: it does not affect the clusters of any types or the bodies of any INLINE procs obtained from the interface. Thus within Rope. Compare might be bound by

Compare: PROC[rl, r2: ROPE] RETURNS NOOLHNLINE IF Length[r1]' = Length[r2] THEN ... I
A call of Rope.Compare in BufferImpi is all right, even though RopeLength in BufferImpi is an error.

In the example, CIFS, 10, and Rope are interfaces. They are the types of three IMPORTS parameters named Files, 10, and Rope (if the IMPORTS clause gives no name for the parameter, the name of the interface is recycled). An actual argument for an IMPORT parameter must be an interface instance, i.e., a value whose type is an interface type. Such a value is obtained from one or more modules which export the interface (§ 3.3.5). An instance is a binding: in it, the value of a name declared in the interface is provided by the exporter: the value of a name bound in the interface (e.g., x-3) is just the value the interface binds to the name (in this case, 3). This rule has two effects:

The client can ignore the distinction between names bound and declared in the interface, since both appear in the instance binding and are referenced uniformly with dot notation. This means that the client is not affected, for example. when a proc is moved from an INLINE in the interface to an ordinary definition in an implementation.

The client can often ignore the distinction between the interface and the instance, since all the values in the interface are also in the instance, with the same names. This is the motivation for the shorthand which allows the name of an IMPORT parameter to default to the name of the interface: the interface is no longer accessible, but I.x has the same meaning (namely 3) whether / is the interface or the instance.

Caution on inlines in interfaces: Names bound to inline procs in an interface do not appear in the interface binding, but only in an instance. This somewhat dubious rule ensures that clients won't have to add to their imports lists if a proc stops being an inline.

Restriction on importing multiple instances: An interface module may not import more than one instance of a given interface I. If an implementation module P imports more than one instance of I. the principal instance of / is the one with no name in the IMPORTS list (which is therefore named I by default). if P imports only one instance of type I. then that instance is the principal instance.

Restriction on importing a principal instance into imported interfaces-. Often an interface module has no IMPORTS. because it only needs access to the static values (types and constants) bound in its interface parameters, and does not need values for any names declared there (procs and interface variables). If an interface module does have IMPORTS, however, and there is more than one instance of any imported interface around, then there is a restriction on the argument values. Suppose that Intl imports Int2, and that a program module P imports Intl. Then Intl may only import one instance of Ina. and if P also imports MO. the principal instance of /n/2 in P must be the same as the value of Intl imported by the Intl imported by P. For example, with


DIRECTORY Intl. Intl: P. PROGRAM IMPORTS Int1V: Intl. Int2V: Int2...

we must have in P that Int1V.Int2V--=Int2V.

3.3.4 Interface module bodies

The body of an interface module I is a collection of bindings (e.g.. x: INT-3) and declarations (e.g..


Restriction on bindings in interfaces: The construct that follows the – in one of the bindings13 is restricted:

If it is an expression, it must be static (§3.9.1). Thus, no imported names. As a result, P: PROCLP


are not allowed.

If it is a block (providing the body of a proc), it must be INLINE (because there isn't any place to put the compiled code).

It may not be CODE. This is an unfortunate accident of the implementation.

The result of applying an interface module is an interface (§3.3.2), which is a type I obtained by applying the primitive MKINTTYPE to the d's and b's of the body. This type is simply the declaration obtained by collecting the declarations in the body, with a cluster which is extended to include all the bindings of the body. However, MKINTTYPE omits any inline proc bindings from the type's cluster, instead leaving the proc declarations in I. It puts an extra item BINDING in Is cluster with the inline procs in it. When an instance Ins: of 1 is imported, the binding actually imported is Inst PLUS 1.B1NDING. This slightly dubious arrangement ensures that clients don't have to change imports lists if a proc stops being inline. This policy is not extended to other items, however, even though they might change from being bound in the interface to being interface variables.

The interface returned by

Red Blue. Green: DEFINITIONS...

has the types TYPE Red TYPE Blue and TYPE Green.

Restriction on referring to names introduced in an interface: The types and expressions in the declarations and bindings of an interface may refer to other names in the bindings as usual, but they may not refer to names introduced in the declarations, except that:

Any declared name may be used

in the body of an INLINE, or

after a "4-" in a defaultTC55 in the fields43 of a transferTC41 which is the type of a decl in the interface's body.

A declared (opaque) type may be used anywhere. For example, if an interface contains

x: INT-3;



then the following may also appear in the interface: xx: INTx+ 1;


Q:   PROC [INT4-y];


but the following are illegal: xy: INTy+ 1:



The values of the bindings can be accessed directly by dot notation in any scope in which the interface is accessible. Thus if the value of the previous interface module is bound to J, e.g., because

J: TYPE / appeared in the DIRECTORY, then J.x is equal to 3. The declarations cannot be accessed directly (J.y is an error).

The declarations in an interface module are not quite like ordinary declarations. They are of three kinds, depending on whether the type of a declaration is:

A transfer type (§ 4.4.1); this is just like a declaration of a transfer parameter to an ordinary proc, except that it is readonly.

TYPE[ANY] or TYPE[e]; the type being declared is an opaque type or exported type. discussed in § 4.3.4. The expression e must be static. TYPE[ANY] or TYPE[E] is not allowed in an ordinary declaration; except in an interface, a type name must be bound to a type value when it is introduced.

VAR T, or READONLY T for any type T except TYPE: this is an interface variable: discussed in § below. You can also write simply T here, but this is not recommended.

An interface instance // has the interface type / if for each item n: T in the interface, there is an item n— v in the instance, and v has type T. This is the same rule which determines that a binding has the type of a declaration; e.g., that a proc argument has the domain type. In this respect there is nothing special about an interface.

Note that a name can be declared PRIVATE in an interface, even though it must be declared PUBLIC in the exporter (§ 3.3.6). This can be useful if the name is used in a type constructor or inline proc in the interface, but its value should not be accessible to the client

3.3.4A Interface variables

An interface variable v gives clients of an interface direct access to a variable in a program module, namely the variable which is exported to v. This is the only kind of variable parameter in current Cedar.

·If you use the obsolete shorthand of T for VAR T in an interface variable declaration, you cannot declare a transfer type variable as an interface variable, since that already means passing the transfer value.

Caution on uninitialized interface variables: the variable which is exported to provide the value for an interface variable is not initialized until its module is initialized (§ 3.3.2A). However, there is nothing to stop it from being accessed sooner, with possibly undesired results.

Performance of interface variables: An interface variable can be read and (if not READONLY) set directly, which is significantly faster than Get and Set procs. Of course, the implementor gives up some control. These operations are not quite as fast as access to an ordinary variable, since there is an extra level of indirection which costs one or two extra instructions each time. There is also one pointer per interface variable per module which refers to it. If you use a private interface variable and inline Get and Set procs, you pay nothing in performance, but retain the option of changing the proc definitions later.

·You can get direct access to all the variables of a module by using a POINTER TO FRAME type (§ 4.5.3).

3.3.5 Implementation module bodies

The body of an implementation module Imp is simply a block. This block plays two roles. On the one hand, it is an ordinary block, the body of an almost ordinary proc PP' called the PROGRAM proc, which has parameters and results like any other. PP' is special in one way: it has a PROGRAM type rather than a PROC type. When PP' is applied (using the special construct START: see § 4.4.1), its declarations and bindings are evaluated, its statements are executed, and its results are returned

as with any proc. The only difference is that the values bound to the names introduced in the block (i.e., the frame of PP') are retained after the proc returns: in fact, forever (unless Runtime.Unnew is used to free the frame). Procs local to the block can access these values in the usual way. and values of exported names can also be accessed through interfaces, as explained below: see § 3.3.2A.

As with any proc (§ 3.5.1). the frame of PP includes the parameters and results from Imp's drType42 as well as the names introduced in the block's d's and b's. It also includes an additional item:


where Imp is the name of the module and T is its drType.

The body of Imp has a second role: to supply values for the names declared in the interfaces exported by Imp. For each interface Ex which Imp exports. an interface value Exi of type Ex is constructed. Each name n in Exl acquires a value as follows:

If n: T is in Ex and n: PUBLIC T-'x is in the body of Imp, then nx is in ExI. This is a slightly peculiar kind of binding: as in an ordinary binding, x must be coerceable to