This document was made by OCR
from a scan of the technical report. It has not been edited or proofread and is
not meant for human consumption, but only for search engines. To see the
scanned original, replace OCR.htm with Abstract.htm or Abstract.html in the URL
that got you here.
A Description of the
Cedar Language
A Cedar Language
Reference Manual
Butler W. Lampson
CSL-83.15 December 1983 (Printed November 1986) [P83-00016]
©
Copyright 1983, 1986 by Xerox Corporation. All rights reserved.
Abstract:
The Cedar Language is a programming language derived from Mesa, which in turn is derived from Pascal. It is meant to be used
for a wide variety of programming tasks, ranging from low-level system software to large applications. In
addition to the sequential control constructs, static type checking and
structured types of Pascal, and the modules, exception handling, and
concurrency control constructs of Mesa, Cedar also has garbage collection, dynamic types, and a limited form of type
parameterization.
This
report describes the Cedar language. Except for chapter 2, it is written
strictly in the style of a
reference manual, not a tutorial. Furthermore, it describes the entire
language, including a number
of obsolete constructs and historical accidents. Hence it tells much more than
you probably want to know. A
summary of the safe langauge and comments throughout the manual suggest which constructs should be preferred for
new programs.
CR
Categories and Subject Descriptors: D.3.2 [Programming Languages]: Language Classifications - Cedar, extensible languages;
D.3.1 [Programming Languages]: Formal Definitions and Theory - semantics
Additional Keywords and
Phrases: kernel language, polymorphism, data types
The
work described here was completed in late 1983 but not published at that time.
It attempts to provide a reasonably formal and precise
definition of the Cedar programming language. The then current version of the
language was perceived as an inadequate base for a number of planned extensions
to the language and supporting environment: on the other hand, there was
already a large body of Cedar code that could not simply be
abandoned. These problems are dealt with by defining a small but powerful
kernel language plus a mapping of existing Cedar constructs into that kernel.
The kernel language introduces value spaces and operations over them that go
well beyond what has been available in any implemented version of the Cedar
language; it was to provide the basis for extension and
simplification. The mapping from existing Cedar into the kernel provides not
only a migration path for existing code but also a definitional
method.
This
report should be of interest to students of programming languages and their
definitions. Most of the interesting ideas of the Cedar
language appear in the kernel, which is described in Chapter 2.
Such readers should note that the formalism used to describe the kernel has
several known shortcomings. Its treatment of so-called
dependent types is somewhat cavalier. A subsequent report by Burstall
and Lampson ("A Kernel Language for Modules and Abstract Data Types,"
Digital Systems Research Center, September 1984) includes a more careful
treatment of such types in a language very similar to the kernel. The present
treatment also glosses over most of the definitional problems raised by
the possibility of concurrent evaluation.
The
report should also be of interest to Cedar programmers. Chapters 3 and 4
constitute the most complete. precise and accurate definition of the
implemented Cedar langauge that has appeared to date. For a reader
willing to make the effort to assimilate the concepts introduced in Chapter 2,
this report can serve as an interim reference manual. The
later chapters are painfully honest and complete; as
the abstract notes, they say much more than anyone probably wants to know. As
of March 1986, the only known differences between the description
and implemention, other than minor bugs in each, are the
following:
· The
improved syntax for ENTRY
and INTERNAL has not been implemented; these attributes must
still precede the type in a procedure declaration (Section 3.5).
· Sections
3.3.4 and 4.3.4 document an improved design for opaque types that was never implemented.
In current Cedar, opaque types behave as they do in Mesa.
· According
to Section 4.14, if P is
a procedure taking one argument, its application to x using dot
notation is written without brackets, as x.P. In current Cedar, the
alternative form x.P[] is also accepted.
Both
classes of readers should note that many parts of the kernel language have
never been implemented in their full generality. Some of the current developers
and users of the Cedar language would not even agree that
the directions of evolution suggested by the kernel language are desirable or feasible.
The claims about long-term goals and promised improvements in this report
should therefore be taken as the personal opinions of the author.
Ed Satterthwaite, March
1986
Chapter 1. Introduction
The
Cedar language is a programming language derived from Mesa, which in turn is
derived from Pascal. It is meant to be used for a wide
variety of programming tasks, ranging from low-level system
software to large applications. In addition to the sequential control
constructs, static type checking and structured types of Pascal, and the
modules, exception handling, and concurrency control constructs
of Mesa, Cedar also has garbage collection, dynamic types, and a limited form
of type parameterization.
This manual describes the Cedar language. Except for the
overview material in § 2.1 and the discussion of concepts in §§2.3-2.7. it is
written strictly as a reference manual, not a tutorial. Furthermore,
it describes the entire language, including a number of obsolete
constructs and historical accidents. Hence it tells much more than you probably
want to know. A summary of the safe language and comments
throughout the manual, suggest which constructs should be preferred for
new programs.
The manual is organized into three major parts:
Chapter 2: A description of a much simpler kernel language, in
terms of which the current Cedar language is explained. This description
includes:
An overview or glossary, in which the major technical
terms used in the kernel are briefly defined (§2.1).
An informal explanation of the ideas of the
kernel and the restrictions imposed by current Cedar (§§ 2.3-2.9)
A precise definition of the kernel (§2.2). Most readers
will probably find this rather hard going.
Chapter 3: The syntax and semantics of the current Cedar
language. The semantics is given precisely by a desugaring into the kernel. It
is also given more informally by English text. This chapter also
contains a number of examples to
illustrate the syntax.
Chapter 4: The primitive types and procedures of Cedar. For each one, its
type is given as well as an English definition of its meaning.
This chapter is organized according to the class hierarchy of the primitive types (§4.1).
In
addition, there is a one-page grammar for the full language, a shorter grammar
for the safe language, and a two-page language summary which
includes the grammar, the desugaring, and the examples from § 3.
The tables in §§4.1-2 summarize the types and primitives.
To find your way around:
First read chapter 2, except for § 2.2.
Then consult the table of contents, or the
index, for the topics of interest to you. The full grammar
(at the end) and the class hierarchy (Table 4-1) may also be useful as starting
points.
The manual is extensively cross-referenced. Section
titles and numbers appear at the top of each page. The summaries and tables
also point to the section in which each construct is defined.
Acknowledgements: Rod
Burstall and Ed Satterthwaite helped me greatly in clarifying the ideas presented
in § 2. Ed was also indispensable in getting an accurate description of the
current Cedar language. Bill McKeeman's work on an earlier
Cedar language description was the starting point for this
manual. Will Crowther, Jim Horning and Lyle Ramshaw read part or all of the
manual carefully, and made many helpful comments. Several other
Cedar programmers have pointed out errors or omissions. Of
course, I am responsible for the errors that remain.
Chapter 2. The kernel
language
This
document describes the Cedar language in terms of a much smaller language,
which we will usually call the kernel
or the Cedar kernel. Cedar differs from the kernel in two
ways:
· It
has a more elaborate syntax (§ 3). The meaning of each construct in Cedar is
explained by giving an equivalent kernel program.
Often the kernel
program is longer or less readable: the Cedar construct can be thought of as an
idiom which conveniently
expresses a common operation. Sometimes the Cedar construct has no real
advantage, and the difference
is the result of backward compatibility with the ten-year history of Mesa and
Cedar.
· It
has a large number of built-in or primitive
types and procedures (§4). In the kernel language
all of these could in principle be programmed by the user, though in fact most
are provided by special code in the Cedar compiler. In
general, you can view these built-in facilities much like a library,
selecting the ones most useful for your work and ignoring the others.
Unfortunately, the current Cedar language is not a
superset of the kernel language. Many important objects (notably
types, declarations and bindings) which are ordinary values in the kernel that
can be freely passed as arguments or bound to variables, are subject to various
restrictions in Cedar: they can only be written in literal form, cannot
be arguments or results of procedures, or whatever. The
long-term goal for evolution of the Cedar language is to make it a superset of
the kernel defined here. In the meantime, however. you should view
the kernel as a concise and hopefully clear way of describing the
meaning of Cedar programs.
To help in keeping the kernel and current Cedar separate,
reserved words and primitives of the kernel which are not
available in current Cedar are written in SANS-SERIF SMALL CAPITALS, rather than the SERIF SMALL CAPITALS used
for those symbols in current Cedar. Operator symbols of the kernel
which are not in current Cedar are not on the keyboard.
The kernel is a distillation of the essential properties
of the Cedar language, not an entirely separate invention. Most Cedar
constructs have simple translations
into the kernel. Those which do not (e.g., some of the
features of OPEN) are
considered to be mistakes, and should be avoided in new programs.
Roadmap
§ 2.1 gives a brief summary of each major idea in the
kernel, which may be helpful as an introduction and reminder.
Most of the chapter (§§2.3-2.8) is an informal explanation of the concepts
behind the kernel. Usually. terms are defined and explained before they are
used, but some circularity seems to be unavoidable. Both this and
the explanations in §§2.3-2.7 are given under five major headings,
as follows:
Values and computations The type system
Programs
Conveniences
Miscellaneous
There is also a sketch of the restrictions imposed by the
current Cedar language on the generality of the kernel: for
more on this subject, see § 3. The meaning of the various built-in primitives
is given in § 4. The incompatibilities between the kernel language
and current Cedar are described in § 2.9. i.e., the constructs in
Cedar which would have a different meaning in a kernel program. For the most
part, these are bits of syntax which do not have consistent meanings in current
Cedar: future evolution of the language will replace them with
their kernel equivalents.
§ 2.2 precisely defines the syntax and semantics of the
Cedar kernel language, the former with a grammar, and the latter by
explaining how to take a program and deduce the function it computes and
the state changes it causes. The kernel definition follows the ordering of the
kernel grammar. This section is rather difficult to read, and you
may prefer to skip it.
2.1 Overview
This section gives a brief summary of the essential
concepts on which the Cedar language is based. The explanations
are informal and
incomplete. For
more precise but more formal definitions, see § 2.2: for more
explanation, see § 2.3-§ 2.8.
2.1.1 Values
and computations
Application: The basic mechanism for
computing in Cedar is applying a
procedure (proc for
short) to arguments. When
the proc is finished, it returns some results, which can be discarded or passed as arguments
to other procs. The application may also change the values of some variables.
In the program an application is denoted by (the denotation of) the proc
followed by square brackets enclosing (the denotation
of) the arguments: f [first-3,
last– x+ 1]: here the – symbol binds the value
of the expression on the right to the name on the left. There are special ways
of writing many kinds of application: x+ 1, person.salary. IF x<3 THEN red
ELSE green,
x<-7 .
Value: An entity which takes part in the computation
(i.e., acts as a proc, argument or result) is called a value. Values are immutable: they are not
changed by the computation. Examples: 3, TRUE, "Hello". X [x:
!NT] IN x+ 3: actually these are all
expressions which denote values in an obvious way. The
X-expression denotes a proc value P: the name x is called a parameter. When P is applied to
an argument, the
parameter x is bound to the argument.
Variable: Certain values. called variables, can contain other values. The value
contained by a variable v (usually called the value of v) is returned by v.VALUEOF, and
can change when a new value is assigned to v. In addition to its results, a proc may have
side-effects by
changing the values of variables. Nearly every non-variable type T has a corresponding
variable type VAR T; values of type
VAR T contain values of type T. Every VAR type has a NEW proc which creates
a variable of the type. A variable is usually represented by a
single block of storage: the bits in this block hold the representation
of its value. A variable may be local
to a proc. or it may be created by an explicit call
of NEW, and referred to by a REF or pointer value.
Group: A group is an ordered set of
values, often denoted by a constructor
like this: [3, x+ 1, "Hello"].
Like everything else. a group is itself a value.
Binding: A binding is an ordered set of [name.
value] pairs, often denoted by a constructor like this: [x:
INT-3, y: BOOL—TRUE] (or
simply [x-3, y–TRUE], in which the types of the names are the syntactic
types of the expressions). If b is
a binding. b.n denotes
the value of the name n in
b. Note
the difference between binding and assignment: one
introduces a new name with a fixed value: the other changes the
value of a variable.
Argument: A binding constructor
written explicitly after an expression (e.g., Copy{from–
x, to–y]) denotes application of the value P denoted by the expression
to the value a denoted
by the constructor, called the argument. P is usually a proc, and a is a binding, which is bound to P's domain
declaration D to
get the argument which is passed. In making this binding a is coerced, if necessary,
to match the declaration:
If a name in D is missing from a. a default
value is supplied.
If a value in a doesn't have the type
required by D, it
is coerced (if possible) into another value which does.
The
constructor can also be for a group, in which case the names from D are
attached to its elements to turn it into a binding.
2.1.2 The type system
Type: A type defines a set of
values by specifying certain properties of each value in the set (e.g., integer
between 0 and 10); these properties are so simple that the compiler can make
sure that proc arguments have the specified properties. A value
may have many types; i.e., it may be in many of these sets. A type
also collects together some procs for computing with the value (e.g., add and multiply).
More precisely, a type is a value which is a
binding with two items:
Its predicate,
a function from values to the distinguished type BOOL. A
value has type
T if
T's predicate returns TRUE when applied to the value.
Its cluster,
a binding in which each value is usually a proc taking one
argument of the type. For any
expression e, the expression e.f denotes the result of
looking up fin
the cluster of is syntactic type V e, and applying the resulting
proc to the value of e.
A proc's type depends on the types of its domain
and range; a proc with domain (argument type) D and range (result type) R has the type D— )R. Every expression e has a syntactic
type denoted V e. e.g.,
the range declared
for its outermost proc; in general this may depend on the arguments. The value
of e always
has this type (satisfies this predicate); of course it may have other types as
well.
Mark: Every value carries a set of marks (e.g., INT or ARRAY: think of them as little
flags stuck on top of the value). The predicate HASMARK tests
for a mark on a value; it is normally used to write type
predicates. The set of all possible marks is partially ordered.
The set of marks
carried by a value must have a largest member m. and it must include every mark
smaller than m. Hence all the marks on a value can
be represented by the single mark m: we can say that m is the mark on the value. This does not imply a total ordering on the
marks.
Type-checking: The purpose of type-checking is to ensure
that the arguments of a proc satisfy the predicate of the domain
type; this is a special kind of pre-condition for executing the proc. The proc
body can then rely on the fact that the arguments satisfy their type
predicates. It must establish that the results satisfy the predicate
of the range type; this is a special kind of post-condition
which holds after executing the proc. Finally, the caller can rely on the fact
that the results satisfy their type predicate. In summary:
Caller— establish pre-condition: arguments have the domain
type;
rely on post-condition: results have the range type.
Body— rely on pre-condition: parameters have the
domain type;
establish post-condition: returns have the range
type.
Declaration: A declaration is an ordered set of [name, type] pairs. often
denoted like this: [x:
INT, y: BOOL]. If d is a declaration, a binding
b has
type d if
it has the same set of names, and for each name n the
value b.n has
the type dn. A
binding b matches d if
the values of b can be
coerced to
yield a binding b' which
has type d.
A declaration can be instantiated (e.g., on block entry) to produce a binding in
which each name is bound to a variable of the proper type;
instantiating the previous example yields
[x: VAR INT—(VAR INT).NEW, y VAR BOOL-'(VAR BooL).NEw].
Class: A class is a declaration for the cluster of a type. For
instance, the class Ordered is [T: TYPE. LESS: PROC[T, T]--0.[Boot1,
. . 1. C is
a subclass of
D if
(loosely) C includes
at least all the [name. type] pairs in D.
2.1.3
Programs
Name: A name
(sometimes called an identifier) appearing in a program
denotes the value bound to
the name in the scope that the name appears in (unless the name is in a
pattern before a colon (declaration or binding) or tilde (binding), or
after a dot or $). An atom is
a value that can be used to refer to a name; a literal atom is written
like this: $alpha.
Expression: In a program a value is denoted by
an expression, which
is one of: a literal value-3
or "Hello":
a
name– x or salary:
an application
of a proc value to a group or binding value– GetProperties[directory.
input]:
a A-expression.
which yields a proc value– A [x: INT]=>[INT] IN (IF x<0 THEN –x ELSE x);
a constructor for
a declaration or binding–[x: INT-3, y REAL-3.14D.
If a value is given for each free name in an expression,
then it can be evaluated to
produce a value. Thus an expression is a rule for computing a
value. The entire program is a single expression. made up
of sub-expressions according to the five constructs above.
Scope: A scope is a region of the program
in which the value bound to a name does not change (although
the value might be a variable, whose contents can change). For each scope there is a binding
called ENV (for
environment) which
determines these values. A new scope is introduced (in the
kernel) by IN (after
LET or
A) or by a REC [...]
constructor for a declaration or binding; e.g..
LET x"3 IN x-l- 5:
LET
REC Fact–A[n: INT]=>[r: INT] IN (IF n THEN 1 ELSE rt*Fact[n-1])
IN Fact[4].
The
first expression evaluates to 8. the second to 24.
Constructors:
Brackets delimit explicit constructors for group.
declaration or binding values. They all have the form [x1, x2....], and are distinguished by
the form of the x.:
an expression for a group; n:
e for a declaration:
n–e or
n: e1–e2 for
a binding.
Recursion: When names are introduced in
a constructor in Cedar, this is done recursively:
If v is
bound to n in
a binding constructor, then in expressions in the constructor n has the value
v. rather than its value in the
enclosing scope. Exception: argument bindings are non-recursive.
If n is declared in a declaration constructor,
then it may not be used in the constructor, unless there is an
ordering of the declarations in the constructor such that a name is used only
by later declarations. Exception: declared names may be used in the bodies of A-expressions
in the constructor (see § 3.3.4).
In the kernel, however, constructors are non-recursive
unless preceded by REC.
Dot
notation: The form an looks up n in some
binding associated with e. and does something with the
result. There are three cases:
if e is a binding, an is
just the value paired with n in
e.
If e is a type. e.n is e.Cluster.n.
Otherwise. e.n
is (V
e.n)[e], and e.n[more args] is usually (V e.n)[e, more
args]. Recall that V e is
the syntactic type of e.
In
all cases you are supposed to think of n as some property or behavior associated with e: an denotes that property or
evokes that behavior.
2.1.4
Conveniences
Coercion: Each type cluster may contain To and From procs
for converting between
values of the type and values of other types (e.g.. Float: PROC[INT]-'[REAL]:
this would be a To proc
in REAL and
a From proc
in 4\T).
One of these procs is applied automatically if necessary to convert or coerce an argument
value to the domain type of a proc: this application is a coercion. Each coercion has
an associated atom called its tag (e.g.. $widen for INT—REAL or $output for INT—ROPE): several
coercions may be composed into a single one if they have the same tag. The tags
thus serve to prevent unexpected composition of coercions: all
are NIL currently.
however.
Exception: There is a set of exception values. An expression e denotes a value
which is either of type Ce
or is an exception. Whenever an exception value turns up
in evaluating an expression e1. it
immediately becomes the value of e1. unless
(in the kernel) e1 has the form e, BuT {...}. The {...}
tests
for exception values and can supply an ordinary value, or another exception. as
the value of the BuT expression. An exception value may contain an ordinary
value, called the argument of
the exception. so
that arbitrary information can be passed along with an exception.
Finalization: When a variable is no longer accessible,
the storage it occupies is freed (automatically in the safe language).
Before this is done. a finalization
proc in the cluster of the variable's type is called
to do any other appropriate resource deallocation. Finalization is done by
separate processes. and hence must be explicitly synchronized with
the rest of the program. The local variables of a proc
or other scope may also be finalized (using UNwiND): this is done synchronously
(§ 3.4.3A).
Safe: The safety
invariant says that all references are legal, i.e., each REF T value is NIL or refers to a
variable of type T. A
proc is safe if
it maintains the safety invariant whenever it is applied to arguments
of the proper types. If a proc body (A-expression) is
checked, the
compiler guarantees that the proc value is safe:
trusted. the
programmer asserts that it is safe (the compiler makes no checks): the proc
value is safe: unchecked, the
compiler makes no checks and the proc value is unsafe.
It is best to
write checked code whenever possible. However, checked code cannot call unsafe
procs (since the compiler then cannot guarantee safety).
Process: Concurrency is obtained by creating a number of processes. Each process
executes a single sequential computation. one step at a time. They
all share the same address space. Shared data (touched by more than one process)
can be protected by a monitor: only
one process can execute within the procs of the monitor at a time. So
that each process can know what to rely on. there must
be an invariant for
the monitored data which is established whenever a monitor proc returns or
waits. A process can wait on
a condition variable within
a monitor: other processes can then enter the monitor. The waiting
process runs again when the condition is notified. or after a timeout.
2.1.5 Miscellaneous
Allocation: Cedar has standard facilities
for allocating new variables of any type (the NEw primitive): related
variables can be allocated in the same zone. Normally. variables are deallocated automatically
by the garbage collector when
they can no longer be referenced: such variables can only
be referred to by REFS.
Variables can also be deallocated explicitly by FREE. but
this is unsafe.
Static: An expression whose value is
computed without executing the program is called static. Literals are static. as are
names bound to literals. and any expression with static operands. Proc bodies
are never static unless they are inline, and often not then.
Pragma: Some language constructs do
not affect the meaning of the program (except possibly to make a legal program
illegal). but only its time and space costs: these are called pragmas. Examples are
INLINE for
proc bodies and PACKED
for arrays.
2.2 Kernel definition
This section gives the syntax and semantics of the Cedar
kernel language. Motivation, and an explanation of the relation
between the kernel and the current Cedar language, can be found in §§
2.3-2.8. Since this section is rather formal, you are advised to read the rest
of the chapter first, and then return here if you want a more precise
definition.
The kernel is subdivided into
A rather austere core; anything can be
desugared into this, but not very readably (§2.2.1). A
set of conveniences: with
these, readable programs can be written (§ 2.2.2). Imperative constructs:
statements and loops (§ 2.2.3).
Exception handling
(§ 2.2.4).
The format of this section interleaves grammar rules which
give the syntax of the language with text which gives the meaning.
The meaning of the core is given in English. For other parts of the kernel,
it is given by desugaring rules which show how to rewrite each construct in
terms of others: if rewriting is done repeatedly. the result is a
core program, which may invoke some primitives. The meaning
of these is also given in English. There is also some English explanation of
the desugaring. but this is only a commentary and does not have
the force of law.
See § 3.1 for the notation used in the grammar and
desugaring.
2.2.1
The core
The Cedar core is a minimal subset of the kernel, barely
adequate as a base into which the rest of the kernal can be
desugared. In the core, there is syntax only for names. literals. application,
X-expressions. a basic and a recursive binding construction, and syntactic
type: everything else is done with primitives. We never
write anything in the core, however, except to show the desugaring of a kernel
construct. Thus the reader need not struggle with programs in the ugly core
syntax.
Many
readers may be happy with the kernel definition given in the other sub-sections
of § 2.2, and may wish to avoid the formalism of this section.
Table 2-1 gives the core syntax (in the first column).
together with a comment suggesting the meaning of each construct
(in the last column). The meaning is given in detail in § 2.2.1A-G. The middle
column gives the syntactic type of each construct. For readability, this is
written in the full kernel language. with a few conventions:
a * in front of the syntactic type indicates that it gives
less information that one would like. For instance. DDOTP has type DECL-4TYPE. which says nothing about the fact that the type is
a cross type whose structure matches the structure of the decl.
A parameter to a primitive declared with :: is the type of
some other argument: the argument for this type parameter may be omitted in an
application of the primitive, in which case it is supplied
as the syntactic type of the other argument. For instance, p: [r: TYPE. x: t]—*[...]
can be applied with p[x-3], which is short for ,[r—INT. x-3].
A bold name is a reference to another parameter, e.g.. t in the previous example.
In the kernel, a core primitive named xooly is in
the cluster of the type of its argument under the name
y. Thus
DDOTP is
in the cluster of DECL under the name P. so that d.P=DDOTP[d] if d is a decl.
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![Text Box: el n e2 I (Vel.RANGE)ted -- Standard application. I
A di = > d2 IN e I dI—od2 -- Standard proc constructor. I
A d1= > d2 IN e dl—>d2 -- Unchecked standard proc constructor. I
[(n—e), !..] [(n: Ve), ...] -- Vanilla binding constructor. I
FIX d e d -- Recursive binding constructor. I
De TYPE -- Syntactic type.
type ::= e Ve -- A type is syntactically just an expression.
deci ::= e -- A decl is syntactically just an expression.](32a-CedarLangOCR_files/image008.gif)
name ::=
letter (letter I digit)... literal ::=
$n[
primitive
primitive ::=
ARROW I
DOMAIN
I RANGE I MKPAIR
GROUP I
MKCROSS I
CDOTG MKBINDD
BDOTDIBDOTVI
MKBINDP I LOOKUP[ THEN
ENV I
MKDECLI
DDOTP
DDOTT I DTOB I
BTOD
THEND I
(VENv).n -- Appears as an e or in a
pattern.
A-10M .;TOM literal.
I
V
primitive
[d: DECL, p: (d—>DECL)]—qa: --arrow--fl PE] --arrow--TYPE]—,[r:
TYPE]
[ri
::
TYPE, firs:: t1, t,:: TYPE. rest: t_]—*[v: t1Xt2]
[ri: PE]—[t: TYPE] --iTYPE
[g: GROUP[TYPE]]—)[c:
--cross--TYPE]
*Et: --cross--TYPE]—+[g:
GROUP[TYPE]]
Ed: DECL, v:
d.-rho[b: d]
[b: BINDING]-0[d: DECL] [d:: DECL, b: [1]-4[v: d.T
[p: PATTERN, I:: TYPE, v: t]-3,[b:
MKDECL[p. t]] =MKBINDD[d—MKDECap, t]. v— v] Ed:: DECL, b: d. n: ATOM]—+[v:
oToB[d].n]
DECL, b1• d1' d2' •
: DECL, b,: dd,]-->[v:d,]-->[v:d1
THEND d, ]
"
*BINDING
*[p: PATTERN, t: TYPE]— [d: DECL]
*[d: DECL]—*[p: PATTERN]
*[d: DECL]—>[l: TYPE]
*[d: DECL]—)[b: BINDING]
--=MKBINDP[p—d.P, v—dT.G]]
*[b: BINDING]—*[d: DECL]
--=MKDECL[p—b.D.P, t—MKCROSS[b.V]] DECL, d2: DECL]—.[v:
DECL] --=BTOD[DTOB[di] THEN ErOB[di]l
BOOL I ATOM I TYPE
TRUE I FALSE BOOL
TYPE DECL I BINDING I TYPE PATTERN[
AtiY I HIDE I HEX
DECLTYPE, BINDINGTYPE
= GROUP[ATOM]
ANY for any type T
-- See § 2.2.4
-- See
§ 2.2.4
Table 2— I: The core language
A name not in a literal (or pattern. in the kernel)
denotes the value to which it is bound in the current environment
ENV (A
below). An ATOM literal
is a value which stands for a name in the primitives which deal with
declarations and bindings.
A
literal denotes a value according to a rule which depends on its syntax. The
core has only numeric and ATOM literals. and the primitives enumerated above.
An expression denotes a value according to a rule which
depends on its syntax. If the expression is a name or literal,
the value is the value of the name or literal. The remaining cases are
discussed in the following sub-sections. Most of these cases
define the value of the expression in terms of the value
of its sub-expressions. The sub-expressions may be evaluated in any order.
A.
The
current environment ENV
The
current environment ENV
is a binding. The value of the expression n is ENV.n. ENV for a sub-expression
is the same as ENV for
its containing expression, except that:
For the b of
a closure being applied, ENV is computed according to B below.
For the e of a FIX. ENV is computed according
to E below.
Thus applying a closure and evaluating a FIX are
the only ways to change ENV.
B. Application
The value of a standard application is obtained by
evaluating el and e2 to
obtain v1 and v,, and applying v1 to v2.
There are two cases for application:
v1
is a primitive. The value of the application is a function of v2
given in the definition of
the
primitive. The core primitives are defined throughout § 2.2.1. the Cedar
primitives in §4.
v is a closure c (C below), with domain declaration d, body b and environment E. The value
of the application is the value of the expression b in the environment
MKBINDD[d,
v2] THEN E
(E
below). Note that if the closure was made with A, the body must be type-checked
when it is applied: a closure made with A was type-checked
when it was made (C below).
Vet must be an arrow type. An
application type-checks if Ve, implies VerDOMAIN (G
below). The type of the application is obtained by applying Ve1.RANGE to
v2. In
simple cases. Ve1.RANGE is a
constant. For
instance. NOT: BOOL—>BOOL
has RANGE=A BOOL=>TYPE IN BOOL. However, the result
type may depend on the argument value. Thus
VMKBINDD.RANGE= A [d: DECL, v: d.T] = >TYPE IN [b: d]
so
that MKBINDD[[i:
INT]. 3] has type [b:
[1: INT]] to
go with its value [b–P-31].
C. Lambda
The value of a A-expression is a closure, which has three parts:
A domain declaration d. equal to the value of d1.
A body b. which
is the expression e (not the value of e).
An environment E. equal to the current environment ENV when
the A is evaluated. A A-expression type-checks if
d1 evaluates
to a declaration
d.
For any x of type d.T. Ve implies d2.T in
the environment MKBINDD[d. x] THEN E.
A A-expression type-checks if d1 evaluates to a
declaration: type-checking of the body is deferred until
the closure is applied.
D. Pairs,
groups and cross types
A pair is the basic structuring mechanism. MKPAIR[x, y]
yields the pair <x. y>. Bigger structures are made,
as in Lisp. by making pairs of pairs. When we are interested in the leaves of
such a structure. we call it a group and call the leaves its elements. A group has type
GRouP[7] if all its elements have type T or are NIL. A flat group
is a pair in which first is not a group, and rest is a flat group
or
NIL.
The type of a pair is a cross type: MKPAIR[x, y] has type TX U iff x has type T and y has type U. Cross types are made with MKCROSS, which
turns a GROUP[TYPE] (i.e.,
a group whose elements are types) into a cross type in the obvious way:
MKCROSS[NI L] = NILTYPE
MKCROSS[T] = T if T is
a type.
MKCROSS[ MKPAIR[x, y]]=MKCROSS[x]XMKCROSS[y]
Note
that MKCROSS of
a flat group is flat. CDOTG
goes the other way, turning a cross type into a GROUP[TYPE] in
which no element is a cross type. Thus MKCROSS is the inverse of CDOTG, but
not necessarily the other way around.
E. Bindings
A binding is either NIL, or an <atom, value> tuple,
or a <binding, binding> tuple. The primitive MKBINDD constructs
a binding from a declaration d and
a matching value v, i.e. (as the type of MKBINDD indicates), one with the
type d.T. The
resulting binding has type a and
consists of the names from d paired with the corresponding values from v. Example:
MKBINDD[ [x: INT, b: BOOL]. [3, TRUE]] = [x-3, b–TRUE]
= < <$X, 3>, «$b, TRUE>, NIL > >
In this example. d-r is INTXBOOL.
The declaration and
group in this example is written using the syntax of §2.2.2: in the core they
would be MKDECL[p—[Sx. Sb], r—imcRossllivr. BooLl] ] and kiKpain[firsi-3. rest—mKPAIRUirsi—TRUE. rest—NIL]] (where we have written the arguments of these primitives
in the kernel syntax).
The primitives BTOD and BTOV return the arguments of the MKBINDD primitive
that made the binding. MKBINDP is redundant: it is like MKBINDD, but
takes a pattern instead of a declaration, and hence accepts any
v with the right structure, regardless of the component types.
LOOKUP returns
the value of the name n in the binding. THEN combines two bindings,
giving priority to the first one in case of duplicate names. It
works only for flat bindings, in which the first element of each
<binding, binding> tuple is an <atom, value> tuple, and the second
element is another <binding. binding> tuple or NIL. The
value of b1 THEN b2 is another flat
binding, obtained
by first replacing any tuple <<a, v>, b> in
b2 where
a is
equal to an atom in b1 by b. and then using this binding to
replace the final NIL in b1.
The binding
constructor [(n–e), has
the value MKBINDP[p—In, ...], v–[e, ...] ].
FIX makes a recursive
binding: the value of FIX d1–e
is MKBINDD[d. v], where d is the value of d1 in
ENV and v is the value
of e in
the environment (LET
FIX d–e IN d–e) THEN ENV. Of course in general this
computation may not terminate: normally the names in d occur in e only in
the bodies of A-expressions, and in this case it does terminate. The FIX typechecks if De in the latter environment implies DTOT[d].
F. Declarations
A declaration is either NIL, or an <atom, type>
tuple. or a <declaration, declaration> tuple. The primitive
MKDECL constructs
a decl from a pattern p and a value 1
of type GROUP[TYPE]. A pattern is a GROUP[ATOM]. i.e.,
either NIL, or
an atom, or a pair of patterns: the ATOM elements must all be different.
An application of MKDECL
typechecks if t
matches p. i.e., if
both p and t are
NIL,
or
p is
an atom and 1 has
type TYPE,
or
p is
a pair [p1, p2]
and t is a cross type 11Xt, and pi matches ti and p2 matches t2.
The
resulting declaration consists of the names from p paired
with matching type values from I.
The primitives DDOTP and
DDOTT return
the arguments of the MKDECL
primitive that made the declaration. Thus
DDOTT[NIL]=NILDECL;
DDOTT[<$n,
7)] = T:
DDOTT[<dt. d2>]=DDOTT[d1]XDDOTT[d2]
DTOB is
redundant: it converts a declaration to a binding in which each name has the
corresponding type as its value. Thus DTOB[[x: INT, y: REAL]]=[x-INT,
jr-REAL]. The inverse is BTOD. also redundant:
it is defined only if all the values in the binding are types. THEND combines
two declarations just as THEN combines two bindings: V(b1 THEN b2)= Vbi THEND Vb2
G. Types and type-checking
A type is a value consisting of a pair:
the predicate,
a function from values to BOOL. the
cluster, a
binding.
A value v has type
T if
Ts predicate applied to v is
TRUE.
T implies U iff
(V x) T.Predicate[x]U.Predicate[x].
Typechecking
consists of ensuring that the argument
of an application has the type specified by the domain of the proc (B above). The body of
a A-expression can then be type-checked (or the implementation
of a primitive constructed) independently, assuming that the parameter
satisfies the domain predicate. Symmetrically, the result of an application can be assumed to have the type specified
by the range of the proc.
To
complete the induction. it is also necessary to check that the value of the
body of a A-expression has
the range type (C above).
The primitive types in the kernel are:
BOOL, with
two values TRUE and FALSE.
ATOM, with
values denoted by literals of the form $n. TYPE, a predicate satisfied by any type value. ANY, a
predicate satisfied by any value.
DECL. the
type of a declaration (F above).
BINDING,
the type of any binding.
Arrow
types, the types of procs (C above). An arrow type has a domain type and a range type.
Cross types, the types of pairs (D above).
GROUP[7],
the type of any pair in which all the elements have type T.
Declarations, the types of bindings (E and F above).
There are no non-trivial implications among any of these
types, except as follows:
DECLTYPE:
BINDINGTYPE: GROUP[I]TYPE.
TANY for
any type T.
T1XT2= U IX U2 iff Ti=Ui
and T2= U2.
GROUP[T]=.GROUP[U] if U.
T1-+T2U1-12 T1 and
(Vx: U1) (A T1
IN T2)[xl(X U1 IN L12)[x]. Note the
reversal of the domains.
d1 'd2for
declarations iff di.P=d2.P and
oToB[d1].nDTOB[d2J.n for
each n in d1.P.
2.2.2 Conveniences
Table
2-2 gives the syntax and semantics for kernel expressions. Most of this is
straightforward sugar. LET adds the binding el to
ENV in
evaluating e2.
The separate case for b, ... simply allows the
H which normally enclose a binding constructor to
be omitted in this case; see below. IF wraps e2 and
e3 in
A's so that they don't get evaluated: the iFPROC primitive chooses
the one to evaluate and applies it.
The dot notation has three cases.
For a binding it just looks up n in the binding.
For a type it looks up n in
the type's cluster.
For anything else, it looks up n in
the cluster of V'
e and applies the result to e. The
special LOOPUPC primitive
does something special if it finds a proc which takes more than one argument:
it splits the proc into one which takes the first argument and returns a proc
taking the remaining arguments. This ensures that if V e.n is
such a proc P, the
expression en[a b] will desugar into something equivalent to P[e a
b].
The usual syntax for application is a proc el followed by an
explicit binding constructor. The kind of application may depend
on the type of el, via
the APPLY
element of its type; for a proc applied by
the standard apply operator 4,
APPLY
is the identity. If e1 is
followed by an group rather than a binding constructor, the
argument is obtained by binding the group to the declaration which is ei's domain.
Infix operators desugar straightforwardly into
application: note that the choice of proc is determined by
the type of the first operand only. AND and OR are not ordinary infix operators. since they evaluate
no more than necessary; this is expressed by the desugaring into IF.
The
remaining expression syntax is various constructors, described below, and the
imperative and exception features described in the next two
sections.
expression = coreExpression
d1
—> d2
A(lel)(I=>e,)We31
LET el
IN e2
I
LETb....INeI
IF el
THEN
e2 ELSE e3
e
n
...1I
el infixOp e2 I
e1 AND e2 I el OR e2
I
PATT p I [ b, !.. 1 I
REc[(p
: t"'e)....]I
[
d. !.. ] X
X
statements
I simpleLoop but
infixOp
::=
X
PLUS THEN
literal :: = coreLiteral digit
digit ... I
declaration ::= p: t 1
[(p: I
binding :: =
10-el
d e I
pattern ::=
n I
[pi. -1
primitive
:: = corePrimitive
LOOKUPILOOKUPCI
PLUS
I IFPROC
ARROW
4 [di.
A
di = >DECL IN d2]
--
The domain defaults to [], the range to Vet (A Vei IN e, ) 4
ery ei
a binding I
LET [b....] IN e
iFPROC[Ve2. ei,
A IN e2.
A IN e311
[]
IF VeBINDING THEN LOOKUP 4 Eve. $n]
ELSE IF VeTYPE
THEN LOOKUP I [V e.cluster, $n]
ELSE ( LOOKUPC 4 [V e.cluster, $n]
) 4
[e]
el . APPLY 4 [b,
...] I el . APPLY I MKBINDD[Vei.DOMAIN, [e,....]
ei
. infix0a[e2]
IF e1
THEN
e2 ELSE FALSE 11F e THEN TRUE ELSE e2
1
NIL I mkPAIR[el,
[ (I
e2, )]-- Group constructor. I -- Pattern
constructor: see the rule for p below. I
b PLUS ... PLUS NIL I
FIX [p,
...] : MKCROSS[[t, ...11-[e, ...]
d PLUS ... PLUS NIL I
xxxxxx I --Also recursive d maps into this? --
See § 2.2.3
--
See § 2.2.4.
MKCROSS
INT -- Numeric literal,
giving the decimal representa
-- A d is not an e: a d must be before - or after LET or DECL. MKDECL[ PATT p, t]
[p, ...]: MKCROSS[[t ...]] -- to separate names and types
-- Only the [...] form is an e: a b must be written after
LET. I
MKBINDP[PATT p, e] MKBINDD[LI,
e]
-- Note: a pattern is not an e: it can appear only
before - or or after PATT in the kernel.
PATT n=$n
PATT [pi,
...]=[PATT pi, ...]
-- Fill in types
The precedence of operators in e is: (highest) a infixOps (all the same), BUT. IN (lowest).
All are
left associative.
Table 2 — 2: Kernel expression
syntax and semantics
Constructors
A bracketted sequence of expressions (e.g., [1, 2. 3])
denotes a flat group with its elements in the same order (e.g..
mKPAIR[1, MKPAIR[2. mkPAIR[3, Nit]]]. Thus a group constructor is just like the
LIST function in Lisp. A pattern is a similar construct,
except that it contains names which stand for the corresponding ATOM literals;
PATT yields the group obtained by replacing each name n by the literal $n. After desugaring a pattern
always appears after PATT
and hence is always desugared into
an atom or a GROUP[ATOM].
Brackets are also used to delimit binding and declaration
constructors. They are distinguished from each other, and from group
constructors, by the presence of – in each element of a binding constructor,
and : in each element of a declaration constructor. The elements of a binding
or declaration constructor are sugar for applications of the
MKDECL. MKBINDP and
MKBINDD primitives.
The constructor itself strings the resulting declarations
into a big one using the PLUS operator. which is just like THEN except that it does
not allow duplicate atoms; the motivation for this is to allow
the names and corresponding types or values to be written together, instead of
factored as the primitives require. As a result, values made
from constructors are always flat.
Note that these constructors do not nest, so they can only
be used to build flat values. The only exception is that a d can be [(p: ... ]. This is intended for the d–e form of binding; e.g., if DivRem returns two INTS, you can write
[d: INT, r. INT]-'DivRem[...] instead of [d, r]: INTXINT–DivRem[...].
The REC binding constructor is sugar
for FIX
which exactly parallels the non-recursive one.
2.2.3
Imperatives
These constructs are generally used together
with non-functional procs.
statements ::= e; } IF (IsvoiD[e]) AND ... THEN [] ELSE ERROR
-- Ordering by non-prompt evaluation.
simpleLoop ::= SIMPLELOOP statements LET REC [loop'–(X
IN { statements:
loop[] })] IN loop[]
-- Only an exception (such as EXIT) will
terminate the loop.
Each e in
the statements must evaluate to VOID, which is a distinguished null value; this is to catch mistakes
like writing x+1 as a statement. The definition of AND ensures that
the is are evaluated left-to-right.
The simpleLoop is the standard way to express a loop in
terms of recursion. You are supposed to use an exception to get out
of this loop; Cedar provides a number of convenient ways to do this, such
as EXIT and
RETURN.
2.2.4 Exceptions
An exception is treated as a special value returned from
an application. The exception value contains an exception code and an args value which may be of any
type. When an application sees an exception value, it immediately
abandons the application and returns the exception value; thus application
is strict. There
has to be some way to stop this, or the first exception would be the value of
the program. The HIDE
primitive takes any value and returns a variant record of
type HEX. It
turns:
a normal value into the normal variant, with the value in its v field;
an
exception into the exception variant,
with the code in its code field
and the arguments in its args
field.
UNHIDE
takes a HEX value and returns the
original unhidden value.
An exception code has the type EXCEPTION[7], where
T is
a declaration which is the type of the args: it is the domain of the exception, and (VEXCEPTION[T]).DOMAIN= T. An
exception value is constructed by the primitive
RAISE: [T:: TYPE, code: EXCEPTION[T], args:
T]
Thus
the args always
has the type demanded by the code.
This
is dressed up with the following syntax.
but ::= e BUT { butChoice; ...
butChoice ::
= el => e2
e e1.'...
=> e21
ANY
=> e2
LET
v"—HIDE[e] IN (
IF ISTYPE[v', HEX.normal] THEN UNHIDE[v]
ELSE IF ISTYPE[v', HEx.exception] THEN
LET h"—NARRow[v%
HEx.exception ] IN
LET selector"--11`.code
IN butChoice ELSE ... ELSE UNHIDE[v] ELSE ERROR
IF
selector' = ei THEN LET
MKBINDD[Ve1.DOMAIN, h'.args] IN e2
IF (selector =e1) OR ... THEN e2 IF TRUE THEN e2
A BUT
expression evaluates e. If it is a normal value, that is the value of the
BUT. If
it is an exception, each butChoice in turn gets a look at it. If
one of them likes it, then it supplies the value of the BUT: otherwise
the exception is the value.
The el
in a butChoice must evaluate to an exception code. If
there is just one, and it matches code
in the exception. then args in the exception is bound to the domain of the
code, and e2 is
evaluated in that environment. If there is more than one. then e2 is just evaluated
in the current environment. An ANY butChoice
matches any exception, but of course doesn't bind the arguments.
2.3
Values and computations
A computation in Cedar is the evaluation of an expression
in some environment. This section describes the kinds of values which can be computed by Cedar programs,
and the basic mechanisms for doing computations.
2.3.1 Application
The
basic mechanism for computing in Cedar is applying a proc to argument values. A
proc is a mapping
from argument
values and the state of the computation. to
result values,
and a new state of the computation. The state is the values of all the variables.
A
proc is implemented in one of two ways:
By
a primitive supplied
as part of the language (whose inner workings are not open to inspection,
but which is defined in § 4).
By a closure, which
is the value of a A-expression whose body in turn consists of an expression,
which may contain further applications of procs to arguments. e.g., A [x: INT] IN x+ 3. When a closure
is applied, the parameters declared
after the A are bound to the arguments, and then the body after IN is evaluated in
the new environment thus obtained.
In Cedar, each parameter value thus obtained is used to
initialize a variable, which is the object named by the
parameter in the body. Thus the body can assign to the parameters. Use of this feature
is not recommended.
Note that when a A-expression is evaluated to obtain a
closure its body is not evaluated,
but is saved in the closure, to be evaluated when the closure is
applied. Some constructs (IF, SELECT, AND, OR)
are defined (see § 2.2.2 and § 3.8) by wrapping
A-expressions around some arguments, and then applying them only
when certain conditions hold; e.g., IF b THEN f[x] ELSE g[y] evaluates f[x] iff b iS TRUE and g[y] iff b iS FALSE.
Application is denoted in programs by expressions of the
form fiarg, arg. mi. If
the value off
is a closure, this expression is evaluated by evaluating f and all the arg's, and then evaluating the body
of the closure with the formal parameters bound to the
arguments (unless an exception value turns up; see § 2.6.2).
Thus to evaluate (A [x: INT] IN x+3)[4]:
evaluate the A-expression to obtain a closure;
evaluate the argument 4 to obtain the number 4; evaluate
x+ 3 with x bound to 4 to
obtain the number 7.
The first two
evaluations can be done in either order (with different results in general,
though not in this case).
To evaluate a primitive application such as x+3, evaluate
the arguments, and then invoke the primitive on those arguments
to obtain the result and any state change. With a few exceptions (e.g., assignment
and dereferencing or following references), primitives are functions and can be
thought of as tables which enumerate a
result value for each possible combination of arguments. Invoking a primitive
can therefore be viewed as a simple table lookup using the arguments as the
table index.
Actually there may be one more step in an application. If
an argument doesn't have the type expected by the proc, the
argument is coerced to
the proc's domain type if possible. If no coercion can
be found, there is a type error. Coercion is discussed further in § 2.6.1 and §
4.13.
Most procs take a binding as argument, in which the
various parts of the argument are named. E.g., OpenFile: PROC[name: ROPE, mode: Files.Mode] takes a binding with two
values named name and
mode. It
might be applied like this: OpenFile[name–"Budget.memo",
mode–$read]. If the names are missing, there
is a positional coercion
which supplies them left-to-right, see § 2.3.6. There is also a defaulting coercion that
supplies missing parts of the binding: see § 4.11.
If
f is
neither a primitive nor a closure, the meaning of applying it is defined by the
APPLY proc
for its type: this case is discussed further in § 4.4.
There are many ways of writing applications other than
f[x]. In fact, many Cedar primitives cannot be the values of
expressions. and can only be applied by writing some other construct. The desugaring
rules show how large parts of the Cedar syntax denote various special kinds of application.
In each case. the meaning is defined by the standard meaning of application and
the specific meaning of the primitives involved: see § 4.1.
This is partly because of history.
and partly because specialized syntax makes the program more readable. Future evolution of the language will improve the situation.
Functions and order of evaluation
An expression is functional if
its value does not depend on the state, but only on the
values bound to its free names, and evaluating it does not
change the state.
As
a consequence of this definition,
Two identical
functional expressions in the same scope will always have the same value.
Note that a
functional expression must not depend on values contained in variables bound to
its free names. Thus. v.VALUEOF is not functional.
A proc is a function
if every application of it is functional. It doesn't
matter when or how many times a function is applied: the order of
evaluation doesn't matter for functions. Thus Cedar functions
can be thought of as mathematical functions for many purposes. Note that a
constant can be regarded as an application of a function of
no arguments.
Non-functional procs, on the other hand, are more
complicated objects. Cedar makes no formal distinction,
either in syntax or in the type system, between functions and procs. However,
it does not define the order of evaluation in an expression,
except that:
all arguments are evaluated before a proc is applied:
because
of the desugaring of IF, SELECT, AND and OR into A-expression. the order of evaluation
for these expressions is determined by the first rule:
statements separated by semi-colons are evaluated in the
order they are written.
As a consequence,
two applications of non-functions should not be written in the same statement
unless they don't affect each other: if this is done the effect of the program
is unpredictable.
An expression is guaranteed to be functional if it only
applies functions: thus if f is
a function, p a
non-functional proc, and x a variable, f[3] is functional and p[3] and p[ x] may not be. Furthermore. f[x] may not be functional, because
it is sugar for f[x.vALuEoF], and VALUEOF is not a function. The value
of a A-expression is a function if its body is functional. There are more
complicated ways of guaranteeing that an
expression is functional, just as for any other interesting property.
Because the values of variables constitute the
state. it is only the existence of variables that allows non-functional
procs to exist. In particular. the VALUEOF proc which returns the
value of a variable is non-functional (because its result depends on
the state), and the ASSIGN
proc which changes the value of a
variable is non-functional (because it changes the state).
2.3.2 Values
A Cedar program manipulates values. Anything which can be
denoted by a name or expression in the program is a value. Thus
numbers, arrays, variables, procedures, interfaces, and types are all values.
In the kernel language, all values are treated uniformly. in the sense that
each can be:
passed
as an argument, bound to a name, or returned as a result.
These operations must work on all values so that
application can be used as the basis for computation and
A-expressions as the basis for program structure. In addition, each particular
kind or type of
value has its own primitive operations. Some of these (like assignment and
equality) are defined for most types. Others (like addition or
subscripting) exist only for certain specific types (numbers or arrays). None
of these operations. however, is fundamental to the language. Formally.
assignment or equality has the same status as any operation on an abstract type
supplied by its implementor: thus INTEGER.ASSIGN has the same status as 10.GetInt. In practice, of
course, special syntax is usually used to invoke these
operations, and the implementations are not Cedar programs open
to inspection by the editor or debugger. A complete description of the
primitives supplied by the language can be found in Chapter 4, organized
by the type of the main operand. Table 4-5 is an alphabetized
index of these descriptions.
Restrictions on types, declarations,
bindings and unions: In current Cedar, however, there are restrictions
on values which are types, declarations or bindings: they can only be arguments
or results of modules, and hence are first-class values only
in the modelling language, and not within a module. Also,
declarations and bindings cannot be constructed or bound to identifiers within
a module. Unions are also restricted: they can only appear
inside records. Nonetheless, it is simplest to emphasize the
uniform treatment of all values, and consider separately the restrictions on
types, declarations, bindings and unions. Future evolution will
improve this situation.
Restriction on dot notation: In
current Cedar you can only use dot notation for some operations of built-in
clusters: the procs which access record fields, and others as noted in Table
4-5. As a substitute, there are various syntactic forms which are sugar for dot
notation: infix, prefix and postfix operators, built-in
functions, and funny applications. These desugarings are given in rules 20-24
of the Cedar grammar in § 3.
2.3.3 Variables
Certain values, called variables, can contain other
values. A variable containing a value of type T has type VAR T. If
the variable doesn't allow the value to be changed. the type is READONLY T: this is not the same as T, because there may be a VAR T value which is the same
container. The value contained by a variable (usually called the value
of the
variable) can be changed by assigning a new value to the
variable. The set of all variables accessible from the process array constitutes the
state of the computation: these are all the variables which can
be reached from any process. and a variable which cannot be
reached cannot affect the computation. Note that a variable value is a container, which like all
values is immutable; it may help to think of it as (the address of) a block of
storage. The contents of
a variable can be changed by assignment. Thus the value of a variable can
change. even though the value that is
the variable is immutable.
A suitable abstract
representation for a VAR T is
a value of type [Get: []–*T.
Set: T–>0].
This representation is not used in Cedar, but it
clarifies the way in which variables fit into the type system:
VAR TVAR U only if T and U have the same predicate,
because the Get proc requires TAU and the Set proc requires UST. READONLY T corresponds to [Get: 0–>T] and a write-only variable
type would be [Set: T– U].
There
is a coercion (an automatically applied conversion: see § 2.6.1) from VAR T to T. so that a variable
can be passed without fuss as an argument to a proc which expects a value.
Restriction on variables: In
current Cedar, variables generally cannot be passed as arguments or results.
The only exception is that an interface can declare a variable (called an
exported variable) for which an implementation supplies a value;
this is normally written x: VAR INT in the interface, but for historical
reasons it is also possible to write just x: INT. Certain primitives (e.g.. dereferencing
a REF or
POINTER) return
variables. a variable can (indeed, must) be passed as the first
argument to ASSIGN, and
a variable can be bound to a name by a declaration in a LET or block (LET x— I NT.NEW IN ... binds a VAR INT value to x). For the most part, however,
a program which wants to handle variables must do so at one
remove, through procs or REFS (or, unsafely,
POINTERS).
A variable is often represented by a block of storage; the
bits in this block hold the representation of its value. All
the built-in VAR types
are represented in this way. A variable u overlaps another variable v if assigning to u can change the value of v. The primitive ASSIGN procs have the property that
if r and s are REFs. then rr overlaps st
iff r= s.
For
any variables
u and v with the same VAR type, u overlaps
v iff u= v, provided that no unchecked program has given
overlapping blocks of storage to the two variables (if u and v have different types, one might be
contained in the other).
The role of variables in non-functional
expressions is discussed in § 2.3.1.
2.3.4 Groups
There is a basic mechanism for making a composite value
out of several simpler ones. Such a composite value is called a
group. and the simpler values are its components or elements.
Thus [3, x+ 1, "Hello"] denotes a group, with
components 3, the value of x+ 1, and "Hello". The main use of explicit
groups is for passing arguments to procs without naming them (these are
sometimes called positional
arguments). This is done by binding the group to the
declaration which is the domain type of the proc; the result is a binding
which is the argument the proc expects. Thus, with P:
[x: INT, y: R EA L]--q. . .1, the application
P[2. 3.14] is sugar for P[ [x: INT, y REAL]-'[2, 3.14] ], which is equivalent to P[x-'2. y-3.14].
A group has a type which is the cross type of its component
types: if x has
type T and
y has
type U, then
[x y] has
type TX U. Thus for syntactic types, V[ei, e2. ]=Ve/XVe2X ... The X type
constructor is associative, and type implication (§2.4.2)
extends to cross types elementwise. If the T
are types, there is a coercion called MKCROSS from [T1. T2.
...] to T1X T2X
...; because of this, the
explicit cross type is usually not needed.
Restriction on cross types: Current
Cedar provides no way of making cross types except as domain and
range types of a
proc type (or other transfer type); e.g.. PROC [INT, REAL]—+[BOOL,
ATOM]. There are no procs taking groups except the group-to-binding
coercions. Hence the only thing to do with a group is pass it to one of the built-in coercion procs by writing it as a
proc argument or to a record or array constructor as described in the
next section. Current Cedar does not have X, but it does
have the MKCROSS group
to cross type coercion described in the last paragraph and illustrated in
the example.
2.3.5 Bindings
A binding is a group in which each component has a name.
Thus, it is an ordered set of [name, value] pairs. There are
three main uses for a binding:
• As an argument in an application. Thus, if P is a proc with type PROC[i: INT, b: BooL], its argument
must be a binding such as [1-3, b—TRUE].
The application then looks like this: P[i-3, b—TRUE]. A binding argument is sometimes called a keyword argument list. See
the next section for details.
· In
a LET expression,
to give names to values in the scope of the LET. Thus,
LET 1-3. b—TRUE
IN (IF b THEN 1+5 ELSE 0)
has
the value 8. Current Cedar doesn't have LET expressions. but a binding at the beginning of
a block has the same effect. See § 2.5.4 on scopes for details.
· As
a way of collecting and naming a set of related values. A value can be
extracted from the set using dot notation. Thus if b is the binding [1-3, b—TRUE], the
value of b.i is
3. In current Cedar this only works for interfaces; see § 3.3.4
and § 4.14 for details.
A binding is usually denoted by a constructor, which
takes the form
[1-3,
b—TRUE]
or redundantly (if
there are no coercions)
[i: INT-3, b: BOOL—TRUE]
in which the types are
specified explicitly (but you can't write the second form as the argument of an
application). See § 2.5.5 on constructors for details.
2.3.6 Arguments
When a group or binding is bound to a declaration (d–v),
there are various conversions called coercions which may be applied to the values. This usually
happens when the arguments of a proc application are bound to the
parameter declaration.
First,
if v is a group rather than a binding, it is coerced to a binding by attaching
the names from d to
the elements of v in order. Thus in
[a:
[NT, b: REAL]–[2, 3.14]
the group
constructor is coerced to [a-2, b-3.14].
Next,
if v is shorter than d, elements
of the form n—OMITTED
are appended. where n is the corresponding
name from the declaration. Thus in
[a:
[NT, b: REAL]—[2]
the
group constructor is coerced to [a-2, b—OMITTED].
Now
the items of the binding are matched by name with the items of the declaration.
There is an error unless the names match exactly. The remaining
coercions are done on individual items. n: from the declaration and the
corresponding n–v from the binding. If v has a type implying 1. all is well.
Otherwise, if there is a sequence of
coercions from the type of v to 1.
these are applied to v. If
no such sequence exists, there is an error. In particular,
there is a coercion from OMITTED to the default value for :, if any.
Thus in
[a: INT -0, b: REAL4-1.1]—[b-3.14]
the group
constructor is coerced to [a-0, b-3.14], and in
[a:
INTO-0, b: REAL4-1.1]–I]
it is coerced to [a-0, Coercions are
discussed in § 2.6.1 and § 4.13, defaulting in § 4.14.
An
important special case is constructors for record and array values. A record
type has a construction proc; e.g.,
TYPE•RECORDIa:
INT, b: REAL.-0.01
has
a proc R.CONS of
type PROC[a: INT, b: REAL€0.]—)[R]. Thus R.CONS[a-2, b-3.1416]
constructs a record value. There is also a coercion from BINDING to
the particular declaration RB which
is the domain type of R.CONS, so that
rl:
R4-[a-2,
b-3.1416]
is
short for
rl:
R4- R.coNs[a-2, b-3.1416].
Composing the
positional coercion from GROUP to RB with
R.CONS makes
rl:
R<-[2.3.1416]
also
short for the previous line.
The same scheme works for arrays. but only an array
indexed by an enumeration has a corresponding binding which
can be written; the elements of an array indexed by numbers don't have
names which can be written in a binding. However, the group constructor still
works.
2.4
The type system
This section describes the way in which types can be used
to make assertions about the program which the compiler can verify. It also
discusses the role of types in organizing the names of the program.
2.4.1 Types
Types serve two independent but related functions
in Cedar:
· A
type contains an assertion about some property of a value, e.g.. that it is a
whole number between 0 and 10 represented in a single machine
word. A value which has the property is said to be of that type, or to have that type.
The assertion part of a type is called its predicate. It is a function
which accepts a single value (of any type) and returns TRUE iff the value
satisfies the assertion. In principle the predicate can be applied to
any value at runtime, but in practice a lot of optimization is done
by the compiler.
· A
type contains a collection of named procs (and perhaps other values) related in
some useful way. Most often, the procs of type T take a value of type T as their first argument. For
example, INT has
PLUS, TIMES and
MINUS procs
(usually written as infix or prefix operators) which can be
applied to INTS. The
dot notation (see § 2.4.4) makes it easy to refer to the procs
in a type's collection.
The collection part of a type is called its cluster. It is simply a binding. No
rules are enforced about what kind of values are in the binding.
However, the idea is that the cluster is an interface for
manipulating values of the type (perhaps the main or even only interface).
As with any interface, a tasteful choice of names and values is important.
The predicate and the cluster serve rather
different purposes:
The predicate provides the basis for type-checking (§
2.4.2). The most important function of type-checking is to
guarantee the integrity of abstract data types: this is done with basic predicates
called marks (§ 2.4.3).
The cluster provides the basis for convenient naming of a
large collection of procs and other values (§2.4.4).
Clusters are organized into a hierarchy of classes (§ 2.4.5).
Like everything else which can be named, a type is a
value. Hence there is nothing special about binding a type
value to a name. If T is
a type expression, the binding
U: TYPE— T
binds
Ts value to U. In
the scope of U, T and
U are
completely interchangeable (provided T is not rebound). Furthermore, with
two exceptions, all type expressions are functional: identical type expressions
in the same scope denote the same type value. The exceptions are the record and
enumeration type constructors, which make a distinct type
each time they are used (by constructing a new mark: see § 2.4.3).
Restriction on uses of types: Current
Cedar has a number of restrictions on the use of TYPE values, given in §4.8.
2.4.2
Type predicates and type-checking
Type predicates provide a way of making assertions in the
program which can be checked mechanically. These
assertions take the form of declarations for the formal parameters of procs. In
general the checking must be done during execution. Thus,
if the program says
a: ARRAY [0..10] OF INT4-ALL[0]:
is INT4- s.Readl
ni:
s.PutF[ a[i]]:
there
must be a check that i>0 and
1<10 just before the expression a[i] is
evaluated. This is called a bounds
check: if it fails there is an exception called Runtime.BoundsFault. Where did this
check
come from? Note that aid is short for Va.APPLY[a. d.
and Va.APPLY is
SUBSCRIPT, the
subscript procedure for ARRAY [0..10] OF INT. The type of SUBSCRIPT iS PROC[array: VAR ARRAY
[0..10] OF INT, index: [0..10]1—qvAn
INT]. So when i is
passed as the index
argument. the declaration of SUBSCRIPT says
it must have the type [0..10]. The predicate for this type is
[x: ANY] IN HASMARK[x, INT] AND LET
y—NARROW[x. INT] IN y>= 0 AND ye= 10.
Leaving the HASMARK term
for later discussion. we see that the rest of the predicate is the same as the
bounds check.
The type system is designed, however, so that most
assertions can be checked statically
(i.e., proved), by examining the text of the program without
running it. Static checking has three obvious advantages:
It
reports any errors after a single examination of the program, leaving none (of
this kind) to be discovered later in Peoria.
It introduces no cost in time or space for run-time checking.
The compiler can take advantage of the assertions to
generate better code.
Of course, there
is a corresponding drawback: the assertions made by parameter declarations must
be simple enough that the compiler can reliably prove or
disprove them.
The
proofs done for type-checking have exactly the same form as program correctness
proofs based on preconditions and postconditions. Consider a
proc whose value is the A-expression
A[x:7]=>[y:U]INe.
The
domain declaration [x: 7]
is a precondition for the body e. This means that any
application of the proc must satisfy this condition. As a
consequence, the body e can be analysed on the assumption that
the precondition holds, i.e., that x has type T. Similarly, the range declaration [y: U] is a postcondition for the
body. This means that given the precondition, any evaluation of e must
produce a value y which has type U. In summary, for the body we
assume the
precondition and must establish
the postcondition.
To make this hang together, each application must establish
the precondition; this means that the argument must have the domain type. In
return, the application can assume the postcondition; this means
that the result of the application has the range type. Thus we have a linkage:
argument
domain range result
The result in turn
will be the argument of another application. In this way the proof is extended
to larger and larger expressions, and finally to the whole
program. In summary:
Application — establish pre-condition: arguments have the domain
type;
rely
on post-condition: results have the range type.
Body — rely
on pre-condition: parameters have the domain type;
establish post-condition: returns have the range type.
These proofs require showing that an expression
always has a particular type T. This is done by observing
that every expression has a unique syntactic type
U, which is the type of every evaluation of
that expression; e.g.. an application always has the range type of its proc
(see below for a more detailed discussion of syntactic type). If every value of
type U has
type T. we
are done. Hence the usefulness of type implication. One type implies another.
iff (V x)
T[x] U[x]; sometimes we say that T is
a sub-type of
U. If
two types are equal, each implies the other. However, there are many
other useful cases of implication. For instance. VAR INT implies READONLY INT. The
type implications in current Cedar are given in § 4.12.
Of
course, not all arguments are applications. The kernel grammar gives the other
possible forms of argument expressions, and we enumerate the proof
rules for each:
A
literal is like a zero-argument proc: it has a known range (e.g.. 3 has type INT, 'A has type
CHAR).
A name has the type specified in its declaration or
binding.
If there is only a declaration n: T (e.g., x: INT), it
must be the domain declaration
of
a A-expression, and we have already seen how to ensure that the n's value has type
T when
the resulting proc is applied.
If
there is a binding n: T—e for
the name (e.g.. x: INT-3),
we must check that e has type T.
A A-expression A [x: 7]=>[y: U]
IN e has the type [x: U].
This works for the reason
discussed in the next paragraph.
A
binding constructor [x—e. y—A has
the type of the corresponding declaration. [x: V e. y: VA.
There is one more link in the chain. An application f[x] has an arbitrary expression
for f not
necessarily a A-expression. The requirement is that f must have a proc type. say D—4R: D is the domain
type and R the
range type. Since the type of A D=>R
IN e is D-.R, satisfying
the precondition D
for the application is the same as satisfying the
precondition D for
the A-expression, and similarly in reverse for the postcondition.
The value of f may
be a primitive rather than a closure obtained from a
A-expression. In this case, the implementation of the primitive can still depend
on the precondition and must still establish the postcondition, but since the
implementation cannot be examined (within the framework of
Cedar) we can say nothing about how this is accomplished.
Example: INT.PLUS, which
is implemented by the machine's 32-bit add instruction.
In a proc type D—*R,
D and R may
be declarations which provide names for the arguments and results.
In general. the expression R may
include names declared in D. The
range type of an application then depends on the argument values.
Restriction on dependent proc types:
In current Cedar only a module has a type whose range depends
on its argument values; the type returned by an interface, or the interfaces
exported by an implementation, may depend on the interface and
implementation parameters.
As
a by-product of the type-checking proof rules just given, a syntactic type is derived for
every expression e in
the program. It is denoted by De. and
computed as follows:
for a name, the declared type; for
a literal, its type;
for an application, the range type (which may depend on
the argument);
for a A the obvious proc type;
for
a binding constructor. the declaration obtained by pairing the names with the
syntactic types of the value expressions.
Typechecking ensures that whenever e is evaluated, the resulting
value will have type De (though
it may have other types as well, i.e., it may satisfy other predicates). The
main use of syntactic types is in connection with dot
notation (see § 2.4.4).
In order to carry out the proofs described above, the
compiler must either compute the values of all types, including
those denoted by complex expressions such as ARRAY [i...j] OF INT, or
it must be able to prove the equality of unevaluated type
expressions. For the most part. current Cedar requires the former
approach: hence a type expression must have value which the compiler can compute.
Such a value is called static: the
rules for static values are given in § 3.9.1.
2.4.3 Marks
By
this point you may have thought of asking why the assertions provided by type
predicates are worth all this fuss. The reason is simple: they
are the basis for authenticating values of an abstract type. so
the implementation can be sure that it is working on properly formed values. Suppose
you are the implementer of an abstraction. e.g.. Table. You provide operations to Lookup a key in the table,
to Insert a
[key. value] pair, and to Enumerate
the items in the table. A Table is implemented as a
REF to
a record containing a sorted array a
of items and an INT n which
gives the number of
items. Lookup is
implemented by binary search. All three operations are programmed on the assumption
that elements 0 through n-1 of
a are
sorted, and that n is
smaller than the size of the array. They will not work
properly if these assumptions are not satisfied, and indeed they may try to
subscript the array with an out-of-bounds index or to violate other
requirements of the abstractions they depend on.
Here
is a lower level, but perhaps more dramatic example. The dereferencing
operation t for
a REF REAL returns
a VAR REAL, which
can, for instance, be assigned to, as in the program fragment
r: REF REAL—NEW[REAL(-1.0];
rt.
<-
3.14159
A
REF REAL is
represented by the address of a four-byte block of storage which holds a REAL, and
the assignment to rt stores the four bytes which represent 3.14159
into that block. If somehow a REF BOOL finds its way into r. the assignment will still
store four bytes, since it doesn't know any better. But the REF BOOL points
to a two-byte block: the other two bytes that will be modified belong
to some unrelated variable, which will be clobbered without warning.
The second example is scarier because the consequences of
the bug seem more unpredictable. In both cases, however, the fundamental
problem is the same: even if the implementation is correct, the
wrong thing happens because it is given an improper value to work on. Or to
make the same point in different
words, the implementation cannot be held responsible for bad results from one
of its operations, if it has no control over the validity of the arguments it
receives.
So that the implementation of an abstraction can take
responsibility for correct operation. there must be a way to authenticate a value of the
abstract type. In Cedar this is done by placing a mark on the value: think of it as
a little flag stuck into the value. The mark uniquely identifies the abstract
type, and authority to affix it is under the control of the implementation. A
correct implementation will mark only values which have the
properties needed for a representation of an abstract value, and
if no one else can affix the mark, the implementation can be sure that every value
with the mark has the desired properties.
A mark can be thought of as an abbreviation for an
assertion or type invariant which
characterizes a proper abstract value, such as Table or REF REAL. Such
an assertion can be quite complex. In the Table example, it would say that the representation is
a record of the proper form, that n
is less than the array size, and that the first n array elements are sorted.
In the REF REAL example,
it would say that the address points to a block of storage
such that at least the first four bytes don't overlap any other
blocks. Such assertions are not easy to write down formally, and proving them
is certainly beyond the power of any existing program. So the
abbreviations are not a mere convenience, but a necessity.
A new mark can be created on demand by the primitive
CREATEMARK: PROC[Rep: TYPE, tag: UNIQUEID]—>[m: MARK, Affix: [Rep]— [TYPEFROMMARK[m]] ]
The primitive HASMARK tests a value
for the presence of a mark, so HASMARK[x. m] tests
x for
the presence of the mark tn.
Affix adds the mark to a Rep value.
Restriction on marks: MARK, UNIQUEID, CREATEMARK, HASMARK
and TYPEFROMMARK are
not accessible in current Cedar. Record and
enumeration type constructors provide some access to CREATEMARK, as
described below. The 1STYPE
primitive, also described below, is closely related to
HASMARK.
With these facilities, it is easy to create a new abstract
type. Choose its representation type, and obtain a new mark m. TYPEFROMMARK[m] with an
appropriate cluster added is the new abstract type. The
implementation must use Affix to
mark only values which satisfy the properties it demands.
The
type returned by TYPEFROMMARK[m]
has the predicate
A [x:
ANY] = >[BOOL] IN HASMARK[x. m]
and
an empty cluster. Except for subranges and bound unions, all types in current
Cedar have a predicate of this form. The built-in types (INT. BOOL etc.)
come with such predicates, and the built‑
in type constructor procs (ARRAY, RECORD etc.) obtain a mark from CREATEMARK. So
that two invocations of ARRAY [0..10] OF INT will produce the same type, ARRAY and
most of the other constructors use a canonical encoding of the
constructor and its arguments for the UNIQUEID, and hence
are functional. RECORD
and ENUMERATION produce a different type each time they are invoked,
so they obtain fresh unique identifiers. Since the program cannot invoke CREATEMARK directly.
we need not explain how to prevent forgery of UNIQUEIDS. Future versions of Cedar will address
this problem.
In current Cedar you make a new abstract type by
declaring it as an opaque type in an interface:
T: TYPE[ANY]
This
generates a new mark, and declares T to be a type which has that mark. You get such a
type by explicitly painting some other type, normally in an
implementation which exports T to the interface which declared it:
T:
PUBLIC TYPE—
Interface.T PAINTED RECORD [...].
See
§ 4.3.4 for more details.
The implementation actually stores a mark with each
variable allocated by NEW.
Such a variable can be referenced by a REF. and in particular
by a REF ANY value.
The type of a REF ANY value
can be tested at runtime using the primitive
ISTYPE: PROC[x: ANY, U: TYPE]—>[Boot]
If
V e is
REF ANY and
RT= REF T, then the value of ISTYPE[e, is
TRUE iff
the predicate for T
just tests for mark m. and xt has the mark VAR m. ISTYPE is described in detail in §
4.3.1. along with the WITH
... SELECT construct and the NARROW primitive, which are more
powerful operations built up from ISTYPE.
For other values, there is no mark actually stored:
instead, types must be computable statically using the methods
described in the last section. The AMTypes
interface, however, gives a way to refer
to any value in a uniform way, and to test its type at runtime.
There
is only room for one mark on a variable, and this must encode all the marks
that the value actually carries. We arrange for this by
imposing a partial order on the marks, and requiring that:
The set of marks on a value must have a maximal element.
Every mark
smaller than the maximal one must be on the value.
With these rules, a single mark stored on the value is
enough to code all the others. In current Cedar. a value actually has only one
mark, since:
The
only way to create a new mark is with the record or enumeration type
constructors, or by declaring an opaque type.
When
you paint a type T
with the mark of an opaque type, T must
be a record or enumeration type, and the opaque type mark replaces the mark it had before.
Note
that VAR T, READONLY T and
T are
different types with different marks, although VAR TREADONLY T, and
there is a coercion VALUEOF
from either one to T.
2.4.4 Clusters
and dot notation
It is convenient to associate with a type the procs
supplied by its implementor for dealing with values of the type.
This is done by putting these procs into the type's cluster. The cluster is
simply a binding which is part of the type value (the predicate is
the other part). There are no rules enforced about what goes
into the cluster. However, there is a special dot notation which makes it
desirable to populate Ts cluster with procs which take a T as
their first argument. The usual effect is like this: Ln is
sugar for Vt.n[1], and t.n[other args] is sugar for V t.n[t, other
args].
For example, if t has type T, and
a proc [T, INT]-0[Boot.] is in Ts cluster under the
name P, then
the proc can be applied by an expression like t.P[3],
which is sugar for V' t.P[r, 3].
The name P is
looked up only in Ts cluster, not in the current scope. If
Q: [T]–[INT]
is also in the cluster, it can be applied with t.Q. which is sugar for V t.Q[d.
The general rule that makes this work is the following: t.n is
sugar for LooKuPc[Vt, $n][t]. LOOKUPC[Vf. $n]
is just V t.n. except that if Vr.n is a proc that takes several
arguments. it is split into a proc that takes the first argument and
returns a proc taking the remaining ones. Thus LookuPc[Vt. $ n][t] will
be a proc taking the remaining arguments. and tifi[other args] = LOOKUPC[V r. $n][t][other
args] will be the same as V t.n[t, other args].
Dot notation can also be used to obtain values from a
binding or from the cluster of a type without any application: T.P would be the proc named P in the previous example. The
possible cases of dot notation in current Cedar are described in detail
in § 4.14.
Restriction on constructing
clusters: There is currently no way to explicitly
construct clusters. The built-in types and type constructors have clusters:
they are described in detail in § 4. In addition, there is a clumsy way to
provide a cluster for an opaque or record type in an interface: every proc name in the interface
is put into the type's cluster. For a record. the procedures supplied by the record
constructor are also in the cluster. and they win if there are name conflicts.
There is one of these clusters for each type in each imported interface value: if
a module imports more than one value of the same
interface, however, there are severe restrictions (see § 3.3.3).
2.4.5 Declarations
A declaration is the type of a binding. Thus, the binding
[x-3, y-3.14] has
the type [x: INT. y: REAL]. All
the relationships among types, and between types and values, are carried over elementwise
to decls and bindings: the elements are matched up by name rather than by
position. A decl itself simply has the type DECL.
A decl is made up of two parts: the names or pattern, and the types. The basic
operation for making decls. MKDECL, takes a pattern and a type.
Thus MKDECL[ PATT[x. y], INTXREAL]= [x: INT, y: REAL]. In
general, a pattern is one of NIL, a simple name, or a pair of patterns. just like a Lisp S-expression.
Similarly, a type argument to MKDECL is one of NIL, a type, or a cross type. The type must decompose in a way
which matches the pattern. Normally. as in Lisp, we deal only in flat patterns.
where the first element of a pattern is always a name. Such flat patterns are
conveniently denoted by constructors of the form [x, y. ...]. The
reason for defining things in terms of pairs is that it makes it
much simple to write down precise rules for the semantics, using structural induction
on the values.
The main use of a decl is to type-check a binding. The
basic binding constructor is MKBINDD[d, e],
where d is
a decl and e is
matching group or binding. If e is a-binding. then its structure and names must
match the structure and names of d,
and each element of e must have the type demanded
by the corresponding component of d, after
a possible coercion. Thus MKBINDD[[x: INT, y: REAL].
[x-3. y-3.14]1= [x-3, y-3.14]. This
may seem pointless, but it has two important uses:
Such a binding is used to bind the argument of a proc to
the domain declaration. Even though the resulting
binding is the same as the argument, the type-checking is essential.
There may be coercions involved, so that the resulting
binding is not the same. Coercions on the component values are
discussed in § 2.6.1. There are also coercions on the binding itself,
which can default missing elements: these are discussed in § 2.3.6.
If e is a group, it is first coerced to a binding by attaching
the names from the decl, as discussed in § 2.3.6. Thus in MKBINDD[[x: INT, y: REAL], [3.
3.14]] the second argument is coerced to [x-3. y-3.14], and things then proceed as
before.
Bindings may also be used in LET expressions. Here
the types are often redundant, and it is better to use the MKBINDP primitive
to bind the value directly to a pattern. The syntactic type of the result is
the decl whose type is the syntactic type of the value. Thus [x-3. y-3.14] is
short for
MKBINDP[PATT[x. [3.
3.14]]: its syntactic type is MKDECL[[x. y]. V[3,
3.14]]= MKDECL[[x. y].
INTXREAL]=[x: INT,
y: REAL].
A decl D in
a block is interpreted somewhat differently. It becomes the argument of the NEWFRAME primitive.
which turns the type of the decl D.T into the corresponding vAR type VT= D.T.MKVARO, allocates a new value v of
type VT, and
makes the binding MKBINDP[D.P,
v] over the scope of the block. Thus
lx:
INT: y: REAL: S}
becomes
LET [x
y]—[VAR INT. VAR REAL].NEW IN S
Here D=[x: INT: y: REAL], VT=[VAR INT, VAR REAL], and
v= [VAR INT, VAR
REAL].NEW. Note that
the types might
have defaults, which are used to initialize the variables as part of the NEW operation.
Actually this is a bit oversimplified, since NEWFRAME has
to separate the bindings in the block from the decls.
construct the variable binding just described from the decl. and then combine
it with the binding from the block. Thus
ix: INT: y: REAL; z: BOOL—TRUE: S}
becomes
LET
[x. y. z]—'([VAR
INT, VAR REAL].NEW PLUS [TRUE]) IN S
or more readably
LET x—VAR
INT. y—VAR REAL. z: BOOL—TRUE IN S
Anomaly about uninitialized names or
variables: In Cedar the names in a block are introduced recursively,
so that the d's and b's can refer to each other. It is possible for a binding
or type to refer to a value which has not yet been initialized, with
undefined results. See § 3.4.1 for a further discussion of this
point.
2.4.6 Classes
Another important use of a declaration is to characterize
the cluster of a type. Since the cluster is just a binding, it is characterized
by its type, which is a decl. When used for this purpose, a decl is called
a class. See
§ 4.1 for further discussion of classes, and an enumeration of the primitive
classes of Cedar.
2.5 Programs
This section describes how meaning is assigned to kernel
programs.
2.5.1 Structure of programs
A kernel program is an expression, which is either atomic
(a name or literal), or is an application which involves
sub-expressions: the proc being applied, and the arguments. The concrete syntax
treats certain kinds of expressions specially: modules,
blocks (which introduce new variables and return no value), and
statements (which return no value). All desugar into simple expressions, however,
and are treated identically in the kernel.
2.5.2 Names
A
name is a part of a program which usually serves to denote a value. There are
two contexts in which the occurrence of a name n denotes a value:
It may occur as an expression. Then n denotes the value bound to
it in the scope in which the expression appears (see § 2.5.4 for
details).
It
may occur after a dot. as in en. Then
the expression e.n denotes
the binding for n supplied
by e (see
§ 2.4.4 and § 4.14 for details):
the value bound to n in e, if e is a binding:
the value bound to i? in the cluster of e. if e is a TYPE; roughly
(V e). n[e] otherwise.
There are also two defining contexts for a name n (see
§ 2.5.5 for details):
It
may occur before a – in a binding constructor, as in n--e. The value of e is
the value bound to n in the binding denoted by the constructor (see §
2.3.5 for details).
It
may occur before a : in a declaration constructor. as in n: t. The
value oft is the type of n in
the declaration denoted by the constructor (see § 2.4.5 for details).
These constructors are usually recursive in Cedar: that is. the
expression n elsewhere
in the constructor denotes the value bound to n in that constructor: see § 2.5.6 for details. In the
kernel they are non-recursive unless preceded by REC.
A name is not a value, but there are values of type ATOM which
are related to names. An atom has a print name which is a rope
(an immutable sequence of CHARS). A name following a $ is an atom literal:
$n denotes
the atom with print name n. Other
properties of atoms are described in § 4.5.1A.
Caution on names: Current
Cedar has several complications in its treatment of names:
•In
an argBinding27, n: e may be written instead of n- e. The
syntactic context distinguishes this from a declaration, but
this usage is not recommended.
An argBinding is not recursive: in {a-1; f[a-3. b–a+ 1]}
b is
bound to 2. not to 4.
The declaration in an import list is non-recursive: IMPORT M is short for IMPORT M: M. and the second M denotes its
binding in the surrounding scope (i.e.. the binding supplied by
the DIRECTORY). Inside
the body of the module, of course, M denotes the imported parameter.
Names which appear in an enumerationTC54 are
treated specially: see § 4.7.1A for details.
2.5.3 Scope
A scope is a region of the program in which all names
retain the same meanings (note that many names denote variables,
which can change their values in
the same scope. but each name continues to denote the same
variable). In the kernel there are only three constructs which introduce a new scope.
X, LET and
REC. In
current Cedar, these are sugared in a variety of ways: modules, import lists,
proc bindings, blocks, exit labels, open. iterators, safeSelects and
withSelects. All have straightforward desugarings, however.
2.5.4 Constructors
The
kernel has constructors, denoted [...]. to make expressions which denote group,
decl and binding values more readable. There is one flavor of
constructor for each class:
A binding constructor is a list of binding elements (b in the kernel
syntax) of the form I,– e or
el– e. The
presence of the – distinguishes it from the others. Here d is a decl element (not a declaration), and p is a pattern, in which the
names are being defined rather than evaluated.
A
decl constructor is a list of decl
elements (d in the syntax) of the form p: t. The presence of the :
without any – distinguishes it from the others. Again. p is
a pattern.
A
group constructor is a list of expressions. Note that decl and binding elements
are not expressions,
although constructors are expressions.
Constructors are useful for making decls and bindings
where the names are literal. This is the normal case, and in fact the
only case in current Cedar. If you want to make them out of other decls,
for instance to bind an expression to a decl which is the value of a name dn, you cannot use a constructor:
[dn-'e] would bind the
value of e to
the name dn, not
to the decl which is its value. You have to write the
decl-constructing primitive directly: MKDECL[d, e].
The only kinds of constructor you can write in
current Cedar are:
Decl constructors for proc domains and ranges, and for
records and unions (fields43 in the syntax).
Binding constructors for arguments in an application, or
as an expression alone if a record or array value is needed
(argBinding27 in the syntax).
2.5.5 Recursion
In the kernel, you get recursive definition of names only
if you write PEG (or
the unsugared form Fix)
explicitly. In Cedar, on the other hand, decls and
bindings are normally recursive, except for argBindings and
import lists.
The recursion is legal in a block or interface body
(although anomalies are possible in some cases when names are used
before they are defined: see § 3.4.1). In fields it is illegal.
2.6 Conveniences
The facilities described here are not
fundamental, but they are of great practical importance.
2.6.1 Coercion
A coercion is a proc which is automatically applied under
some circumstances to map a value of one type T (called the source) to a value of another type U (called the dest), e.g. from [O..5) to INT. Coercions are
obtained from the clusters of the types involved. The coercion mechanism adds
no new functionality, since the programmer could always write
the applications himself, but it is important in concealing some
of the distinctions made by the type system when they are distracting rather
than helpful.
There is exactly one (desugared) context in which a
coercion is applied: when an expression e of syntactic type T appears as an argument in an application which
expects a value of type U: this means that there is a
binding n: U"e.
Since nearly all Cedar constructs are desugared to
application, coercions are widely applicable. The only
(desugared) context in which there is no coercion is for the
first operand of dot, since in that case the cluster of the operand is used to
interpret the name which is the second operand. Thus in the expression e.n, it is always V e. the
syntactic type of e, that
is used to look up n, regardless
of the fact that this expression may appear as an argument to a parameter
of type U. If
e is
not a type or binding, however, then e.n desugars to P[e], where P= LOOKUPC[V e.Cluster, $n], and in the
application of P, e does
appear as an argument and can be coerced. Usually the cluster
for T is
set up with procs which take an argument of type T, so the domain of P is De and no coercion happens.
This isn't always true, though: a subrange T of INT inherits the arithmetic procs of INT, for
example, and there is a coercion from T to INT when PLUS is applied.
If
it is sometimes natural to think in terms of a coercion from T to U that is implemented by the
identity function. In fact, implication is stronger than that, since it
propagates through many type constructors, including PROC, while
coercion does not. Implication is discussed in § 2.4.2 and § 4.12.
There is a rather general rule for finding coercions from
the clusters of types, though it is not of much practical
importance in current Cedar, since there is no way for the user to define
coercions. The rule goes like this. Each cluster may have a From item and a To item. T.From should consist of pairs
with type [tag: ATOM, proc: T-+ U1, and
T.To of pairs with type [tag: ATOM, proc: U-+71. Ignore the tags for the moment. Consider the
binding n: U-'e. where
V e= T, and
TA U is
false. For each proc P in T.From or U.To we try n: U- P[e.].
If P: T-> V is
in T.From, it maps e to a value of type V, and we have to bind n: U- P[e]. If we are done: otherwise we
can recurse on this sub-problem.
If P: V-0 LI is in U.To,
we have to bind m: If V we are done: otherwise we can
recurse on this sub-problem.
The whole process fails if no path of coercion procs
takes us from T to
U. The
search can terminate when all paths have been explored, and a
particular path can be abandoned when a type appears on
it for the second time. Since the search is done statically (by the compiler),
and since the results of an attempt to coerce T to U can be cached, the time
required for the search is not a problem.
There are two obvious difficulties with this scheme.
First. it may transform erroneous applications into legal ones, by coercing an
argument in ways not intended by the programmer. Second, more than one path of
coercion procs may exist, and different paths may give different results. The second
difficulty can be avoided, and the first minimized, if every coercion proc P is chosen so that it
has a (partial) inverse, and P-1[P[x]]=
x for all x in P.DOMAIN. This says that a coercion
does not lose information. and that different paths give the same
answer. Sometimes this is not feasible. e.g. for the narrowing coercion from INT to [0..5). The
following rule gives the builder of clusters control over
proliferating coercions:
If two procs on a coercion path have non-Nit. tags, they must have the same tag.
In general, coercions that don't lose information can
have NIL tags,
and others should have different tags.
The coercions in current Cedar are described in § 4.13.
All have NIL tags.
and none loses information except the subrange narrowing. Note
that coercions extend componentwise to groups and bindings.
2.6.2
Exceptions
The basic idea behind exceptions is to extend the value
space, so that it includes not only ordinary values, but also a set
of exception values. An
exception value has the special property that whenever it
appears in an application, it becomes the value of the application, so that it
propagates up through the control stack of the program until it
finally becomes the value of the whole program. Of course this
isn't always what is wanted, so there is a special HIDE construct which is
not an ordinary application, but takes its argument value,
ordinary or exception. and bundles it in a variant record which is a normal
value. Then ordinary code can be used to test for the exception and take appropriate
action. This construct is sugared to give distinctive ways of catching an exception: in the kernel
with BUT (§ 2.2.4),
and in Cedar with ENABLE,
EXITS and REPEAT (§ 3.4.3). Cedar has two kinds
of exception: GOTO labels
and ERRORS, which
must be raised and caught separately. and have slightly different
semantics.
The main point of this treatment is that it does not
require continuations or any other baroque explanation of how control is
transferred to catch an exception. The view is that exceptions are simply a
convenience feature: the same job could be done by returning a slightly larger
result from each proc, with an appropriate status code.
An exception consists of a code and an optional argument value. The type of the code
is ERROR T, where T is the type of the argument
which goes with it. GOTO labels always have
empty arguments. The argument is a way of passing some information
along in addition to the identity of the exception.
|
A proper treatment of exceptions in the type system
would require that each proc the exceptions that can
emerge from an application of the proc. In fact, this is not possible in
current Cedar. Cedar also has signals, which historically were viewed as a kind of
exception but different interpretation. as a way of obtaining
dynamic rather than static scoping are discussed in § 3.4.3A. |
range include all required or even now have a very for names. They |
2.6.3
Finalization
This subject is discussed in § 3.4.3A.
2.6.4 Concurrency
This subject is discussed in § 4.10. where the Cedar
facilities for writing concurrent programs are given. Writing good concurrent
programs, or even correct ones, is another matter, which is beyond the
scope of this manual to more than hint at. Unfortunately, an adequate reference
is lacking.
2.7 Miscellaneous
The different kinds of allocation are discussed in § 4.5.
Static values are defined in § 3.9.1.
2.7.1 Pragmas
A pragma is a construct that does not change the meaning
of the program, except perhaps to make something illegal which was legal
without the pragma. Its purpose is to affect the implementation, generally
by requesting optimization to favor one criterion over others. The pragmas in
current Cedar are:
INLINE, which
causes a proc body to be expanded inline when it is applied. See § 3.5.1 for details.
PACKED, which
causes array components that fit in 8 or fewer bits to be packed, at the expense
of more expensive code to access them (§ 4.4.2).
CHECKED, which
forbids application of unsafe procs in a block, and adds runtime checking for
some primitive procs which are otherwise unsafe—in particular, narrowing to a subrange,
and assigning a proc (§ 3.4.4).
PRIVATE, which
forbids access to items in an interface or instance except to modules which EXPORT (or SHARE) it
(§ 3.3.6).
MACHINE DEPENDENT,
which allows positions of record fields (§ 4.6.1) and
representation values for enumeration elements (§ 4.7.1A) to be
specified (strictly, it is the absence of MACHINE DEPENDENT that is the pragma, since
the positions or representation values are legal only when it
is present.)
2.8
Relations among groups, types, bindings and declarations
Cedar has are four closely related basic ways of building
product values from simple values (all are given precise meanings in § 2.2.1
and § 2.2.2):
a group is simply an n-tuple of
values (see § 2.3.4):
a X-type is the type of a group (if x: T and
y: U then
[x, y]: TX
U) (see § 2.4.5): a binding
is an n-tuple of [name, value] pairs (see § 2.3.5):
a
declaration is
the type of a binding, an n-tuple of [name, type] pairs (see § 2.4.5).
Figure
2-1 illustrates the relations among these kinds of objects. In current Cedar
most of these objects can be constructed and manipulated only
as interfaces and instances. In the kernel and the
modeller,
all of them are first-class citizens. The primitives which go between them are
defined in §
2.2.
[a: T a— ea, b: Tb—eb]
binding<nAKBINDD
(instance)
MKBINDP
BDOTV
group<
[ea, eb]
values
[a: Ta. b: T
b] [a: TYPE— Ta. b: TYPE ^
BDOTD> decl Dios .BTOD. binding
(interface) -t.
MKDECL t \NI MKBINDP
1 4. DDOTP
[a. b] pattern
DDOTT 4, BDOTV
>X-type MKCROSSCTOG> group
TaXT b [T
a, T b]
types types as
values
Figure
2— 1: Relations among groups, types, bindings and decls
2.9
Incompatibilities with current Cedar
Most
of the syntax is current Cedar is an extension (or sometimes a restriction) of
kernel syntax. There are a few things that have different meanings in the kernel,
however, and these are potential sources of confusion:
Type
expressions in Cedar
do not have the same syntax as ordinary expressions and cannot appear in the
same contexts, for the following reasons:
The use of <- for specifying a default value for a type
vs
its use for assignment. The use of {} for
enumeration types vs its use for a block.
The use of parentheses and brackets to specify subranges
·The use of adjectives for variants (red Node).
Target type overloading for union constructors ([rator-'$plus, rands—binary[...]]), and
·enumeration
literals (red instead
of Color.red or
$red) is incompatible with the kernel's simple rules for the meaning of names.
·In
addition to writing n: t—e or
n—e for
a binding, you can also write n: t= e (in a module header or block)
and n: e (in
an argBinding). The most unfortunate consequence is that a Cedar
argBinding can look like a kernel decl constructor!
It
is now possible to avoid all the conflicting constructs except the relatively
harmless ones: <- for defaults, {} for enumeration, and union
constructors.
Chapter 3.
Syntax and semantics
This
chapter gives the concrete syntax for the current Cedar language. together with
an informal explanation of the
meaning of each construct, and a precise desugaring of each construct into the kernel language defined in § 2. The desugaring,
together with the definitions of the kernel primitives used in it, are the authority for the meaning; the informal
explanation is just for your reading
pleasure. However, paragraphs beginning Anomaly or Restriction document properties of Cedar
not captured in the desugaring. The primitive procs and types of Cedar are
specified in § 4.
In addition to the grammar
rules and desugaring, there are examples for each construct. These are intended
to illustrate the constructs and do not form a meaningful program. The Cedar
Manual has longer examples which do
something interesting, and also illustrate the use of the standard Cedar packages.
There are several summaries
which may be useful as references:
A two-page summary of all
the syntax. desugaring and examples in this chapter (CLRMSumm.press).
A one-page summary of the
full syntax (CLRMFuIIGram.press).
A shorter and less cluttered summary of the syntax for the safe language;
it also omits a number
of constructs which are obsolete or intended only for efficiency hacking (CLRMSafeGrampress).
The chapter begins with a description of the
notation (§ 3.1) The next sections deal systematically with the rules of the grammar, explaining
peculiarities of the syntax and giving the semantics:
§ 3.2, rules
56-61: § 3.3, rules 1-5: §
3.4, rules 6-10: § 3.5, rules 11-13: § 3.6, rules 14-18: § 3.7, rules 19-27: § 3.8, rules 28-35:
The lexical structure of programs. Modules.
Blocks, OPEN, ENABLE, EXITS.
Declarations and bindings. Statements.
Expressions.
Conditional constructs: IF and SELECT.
§ 3.9 treats various
miscellaneous topics. §4 deals with the syntax and semantics of types.
The order of the grammar rules is:
module, block, declaration, statement,
expression, conditional
type,
name, literal
and top-down within these.
3.1 Notation
This section describes the notation used in the grammar,
desugaring, and commentary of this chapter.
3.1.1 Notation for the
grammar
The grammar is
written in a variant of BNF:
Bold parentheses are for grouping: ( interface I implementation). Item
item means choose one.
?item
means zero or one occurrences of item.
item; ... means zero or more occurrences of item separated
by ":". The separator may also be ",",
ELSE, IN. or
OR, or it may be
absent. If the separator is ":", a trailing 11;1'
is optional. item; !.. is just like item; ... but there is at
least one occurrence.
A
terminal is a punctuation character other than bold ()?1, or any character underlined, or a word in
SMALL CAPS. Note that
a and {} are terminals, and do not denote optional occurrence and
repetition as they do in many other variants of BNF.
The rules are numbered sequentially.
Special symbols mark constructs with special properties: t
= unsafe:
· = obsolete;
=machine-dependent;
*= efficiency hack.
The
grammar is written so that a non-terminal never expands to the empty string.
When an element of a rule is optional, that is always indicated
explicitly by "?" or "..." .
The
following non-terminals are so basic to the language and so frequently used,
that they are represented in the grammar by abbreviations:
b=
binding13
d = declaration"
e
= expression 19
n
= name56 (identifier)
s
= statement14
t=
type36
I'm afraid this
means that you must learn the meaning of these six abbreviations in order to
read the grammar.
With the exception of these abbreviated non-terminals,
each use of a non-terminal is cross-referenced with a small
superscript number59, unless the non-terminal is defined in one of
the next few rules. If a non-terminal (other than e, t or n) is
used in more than one rule, then all the rules that use it are
listed in a comment after its definition.
Except for the entries in Table 3— 1, a terminal symbol
appears in only one rule. These duplications do not lead to
syntactic ambiguity. In most cases they are harmless, since the symbol has
essentially the same meaning in each case, and the rules are separate
only for greater readability, to highlight an unusual use of a
construct, or for historical reasons. In some cases, however, the symbol has quite
different meanings in different rules. These are marked on the left as follows:
·
In the rules whose numbers are marked with * the
symbol has a different meaning than in the others, and confusion is
quite possible. The programmer should beware.
m In
the rules whose numbers are marked with * the symbol has a different meaning
than in the others, but the context is sufficiently clear that
confusion is unlikely.
· The
rules whose numbers are marked with • are obsolete and should be avoided. A
superscriptxn indicates that the terminal is repeated n times in that rule.
![]()
![]()
|
Symbols |
Rules |
Explanation |
|
|
|
m
• f
o{}
•
m :
•
o
o
*
•
=> 4-
0
m ANY
m CODE EN DCASE
m ERROR
IN
LONG NOT
· NULL PACKED SELECT FROM SHARES
m SIGNAL TRASH TRUSTED
m USING
m WITH
19. 25. *51.1. *54
19,
25, 26, 37, 43, 51
2, 6, 8, 13, *54
2.3, 6, 7, 9, 17, 27. 29,30, 32, 34, 35, 43, 51, 52
6. 8.10, 17. 27.1. 30, 33, 35
1,
2, 3, 5, •7, 11, 13,
18,
•27, 33, •34. 51. *51.1. 53
19, 37
25x4. *51.1
21, *53
21, 58
20,
21, 58
·13, 22
6,
9, 17. 31, 33, 35, 52 14, 16, 18, 21, *55
2,
3, 13, 20, *22, *27
7, 34
*9.
40, 43
*13, 23 31, 52
*19,
*24. 41.1
18, 22
38x2,
45.1, 48
20. 22
14,
•27, •52, •55
44, 45
29,
32, 34, 52
2, 3
*24,
41.1
27x2, 55x2
6, 13 1, *5 *32, 34
See note in § 3.2.
introducing
names with types, except *51.1= position,
·7
= open, 027= argBinding 034= withSelect
dot notation for e is repeated for types
subrange, *position infixOp, *tag
infixOp,
exponent
prefix0p,
infixOp, exponent
·binding,
infixOp
exits,
enable, repeat, select choicesx4, unionTC
s,
e'-STATE,
iterator, e, *defaultIC interface.implementation,b,argBinding,*unary0p,*relOp
open, withSelect
*enable, variableTC, fields *new exception.
convert t to e
select endChoice, unionTC *expression,
*funnyAppl, transferTC
iterator,
relOp
cardinal/unspecified,
pointer, descriptor
prefixOp,
relOp
statement,
•argBinding, •unionTC, •defaultTC
array,
sequence
select,
safeSelect, withSelect, unionTC
interface
and implementation
*funnyAppl,
transferTC argBinding, defaultTC block,
machine code directory, *locks
*safeSelect,
withSelect
![]()
Table 3- 1: Terminal symbols appearing in more than
one rule 3.1.2 Notation for desugaring
The right-hand column is desugaring into the Cedar kernel
language, or in a few cases into comments describing the
meaning in English. This is a purely textual transformation: i.e., it is done on
the text of
the program, not on the values. The
rewriting is done one rule at a time: a single step of rewriting
involves elements from exactly one rule. The desugaring is specified by
slightly informal but straightforward rewriting rules, in which:
An occurrence of a non-terminal (written in bold) denotes
the text produced by that non-terminal in the grammar
rule.
A I reflects a corresponding alternation in the grammar
rule. ? reflects a corresponding optional item in the
grammar rule, and (bold parentheses) are for grouping as in a grammar rule.
As in grammar rules, literal parentheses are underlined.
Everything else is taken literally.
An underlined non-terminal
in the right column means that the desugaring specified for that non-terminal must be
done in order to obtain a legal program. Otherwise the transformations can be done in any order, yielding a legal program at each step.
Every
occurrence of e (expression) and t (type) in the desugaring is implicitly
parenthesized, so that the
desugared program parses as the rewriting rule indicates. To reduce clutter,
these parentheses are not written in the
desugaring rules.
For type
options like PACKED,
the desugaring of the
construct in which they appear is a call on a built-in
type constructor which takes a corresponding BOOL
argument defaulting to FALSE: if the attribute is
present, the argument is supplied with the value TRUE.
Examples: the following rule for subranges:



(typeName
f INT).MKSUBRANGE
([el. e2 I e2.PRED )1 )
[e1.succ, ( e2 I e2.PRED )] )
Index.MKSUBRANGE[10. 20]
Index.MKSUBRANGE[10, 20.PRED INT.MKSUBRANGE[1.SUCC, 100.PRED
Names
introduced in the desugaring are written with one or more trailing prime
("'") characters. Such names cannot be written in a Cedar program,
and hence they are safe from name conflicts. The desugaring is constructed so that the Cedar scope
rules prevent multiple uses of these names from being
confused.
3.1.3 Notation for the commentary
Each section of the
commentary begins with grammar rules, desugaring and examples for part of the language. It continues with text which
explains the meaning of the constructs. Generally the meaning is fairly clear
from the desugaring, and this text is short. For blocks and especially for modules, however, there are many non-obvious implications of the
desugaring. and a number of restrictions: these constructs have a lot of
explanatory text.
Some kinds of information are put into specially marked
paragraphs, which begin with one of the following
italicized words:
Anomaly: the meaning of this Cedar construct is not
explained by desugaring into the kernel, but by the special rule given here.
Caution: here is an implication of the definition which
might surprise you. Performance: facts about the time or space required by some
construct.
Representation: the values of a data type are represented in
terms of other types like this.
Restriction: a construct is not fully general. and will cause a
static error unless the additional
conditions stated here are satisfied.
Style: advice about good Cedar style.
Symbols
written in SANS-SERIF SMALL CAPITALS are in the kernel but not in current Cedar. The superscript notation used
to cross-reference non-terminals in the grammar is also used in the examples, usually to point to a rule whose example introduces a name.
3.2
Lexical structure
56
name
:: = letter (letter I digit)...
57 literal ::= num ?( ( Did I 13111
) ?num ) I digit
(digit IAIDICIDIEIE) ( Lib ) ?num I ?num . num ?exponent I
num exponent I
'
( extendedChar I ' I " ) I • digit !.. (CIO I
" ( extendedChar I ' ) " MUDI
$ n
58
exponent ::= (Ele) ?(+
I —) num
59 num :: = digit !..
60 extendedChar :: = space I \
extension I anyCharNot"'Or\
61 extension :: = digits
digit, digit3 i --
The character with code digit, digit2 digit.; B. I
(nIN I 1111) I (lID I (121M I -- CR, \ 015 I TAB, '\011
BACKSPACE,
'\010
OD I (111--) I' I " I \ FORMFEED, '\014
I LINEFEED, '\012 " I \
|
Examples m, xl, x59y, longNameWithSeveralWords: INT: n: iNT-1 + 12D+ 2B3 +2000B + 1H +OFFH; rl:
REAL-0.1+.1+1.0E-1 +1E-1; al: ARRAY [0..3] OF CHAR—'['x, '\N, '\141]; |
= 1+12+1024+1024 -- +1+255 = 0.1+0.1+0.1 -- +0.1 |
The
main body of the grammar (rules 1-55) treats a program as a sequence of tokens; these are the terminal
symbols of the grammar. Rules 56-61 give the syntax of most tokens. A token is:
-
A litera157. More information about
literals of type T is
in the section of § 4 devoted to T.
- A
name56, not one of the reserved words in Table 3-2. Note that case
matters in names.
- A
reserved word, which is a string of uppercase letters that appears in Table
3-2. A reserved word may not be used as a name, except in an ATOM literal.
- A
punctuation symbol: any printing character not a letter or digit, and not part
of one of the two-character sequences below. The legal punctuation
symbols in programs are:
! @ # $ *
— + = I( ) } +-
t : • " . > /
The
following ASCII characters
are not legal punctuation symbols (and must not appear in a program except in
an extendedChar6o):
% &
\ ?
-
One of the following two-character symbols (used in the grammar rules
indicated): not equal22
less than or equal22
not less than22
greater than or
equal22
not greater than22
chooses
subrange
constructor25.51.I bind by name6.34
Note that Cedar
uses a variant of ASCII which includes the
characters (instead of
the underbar and t (instead of
the circumflex ).
Also, the character written — here is the ASCII minus, code 55B. and not any of the
various dash or
typographer's minus
characters with other codes, which are not in the standard Ascii set.
![]()
![]()
|
ABS |
ELSE |
ISTYPE |
PACKED |
SIGNAL |
|
ALL |
ENABLE |
JOIN |
PAINTED |
SIZE |
|
AND |
END |
LAST |
POINTER |
START |
|
ANY |
ENDCASE |
LENGTH |
PORT |
STATE |
|
ARRAY |
ENDLOOP |
LIST |
PRED |
STOP |
|
ATOM |
ENTRY |
LOCKS |
PRIVATE |
STRING |
|
BASE |
ERROR |
LONG |
PROC |
SUCC |
|
BEGIN |
EXIT |
LOOP |
PROCEDURE |
TEXT |
|
BOOL |
EXITS |
LOOPHOLE |
PROCESS |
THEN |
|
BOOLEAN |
EXPORTS |
MACHINE |
PROGRAM |
THROUGH |
|
BROADCAST |
FINISHED |
MAX |
PUBLIC |
TO |
|
CARDINAL |
FIRST |
MIN |
READONLY |
TRANSFER |
|
CEDAR |
FOR |
MOD |
RECORD |
TRASH |
|
CHAR |
FORK |
MONITOR |
REF |
TRUSTED |
|
CHARACTER |
FRAME |
MONITORED |
REJECT |
TYPE |
|
CHECKED |
FREE |
NARROW |
RELATIVE |
UNCHECKED |
|
CODE |
FROM |
NEW |
REPEAT |
UNCOUNTED |
|
COMPUTED |
GO |
NIL |
RESTART |
UNTIL |
|
CONS |
GOTO |
NOT |
RESUME |
USING |
|
CONTINUE |
IF |
NOTIFY |
RETRY |
WAIT |
|
DECREASING |
IMPORTS |
NULL |
RETURN |
WHILE |
|
DEFINITIONS |
IN |
OF |
RETURNS |
WITH |
|
DEPENDENT |
INLINE |
OPEN |
SAFE |
ZONE |
|
DESCRIPTOR |
INT |
OR |
SELECT |
|
|
DIRECTORY |
INTEGER |
ORDERED |
SEQUENCE |
|
|
DO |
INTERNAL |
OVERLAID |
SHARES |
|
Table
3 — 2: Reserved words and predefined names
The program is parsed into tokens by starting at the
beginning and successively taking from the front the longest
sequence of characters which forms a token according to the rules above, after
first discarding any number of initial whitespace characters or comments.
The
whitespace characters are space, tab, and carriage return. A Tioga node
boundary is also treated as a whitespace character.
A comment is one of:
A
sequence of characters beginning with --, not containing -- or a carriage
return. and ending either with -- or with a carriage return.
A Tioga node with the comment property.
Note that whitespace and comments are not tokens, but may
appear before or after any token: they are token delimiters, and
hence cannot appear in the middle of a token. Whitespace and comments thus
do not affect the meaning of the program except:
When they delimit a token.
Within
a CHAR literal
or a ROPE literal,
where they are taken literally. Thus • is equal to '\040,
and "I
am --not--" is equal to "I\Nam --not--" and
different from "I\Nam ".
Both
reserved words (Table 3-2) and most names with predefined meanings (Table 4-5)
are made up entirely of upper case letters. All are at least three
characters long except the following:
DO GO
IF IN OF OR TO.
Caution on use of reserved words and predefined
names: They should not be rebound by the program:
in some but not all cases the compiler forbids their rebinding.
A note on lists of items and their separators. In general,
semicolons are used to separate statements. or slightly larger
constructs that contain statements. Commas are used to separate the items in
all other kinds of lists. Precisely:
Semi-colons are used to separate
declarations, bindings and statements in a bodylO, and to separate
choices in a select statement
29.32.M
or in an exits6. 17 or enable&
27.1.
Commas are used to separate
declarations in fields43.51 (i.e., in a proc domain or range, a recordTC
or a unionTC), bindings in an application27 or an open7,
choices in a select expression29.32.34
or in a unionTC52. expressions in a choice6.9.17. 30.35, 52, items in imports. exports
or shares lists2.3.
In
general these lists may be empty, and an extra separator at the end is harmless
when there is some kind of closing bracket, except when the
sequence is bracketed with U.
The
braces { } which delimit a block6, interface body2,
choices in an enable, or MACHINE CODE body13 may be replaced by BEGIN and
END reserved
words. BEGIN replaces
"{" and END replaces
"}". If one brace is replaced, its matching
partner must also be replaced. The braces delimiting an enumTC54
may not be replaced by BEGIN
and END.
3.3 Modules
module ::= DIRECTORY (nd
(: TYPE
(n, 1 ) 1 )
?(USING [ nu. ...
) ),
( interface I implementation )
2 interface :: = : ?CEDAR DEFINITIONS
?locks (imports I) ?.(SHARES ne
...) ?eaccess12 { ?open7
(d I b): .
3 implementation
::= rim : ?CEDAR
?safety ( PROGRAM ?drType42
MONITOR ?drType42 ( I locks)) (imports I )
?(EXPORTS
ne,
...)
?.(SHARES
ns,
?*access12 block .
3.iimports
::= IMPORTS
( (fly : I ) n, ), ... --In 2, 3.
4 safety ::= SAFE I UNSAFE --In 3.41. Stocks
::= LOCKS e?( USING nu: t)
Examples
A [ (nd : ( (TYPE nt
I TYPE
Ild) I TYPE nd)), ] IN
LET (nd•RESTRICT[n [$n ] )
IN ( interface I implementation
)
LET r' [ nm:
INTERFACETYPC
$rim, ...]] ] IN (imports I
X = >r) IN
-- SHARES allows access to PRIVATE names
in ne LET REC nm–open [
?(1'–locks, ) (d I b), IN tin,
LET [(ne:
ne) , , FRAME: TYPE nni
nm: FRAME, CONTROL: PROGRAM]
IN (imports I X = >1.1) IN
( I LET l'—( LET LOCK ...NEWLOCK IN (A IN LOCK) I
locks) IN)
LET b'–NEWPROGINSTANCENOCklUNCONS
IN
[ (ne--.BINDDFROM[ne,
b' PLUS nm–b`.nm] ), ,
FRAME~MKINTTYPE[block],
nm–b' ,
CONTROL–b%n in] where the block body is desugared: [
(d I b), rim:
PROGRAM
drType–{s:
[(ni:nr), ...]= >r' IN LET [((nv I ni)– (n, PLUS nf.BINDING) ]
X ?( [nu t] ) e
DIRECTORY
Rope: TYPE USING [ROPE, Compare],
CIFS: TYPE USING [OpenFile,Error,Open,read]. 10:
TYPE 10Stream,
Buffer: TYPE;
-- For Bufferlmpl below.
--
There should always be a USING clause -- unless most of the
interface is used -- or it is a standard one like Rope or 10. -- or it is exported.
|
Buffer: DEFINITIONS - { Handle: TYPE-REF BufferObject: BufferObject: TYPE = Rope.RoPE New: PROC RETURNS[h: Handle]; Get: PROC[h: Handle] RETURNS[BufferObject]; Put: PROC[h:
Handle. o: BufferObject] }: Bufferlmpl: MONITOR [f: CIFS.OpenFile] LOCKS Buffer.GetLock[h]t USING
h: Buffer.Handle IMPORTS Files: CIFS, 10, Rope EXPORTS Buffer — module body -- 1 . |
-- Implementations can
have arguments. --
LOCKS only in MONITOR, to specify -- a non-standard lock. -- Note the absence of
semicolons. -- EXPORTS in PROGRAM or MONITOR. -- Note the final dot. |
Modules serve a number of functions (which might
perhaps better be disentangled, but are not): A file of source
text (BufferImpLmesa), or
its translation into object code (BufferlmpLbcd).
The
unit handled by the editor, named in DF files and models, and accepted by the compiler,
the binder, and the loader.
A
set of related structures (types, procedures, variables) which are freely
accessible to each other, hiding secrets or irrelevant information
from other modules.
A
procedure which can accept interface types and bindings as arguments, and
returns interface instances as results.
The procedures of a monitor, perhaps with its protected
data.
The
first two uses are not relevant to the language definition, and are not
discussed further here. The others are the subject of this section.
There are two kinds of modules: interface modules (written
with DEFINITIONS) and
implementation modules (written with PROGRAM or
MONITOR). They
have the same header (except that interfaces have no EXPORTS list);
it defines the parameters and results of the module viewed as a proc (§ 3.3.1) and
specifies the name nm of
the module. The bodies (following the —) are different. Table 3 —3
summarizes
the structure of modules and their types; it omits a number of details which
are given in rules 1-3 and explained in the text.
![]()
![]()
|
Example |
Module |
Module
type |
Result |
Result type |
|
DIRECTORY Rope. 10; |
Interface |
[Rope:
TYPE Rope, 10: TYPE 10] |
Interface |
TYPE Match |
|
Match: DEFINITIONS-{...} |
module |
-0[TYPE Match] |
|
|
|
DIRECTORY Match, Rope. 10; Matchlmpl: PROGRAM |
Implementation module |
[Match: TYPE Match, Rope: TYPE Rope,
10: TYPE 10. |
Exported instance |
Match |
|
IMPORTS R: Rope. 1: 10 |
|
R:
Rope. 1: 10]—).[Match] |
|
|
|
EXPORTS Match-1...1 |
|
|
|
|
Table 3-3: Interface and implementation modules
The
ensuing sub-sections deal in turn with:
§ 3.3.1: Modules as procedures. and the interface or instance values they return. § 3.3.2:
How modules are applied.
§ 3.3.3: Module parameters: the DIRECTORY and
IMPORTS
lists: USING clauses.
§ 3.3.4: Interface module bodies and interfaces.
§
3.3.5: Implementation module bodies: the EXPORTS list,
§ 3.3.6: SHARES and access12.
The meanings of the other parts of a module
header are discussed elsewhere: CEDAR in § 3.4.4.
MONITOR
and LOCKS in § 4.10.
3.3.1 Modules and instances
A module is a proc which takes two kinds of arguments:
Interfaces, declared in the DIRECTORY list.
These arguments are supplied by the model (or on the compiler's command line), and used during compilation.
Instances of
interfaces, declared in the IMPORTS list. These arguments are also supplied by the
model (or in a config file passed to the binder, or implicitly by the loader),
and used during loading.
§ 3.3.3 discusses the types of these arguments and how
they are declared. In addition, an implementation may take PROGRAM arguments
declared in the drType following PROGRAM or MONITOR. These
are ordinary values: they are discussed in § 3.3.2A.
When a module is applied to its arguments, the resulting
value is
For an interface module, an interface.
For an implementation module, a binding whose values are
instances:
one interface instance for each interface it exports:
one for the program
instance, also called a global frame;
one for the program proc derived from the
module body (§3.3.2A), called
CONTROL.
This application cannot be written in the program, only in
the model: it is described in § 3.3.2.
An interface (sometimes
called an interface type) is
a type, as the latter name suggests. This type is a declaration
(obtained from the declarations which constitute the module body), with an
extended cluster that includes all the bindings in the module body
that don't use declared names (§ 3.3.4). In the example, the Buffer interface (obtained by
applying the Buffer module
to the arguments declared in its DIRECTORY) has declarations for New, Get, and Put, and its
cluster includes values for Handle
and
BufferObject.
An
interface instance is
a value whose type is an interface: such values are the results of instantiating
implementation modules. In the example, Bufferlmpl returns (exports) an instance of Buffer.
A
program instance or
a global frame is
a frame. as the latter name suggests, i.e., a binding obtained from
the bindings and declarations of an implementation (PROGRAM or
MONITOR) module
body, just like any proc frame (§3.3.5). Normally code outside
the module does not deal with the instance directly, but only
with the exported interface instances. In the example, Bufferlmpl exports a program
instance for the module and a CONTROL proc.
In most cases, there is:
Exactly one application of each module, and
hence exactly one interface or one instance. Only one module
which exports an interface.
Only one interface exported by a module.
Only
one argument of the
proper type for each module parameter (§3.3.3); hence it is redundant
to write the arguments explicitly.
When these conditions hold, there is a close
correspondence among the following four objects: an interface
module;
the
interface it returns (since its arguments need not be written explicitly);
the
implementation module which exports the interface;
its
instance (again, since its arguments need not be written explicitly).
The
distinctions made earlier in this section then seem needless; it is sufficient
to simply consider the interface and implementation modules, and
identify them with the files which hold their text. In more
complicated situations, however, it is necessary to know what is really going
on.
In
the example at the start of this section, BufferImpl
is an implementation module with seven parameters:
Four interface parameters, declared in the DIRECTORY: Rope, CIFS, 10 and Buffer.
Three instance parameters, declared in the IMPORTS: Files (of type CIFS), 10 (of type 10), and Rope (of type Rope). Since the instance
parameters are declared in an inner scope, the instance Rope is the one visible in the
module body; the interface Rope is
visible only in the header. The same is true for 10, but both the interface CIFS and the instance Files are visible
in the body.
When BufferImpl
is compiled, the four interface parameters must be
supplied, in the form of (compiled) interface modules named Rope, CIFS, 10 and Buffer. When BufferImpl is instantiated
(normally by loading it), the three instance parameters must be supplied, i.e.
there must be other instantiated implementation modules which export
the Rope, CIFS, and
10 interfaces.
Normally there will be one of each, and the entire program will consist
of eight modules:
the interface modules Rope, C1FS, 10 and Buffer,
implementation
modules normally named Ropelmpl,
CIFSImpl, 101mpl and Buffer1mpl,
each exporting an instance of the corresponding
interface
The
instantiated Bufferlmpl exports
an instance of Buffer, which can thus be used as a
parameter by some other module.
3.3.2 Applying modules
A
module is not applied to all its arguments at once. Instead, the arguments are
supplied in two stages:
A module is applied to its interface (DIRECTORY) arguments
by compiling it; the result is a BCD (represented by a .bcd file). The bcd is still a proc, with
instance parameters. Like any proc, a module can be
applied to different arguments (i.e., different interfaces) to yield different
results (BCDs).
A BCD
is applied to its instance (IMPORT) arguments by loading (or
binding) it; the result is a program instance, together with any interface
instances exported by the module. Again, the BCD can be applied to
different arguments (i.e., different interface instances) to yield different
instances. Indeed, because an instance may include variables, even two
applications to the same arguments will yield different
results (instances).
These two stages are separated for several reasons:
All
the type-checking of a module can be (and is) done in the first stage, by the
compiler. The only type error possible in the second stage is
supplying an unsuitable argument.
Compiling is much slower than loading, and a module needs
to be recompiled only when its interface arguments change, not when the
interface instances change. The latter are changes in the
implementations of the interfaces, and are much more common.
When there are multiple instances of the same module with
the same interface parameters, they automatically get the
same code.
We've always done it that way.
3.3.2A
Initializing a program instance
The statements in the body of an implementation module
form the body of a proc called the program
procedure. The function of this proc is to initialize an instance
of the module. When program instance PI is made, no code in the
module is executed: hence PI may
be uninitialized. It is the job of the program proc PP' to initialize PI, perhaps using the PROGRAM arguments
if there are any. Until PP' has been called, PI is not in a good state. It
would be better to supply the PROGRAM arguments along with the imported instances, and
call PP' as part
of making P1, so
that PI is
never accessible in its uninitialized state. But it isn't done that way: hence
the programmer must ensure that PP is called before any use is
made of P1. The
preferred way to get hold of PP is
from an interface to which it is exported: see § 3.3.5.
To confuse things. PP is not an ordinary procedure but a PROGRAM, and
it must be called using the START construct (see § 4.4.1). Note that in addition to
the statements of the module body, PP
also contains the type-specific initialization code for
any variables or non-static values in the instance: e.g., if x: INTO-3, the
value of x will not be 3 until after PP' has been called.
There is some error detection associated with this kludge.
If a proc in the instance is called before the instance has been initialized by
START, a
start trap occurs.
At this point, if PP' takes
no arguments it is called automatically, and the original
call then proceeds normally: if PP
does take arguments, there is a Runtime.StartFault ERROR.
Caution on initializing monitors: If
the module is a monitor, PP' runs
without the monitor lock: if another process calls into
the module while PP' is
running, it will not wait, but will run concurrently with PP. This is unlikely to be
right. It is unwise to rely on a start trap to initialize a monitor
module: call PP explicitly
with START.
Caution on referencing module
variables before initialization: If a variable in the
instance is referenced before the instance has been initialized, no
error is detected, and the uninitialized value will be obtained. PP' can still be called to
initialize the instance, and may still be called automatically by a
start trap.
The program proc is bound to the name CONTROL in
the result of an implementation module if its type is PROGRAM[] RETURNS (otherwise the proc RuntimeReportStartFault is bound to CONTROL). This allows the modeller
(and binder) to get access to PP so
as to control the order in which modules are started.
3.3.3
Parameters to modules: DIRECTORY and
IMPORTS
The interface parameters of a module are declared in the DIRECTORY. An
interface / has
type TYPE n, where
n is
any one of the names given before DEFINITIONS in the header of the
interface module that produced I. The INTERFACETYPE primitive in the desugaring
takes a list of atoms and returns a type which implies TYPE n for
each $n in
the list. The reason for allowing several names is to aid conversion
of an interface from one name to another: both names can continue in use for a
while.
The use of these names provides a clumsy check that the
proper interface is supplied as an argument. DIRECTORY n: TYPE and
DIRECTORY n are
both short for DIRECTORY
n: TYPE n.
The compiler must be able to find the interface arguments.
which in general are stored as files. When the modeller is used, it supplies
these arguments from the specifications in the model. Otherwise,
they may be specified explicitly on the compiler's command line, or failing
that. the compiler gets the interface I from the file I.bcd.
An interface is a type which can only be used:
Before a dot (§4.14). to obtain a value from its cluster,
which simply consists of the bindings in the interface
module body (§3.3.4).
In an IMPORTS list as the type of an
instance parameter to a module. After POINTER TO FRAME
(§ 4.5.3)
The USING clause in the DIRECTORY, if present. restricts the
cluster of the interface to contain only items with the names nu,
... Thus in the example, only ROPE and Compare are
in the cluster of Rope
in
the BufferImpl module. This means that Rope.ROPE and Rope.Compare are
legal, but Rope.n for any other n will
be an error. Note that USING
affects only the cluster of the parameter: it does not
affect the clusters of any types or the bodies of any INLINE procs
obtained from the interface. Thus within Rope. Compare might
be bound by
Compare: PROC[rl, r2: ROPE] RETURNS NOOLHNLINE IF Length[r1]'
= Length[r2] THEN ... I
A call of Rope.Compare in
BufferImpi is
all right, even though RopeLength in BufferImpi is an error.
In the example, CIFS, 10, and Rope are interfaces. They are the
types of three IMPORTS
parameters named Files, 10, and Rope (if the
IMPORTS clause
gives no name for the parameter, the name of the interface is
recycled). An actual argument for an IMPORT parameter must be an
interface instance, i.e., a value whose type is an interface type. Such a value
is obtained from one or more modules which export the interface (§ 3.3.5). An
instance is a binding: in it, the value of a name declared in the interface is
provided by the exporter: the value of a name bound in the interface (e.g.,
x-3) is just the value the interface binds to the name (in this
case, 3). This rule has two effects:
The client can ignore the distinction between names bound
and declared in the interface, since both appear in the
instance binding and are referenced uniformly with dot notation. This
means that the client is not affected, for example. when a proc is moved from
an INLINE in
the interface to an ordinary definition in an implementation.
The client can often ignore the distinction between the
interface and the instance, since all the values in the interface
are also in the instance, with the same names. This is the motivation
for the shorthand which allows the name of an IMPORT parameter to default to the
name of the interface: the interface is no longer accessible, but I.x has
the same meaning (namely 3) whether / is the interface or
the instance.
Caution on inlines in interfaces: Names
bound to inline procs in an interface do not appear in the interface
binding, but only in an instance. This somewhat dubious rule ensures that
clients won't have to add to their imports lists if a proc
stops being an inline.
Restriction
on importing multiple instances: An interface module may not import more
than one instance of a given interface I. If an implementation module P imports more than one instance of I. the principal instance of / is the one with no name in
the IMPORTS list (which is therefore
named I by default). if P imports only one instance of type I. then
that
instance is the principal instance.
Restriction on importing a principal instance
into imported interfaces-. Often an interface module has no IMPORTS. because it only needs access to the static
values (types and constants) bound in its interface parameters, and does not
need values for any
names declared there (procs and interface variables). If an interface module
does have IMPORTS, however, and there is more than one instance of any
imported interface around, then there is a restriction on the argument values. Suppose that Intl imports Int2, and that a program module P imports Intl. Then Intl may only import one instance of Ina. and if P also imports MO. the principal instance of /n/2 in P must be the same as the value of Intl imported by the Intl imported by P. For example, with
DIRECTORY MO: Intl: DEFINITIO\S IMPORTS Int2V: Int2...
DIRECTORY Intl. Intl: P. PROGRAM
IMPORTS Int1V: Intl. Int2V: Int2...
we must have in P
that Int1V.Int2V--=Int2V.
3.3.4 Interface module bodies
The
body of an interface module I is a collection of bindings (e.g.. x: INT-3) and declarations (e.g..
y:
VAR INT or P: PROC[a:
INT] RETURNS [REAL]).
Restriction on bindings in interfaces: The
construct that follows the – in one of the bindings13
is restricted:
If it is an expression, it must be static (§3.9.1). Thus,
no imported names. As a result, P: PROC—LP
E: ERROR—LE
are not allowed.
If
it is a block (providing the body of a proc), it must be INLINE (because
there isn't any place to put the compiled code).
It may not be CODE. This is an
unfortunate accident of the implementation.
The result of applying an interface module is an interface
(§3.3.2), which is a type I obtained by applying the
primitive MKINTTYPE to
the d's and b's of the body. This type is simply the declaration obtained
by collecting the declarations in the body, with a cluster which is extended to
include all the bindings of the body. However, MKINTTYPE omits
any inline proc bindings from the type's cluster, instead leaving the proc
declarations in I. It
puts an extra item BINDING
in Is cluster with the inline procs in it. When an
instance Ins: of 1 is
imported, the binding actually imported is Inst PLUS 1.B1NDING. This slightly dubious arrangement
ensures that clients don't have to change imports lists
if a proc stops being inline. This policy is not extended to other items,
however, even though they might change from being bound in the
interface to being interface variables.
The interface returned by
Red Blue. Green: DEFINITIONS—...
has
the types TYPE Red TYPE Blue and TYPE Green.
Restriction on referring to names
introduced in an interface: The types and expressions in
the declarations and bindings of an interface may refer to
other names in the bindings as usual, but they may not refer to names introduced
in the declarations, except that:
Any declared name may be used
in the body of an INLINE, or
after
a "4-"
in a defaultTC55
in the fields43
of a transferTC41
which is the type of a decl in the interface's body.
A declared (opaque) type may be used anywhere.
For example, if an interface contains
I: DEFINITIONS—
x:
INT-3;
y VAR INT;
T: TYPE[ANY]
then the following may also appear in the
interface: xx: INT—x+ 1;
P:
PROC RETURNS[INT]—INLINE {RETURN[x+A}:
Q:
PROC [INT4-y];
V: TYPE—RECORDV. REF T. g:
but the following are illegal: xy: INT—y+ 1:
U: TYPE—INT4-y:
W: TYPE—ARRAY [0.4] OF INT;
The
values of the bindings can be accessed directly by dot notation in any scope in
which the interface is accessible. Thus if the value of the
previous interface module is bound to J, e.g., because
J: TYPE / appeared
in the DIRECTORY,
then J.x is equal to 3. The
declarations cannot be accessed directly (J.y is an error).
The
declarations in an interface module are not quite like ordinary declarations.
They are of three kinds, depending on whether the type of a
declaration is:
A transfer type (§ 4.4.1); this is just like a declaration of a
transfer parameter to an ordinary proc, except that it is
readonly.
TYPE[ANY] or
TYPE[e]; the type being
declared is an opaque type or
exported type. discussed
in § 4.3.4. The expression e must be static. TYPE[ANY] or
TYPE[E] is
not allowed in an ordinary declaration; except in an interface, a
type name must be bound to a type value when it is introduced.
VAR T, or
READONLY T for any type T except TYPE: this
is an interface variable: discussed
in § 3.3.4.1 below. You can also write simply T here, but this is not
recommended.
An interface instance // has the interface type / if for
each item n: T in the
interface, there is an item n— v in the instance, and v has type T. This is the same rule which
determines that a binding has the type of a declaration; e.g., that a proc
argument has the domain type. In this respect there is nothing
special about an interface.
Note that a name can be declared PRIVATE in
an interface, even though it must be declared PUBLIC in the exporter (§ 3.3.6).
This can be useful if the name is used in a type constructor or inline proc in
the interface, but its value should not be accessible to the client
3.3.4A
Interface variables
An interface variable v gives clients of an interface direct access to a variable
in a program module, namely the variable which is exported to v. This is the only kind of variable parameter in
current Cedar.
·If
you use the obsolete shorthand of T
for VAR T in
an interface variable declaration, you cannot declare a transfer
type variable as an interface variable, since that already means passing the
transfer value.
Caution on uninitialized interface
variables: the variable which is exported to provide the
value for an interface variable is not initialized until its module
is initialized (§ 3.3.2A). However, there is nothing to stop it
from being accessed sooner, with possibly undesired results.
Performance of interface variables:
An interface variable can be read and (if not READONLY) set
directly, which is significantly faster than Get and Set procs. Of course, the
implementor gives up some control. These operations are not quite as
fast as access to an ordinary variable, since there is an
extra level of indirection which costs one or two extra instructions each time.
There is also one pointer per interface variable per module which
refers to it. If you use a private interface variable and
inline Get and
Set procs,
you pay nothing in performance, but retain the option of changing the proc
definitions later.
·You
can get direct access to all the variables of a module by using a POINTER TO FRAME type
(§ 4.5.3).
3.3.5 Implementation module bodies
The body of an implementation module Imp is simply
a block. This block plays two roles. On the one hand, it is an
ordinary block, the body of an almost ordinary proc PP' called
the PROGRAM proc,
which has parameters and results like any other. PP' is
special in one way: it has a PROGRAM type rather than a PROC type. When PP' is applied (using
the special construct START:
see § 4.4.1), its declarations and
bindings are evaluated, its statements are executed, and its results are
returned
as with any proc. The only difference is that the values
bound to the names introduced in the block (i.e., the frame
of PP') are
retained after the proc returns: in fact, forever (unless Runtime.Unnew is used
to free the frame). Procs local to the block can access these values in the
usual way. and values of exported names can also be accessed through
interfaces, as explained below: see § 3.3.2A.
As
with any proc (§ 3.5.1). the frame of PP includes the parameters and results from Imp's drType42
as well as the names introduced in the block's d's and
b's. It also includes an additional item:
Imp: PROGRAM T- PP'
where Imp is the name of the module
and T is
its drType.
The body of Imp
has a second role: to supply values for the names
declared in the interfaces exported by Imp. For each interface Ex which Imp exports. an
interface value Exi
of type Ex is
constructed. Each name n in Exl acquires a value as follows:
If n: T is in Ex