This document was made by
OCR from a scan of the technical report. It has not been edited or proofread
and is not meant for human consumption, but only for search engines. To see the
scanned original, replace OCR.htm with Abstract.htm or Abstract.html in the URL
that got you here.

![]()
![]()
![]()
![]()
![]()
Report
On The Programming Language Euclid
by B. W. Lampsonl, J. J. Hornin and
g2, R. L. London3, J. G. Mitchell1, G. J. Popek4
This report describes the Euclid
language, intended for the expression of system programs which are to be verified.
Authors'
addresses and support:
1.
Xerox Research Center, 3333 Coyote Hill Road, Palo Alto,
CA 94304
2. Computer Systems Research Group, University
of Toronto, Toronto, Canada M5S 1A4 Supported in part by a Research Leave Grant from the
University of Toronto.
3. USC Information Sciences Institute,
4676 Admiralty Way, Marina del Rey, CA 90291 Supported by the Advanced Research
Projects Agency under contract DAHC-15-72-C-0308.
4. 3532 Boelter Hall, Computer Science
Department, University of California, Los Angeles, CA 90024 Supported in part by the Advanced
Research Projects Agency under contract DAHC-73-C-0368.
The
views expressed are those of the authors.
Table of contents
Preface 1
Acknowledgements 1
1.
Introduction 2
2.
Summary of the language 4
3.
Notation, terminology and vocabulary 9
3.1
Vocabulary 9
3.2
Legality assertions 10
3.3
Lexical structure 11
4.
Identifiers, numbers and strings 12
5.
Manifest constants 13
6.
Data type declarations 14
6.1
Simple types 15
6.1.1 Enumerated types 15
6.1.2 Standard simple types 16
6.1.3 Subrange types 18
6.2 Structured types 18
6.2.1 Array types 18
6.2.2 Record types 19
6.2.3 Module types 21
6.2.4 Machine-dependent
records 24
6.2.5 Set types 25
6.2.6 Pointer and collection types 26
6.3 Parameterized types 29
6.4
Type compatibility 31
6.5
Explicit type conversion 32
7. Declarations and denotations of constants and variables 34
7.1
Entire variables 36
7.2
Component variables 36
7.2.1 Indexed variables 36
7.2.2 Field designators 37
7.2.3 Referenced variables 37
7.3
Scope rules 38
7.4
Binding 40
8. Expressions 39
8.1 Operators 43
8.1.1 Multiplying operators 44
8.1.2 Adding operators 44
8.1.3 Relational operators 44
8.1.4 Other
operators 45
8.2
Function designators 45
9. Statements 46
9.1 Simple statements 46
9.1.1 Assignment statements 46
9.1.2 Procedure statements 46
9.1.3 Escape statements 47
9.1.4 Assert statements 48
9.2
Structured statements 48
9.2.1 Compound statements and blocks 48
9.2.2 Conditional statements 49
9.2.2.1 If statements 49
9.2.2.2 Case statements 50
9.2.3 Repetitive
statements 51
9.2.3.1 Loop statements 52
9.2.3.2 For statements 52
9.2.4 Other uses
of binding 53
10. Procedure declarations 55
11. Function declarations 58
12. Programs 60
13.
A standard for implementation and program interchange 63
13.1 Representation of basic
symbols 64
13.2 Standard format for programs 64
13.3. Annotation 65
14.
Implementation notes 66
14.1 Identifiers 66
14.2 Parsing 66
14.3 One-pass translation 66
14.4 Routine parameters 66
14.5 Routines in modules 67
14.6 Constant components of records and modules 67
14.7 Finalization 67
14.8 Inline code 67
14.9 Reference counts 68
14.10 Representation of pointers 68
14.11 Parameterized types 68
References 70
Appendix
A. Collected syntax 71
Appendix
B. Zone Example 78
Preface
This report describes
a new programming language called Euclid, intended for the expression of system programs which are to be verified. Euclid draws heavily on Pascal
for its structure and
many of its features. In order to reflect this relationship as clearly as
possible, the Euclid report has been
written as a heavily edited version of the revised Pascal report.
Proof rules for Euclid
appear in a separate report [London et al 1977]. A Euclid implementation is under
development by the System Development Corporation, 2500 Colorado Avenue, Santa
Monica, California; information may be obtained from Mr. Hugh C. Lauer.
No implementation of
the language has yet been completed, and no Euclid programs of any size have been written. As a
result of experience in implementing and using it, changes in
the language or its defining report
may be made.
This is the third version of the Euclid report; earlier versions appeared in May 1976 and August 1976.
Acknowledgements
Obviously, we are
greatly indebted to Wirth, both for the aspects pf the language which are copied from Pascal, and for the structure
and much of the wording of the report. We are also much in debt to Hoare's work in the areas of
programming language design, axiomatic methods, and program
verification. In particular, we have tried to follow his suggestion that "the language designer
should b_e,,,familiar with many alternative features designed by
others, and should
have excellent judgement in choosing the best... One thing he should not do is to include untried ideas of his
own. His task is consolidation, not innovation." [Hoare 1973].
We have consciously
borrowed ideas and features from Alphard [Wulf, London, and Shaw 1976], BCPL [Richards 1969], CLU
[Liskov 1976], Gypsy [Ambler et al. 1976], LIS [Ichbiah et al. 1974], Mesa
[Geschke and Mitchell 1975], Modula [Wirth 1976], and the SUE System Language [Clark and Horning 1973, Clark and
Ham 1974]; other languages and suggestions
for language features have also undoubtedly influenced our thinking. We have benefitted greatly from comments and criticisms by
numerous individual colleagues on previous
versions of the language and report, and from the comments of the implementors, especially
Lauer.
We are grateful to Prof. Wirth,
and to Springer-Verlag, for permission to use portions of the Pascal report [Wirth 1971,
Jensen and Wirth 1975] in this report.
Our work has been
significantly aided by the Arpanet, which allowed us to maintain effective and rapid communication
in stating and resolving problems, in spite of the wide geographical distribution of the
authors.
1.
Introduction
"There is no royal road to geometry."
Proclus, Comment on
Euclid, Prol. G. 20.
The programming language Euclid has been designed
to facilitate the construction of verifiable
system programs. By a verifiable program we mean one written in such a way that existing formal techniques for proving certain
properties of programs can be readily applied;
the proofs might be either manual or automatic, and we believe that similar considerations apply in both cases. By system we mean
that the programs of interest are part
of the basic software of the machine on which they run; such a program might be
an operating system kernel, the core of a data base
management system, or a compiler.
An important consequence of this goal is that
Euclid is not intended to be a general-purpose programming
language. Furthermore, its design does not specifically address the problems of constructing very large programs; we believe most of
the programs written in Euclid will be
modest in size. While there is some experience suggesting that verifiability
supports other desired goals, we assume the user is
willing, if necessary, to obtain verifiability by giving up some run-time efficiency, and by tolerating some
inconvenience in the writing of his
programs.
We see Euclid as a (perhaps somewhat eccentric)
advance along one of the main lines of current programming language
development: transferring more and more of the work of producing a correct
program, and verifying its correctness, from the programmer and the verifier (human or mechanical) to the language and its
compiler.
The main changes relative to Pascal take the
form of restrictions, which allow stronger statements about the properties of
the program to be made from the rather superficial, but quite reliable, analysis which the compiler can perform.
In some cases new constructions have been introduced, whose meaning can be
explained by expanding them in terms of existing
Pascal constructions. The reason for this is that the expansion would be
forbidden by the newly introduced restrictions, whereas the
new construction is itself sufficiently restrictive
in a different way.
The
main differences between Euclid and Pascal are summarized in the following
list:
Visibility: Euclid provides explicit control over
the visibility of identifiers, by requiring
.the program to list all the identifiers imported into a routine or module, or .exported from a module.
Variables: The language
guarantees that two identifiers in the same scope can never refer to the same
or overlapping variables. There is a uniform mechanism for binding an identifier to a variable in a procedure
call, on block entry (replacing the Pascal with
statement), or in a variant record discrimination. The variables referenced or modified by a routine (i.e.,
procedure or function) must be accessible in
every scope from which the routine is called.
Pointers: This
idea is extended to pointers, by allowing dynamic variables to be assigned to collections, and guaranteeing that two
pointers into different collections can never refer
to the same variable.
Storage allocation: The program can control the
allocation of storage for dynamic variables
explicitly, in a way which confines the opportunity for making a type error very narrowly. It is also possible to
declare that some
dynamic variables should be reference-counted, and
automatically deallocated when no pointers
to them remain.
Types: Types have been generalized to allow
formal parameters, so that arrays can
have bounds which are fixed only when they are created, and variant records can be handled in a type-safe manner. Records are
generalized to include constant components.
Modules: A new kind of record, called a module,
can contain routine and type components, and
thus provides a facility for modularization. The module can include initialization and finalization statements which
are executed whenever a module variable is created or destroyed.
Constants: Euclid defines a constant to be a literal, or an
identifier whose value is fixed throughout the
scope in which it is declared.
For statement: A generator can be declared as a module
type, and used in a for statement to enumerate a sequence of values.
Loopholes: features of the underlying machine
can be accessed, and the type-checking can be
overridden, in a controlled way. Except for the explicit loopholes, Euclid is designed to be type-safe.
Assertions:
the syntax allows assertions to be supplied at convenient points.
Deletions: A number of Pascal features have been
Omitted from Euclid: input-output, seals,
multi-dimensional arrays, labels and gotos, and functions and procedures as parameters.
The only new features in the list which can make
it hard to convert a Euclid program into a legal
Pascal program by straightforward rewriting are parameterized types, storage
allocation, finalization, and some of
the loopholes.
There
are a number of other considerations which influenced the design of Euclid:
It is based on current knowledge of programming
languages and compilers; concepts which are not
fairly well understood, and features whose implementation
is unclear, have been omitted.
Although program portability is not a major goal of
the language design, it is necessary to have
compilers which generate code for a number of different machines, including mini-computers.
The object code must be reasonably efficient,
and the language must not require a
highly optimizing compiler to achieve an acceptable level of efficiency in the object
program.
Since the total size of a program is modest, separate
compilation is not required (although it is
certainly not ruled out).
The required run time support must be minimal, since it
presents a serious problem for verification.
2. Summary
of the language
"Be
sure of it; give me the ocular proof."
Othello III,
iii, 361.
This section contains a summary of Euclid. The
information here is intended to be consistent
with the remainder of the report, but in case of conflict the body of the
report (sections 3-12) governs. Because it is a
summary, many details are omitted, and some general
statements are made without the qualifications which may be found in the body of
the report.
An algorithm or computer program consists of two
essential parts, a description of actions which
are to be performed, and a description of the data which
are manipulated by these actions. Actions are described by statements, and data are described by type definitions. A data type essentially
defines a set of values and the actions which may be performed on elements of that set.
The data are represented by values. A value may be constant, or it
may be the value of a variable. A value occurring in a
statement may be represented by a literal constant, an identifier which has been declared to be constant, an identifier which has been declared as a variable, or an expression containing values. Every
identifier occurring in the program must be
introduced by a declaration.
A constant or variable declaration associates
with an identifier a data type, and either a value or a
variable.
In general, a definition specifies
a fixed value, type, or routine, and a declaration introduces
an identifier and associates some properties with
it. A data type may either be directly described
in the constant or variable declaration, or it may be referenced by a type identifier, in which case this identifier must be
introduced by an explicit type declaration.
A constant declaration associates
an identifier with a value; the association cannot be changed within the scope of the declaration. If the value
can be determined at compile-time, the constant
is said to be manifest;
the expression defining a manifest
constant must contain only literal
constants, other manifest constants, and built-in
operations.
An enumercited type
definition indicates an ordered set of values, i.e., introduces identifiers standing for each value in the set. The simple data types are the enumerated types, the subrange types, and the four standard simple types: Boolean,
integer, char and StorageUnit. The real type has been omitted. For
the first three, there is a way of writing literal
constants of that type: True and False for Boolean, numbers for integers, and
quotations for characters. Numbers
and quotations are syntactically distinct from
identifiers. The set of values of type char is
the character set available in a particular implementation.
The type StorageUnit has values which occupy the minimum unit in which storage allocation is done; this may of course differ from
one implementation to another. Since no operations are defined on StorageUnit
values, nothing more need he said about them.
A type may also be defined as a subrange of a simple type by indicating the smallest and the largest value of the subrange.
Structured
types are defined by
describing the types of their components, and indicating a structuring method. The various structuring methods differ in the
selection mechanism serving
to select the components of a variable of the structured type. In Euclid, there
are five basic structuring
methods available: array, record, module, set, and collection.
In an array structure, all components are of the same type. A component
is selected by an array selector, or computable index. The index type, which must be simple, is
indicated in the array type
declaration. It is usually a programmer-defined enumerated type, or a subrange of the type integer. Given a value of
the index type, an array selector yields a variable or constant of the component type. Every array structure can
therefore be regarded as a
mapping of the index type into the component type.
In a record
structure, the components (called fields) are not
necessarily of the same type. In order that the
type of a selected component be evident from the program text (without executing the program), a record selector is not a
computable value, but instead is an identifier uniquely denoting the component
to be selected. These field identifiers are declared
in the record type definition. Records may include constant as well as variable
components; manifest constant components, of
course, do not need to be stored in each record
instance.
A
record type may be specified as consisting of several variants. This implies that different record values, although declared to be of the
same type, may assume structures which differ in a certain manner. The difference may consist of a
different number and different types of components. The variant which is assumed by a record value is
indicated by a constant of some
simple type which is called the tag.
A module
structure is much like a record, but
may include routines and types as components.
In this way, the operations which are defined on a data structure can be conveniently packaged with the structure.
Module components cannot be accessed outside the module body unless they are
explicitly exported. Thus in a properly written program it is evident from the lexical structure how the state of a
module can be altered.
A set structure defines the set of values which is the powerset of its
base type, i.e., the set of all subsets of
values of the base type. The base type must be a simple type.
Variables declared in explicit declarations are
called static. The
declaration associates an identifier with the
variable, and the identifier is used to refer to the variable. The language guarantees that two identifiers which can legally be used
in the same scope cannot refer to the
same variable, or to overlapping variables. Thus, an assignment to an
identifier cannot change the value of any
other identifier accessible in the same scope.
In contrast, variables may be generated by an
executable statement. Such a dynamic
generation yields a pointer value
(a substitute for an explicit identifier) which subsequently serves to refer to the variable. This pointer may be
assigned to other variables, namely variables
of type pointer. Each pointer variable may assume values pointing to variables
in a single collection
C, all of whose members are of the same type. It
may, however, also assume the value C.nil, which points to no variable. Because
pointer variables may also occur as components'
of structured variables, which are themselves dynamically generated, the use of pointers permits the representation of finite
graphs in full generality. Although the
language cannot guarantee in general that two pointer variables do not refer to
the same variable, it can make this guarantee for two
pointers in different collections.
A zone can be associated with each collection to provide
procedures for allocating and deallocating the storage required by variables in
that collection; if the zone is omitted, a standard
system zone is used. The program may free a dynamic variable explicitly, in
which case the program is responsible for ensuring that there will be no
further references to the non-existent
variable. .Alternatively, the collection may be reference-counted, in
which case each variable is
automatically freed when no pointers to it remain. The main advantage of reference-counted variables, as compared with explicit
deallocation, is that the correctness of the
deallocation does not have to be verified.
Throughout this report, the word variable means
a container which can hold a value of a specific
type. A variable may or may not be associated with an identifier. A constant, by contrast, is simply a value of a specific type. The
fundamental difference is that assignment to a
variable is possible.
A type declaration may have formal parameters;
such a parameterized declaration
represents a set of types, one of which
is specified each time the type is referenced and actual parameters are supplied for the formals.
Two types are the same if their definitions are
identical after any type identifiers which are not
opaque have
been replaced by their definitions, and any actual parameters and any identifiers declared outside the type have been replaced
by their values. A type identifier is opaque
if it is a module type, or is exported from a module.
The most fundamental
statement is the assignment
statement. It specifies that a newly
computed value be assigned to a variable (or a
component of a variable). The value is
obtained by evaluating an expression. Expressions consist of variables, constants, sets,
operators and functions operating on the denoted
quantities and producing new values.
Variables, constants, and functions are either
declared in the program or are standard entities.
Euclid defines a fixed set of operators, each of which can be regarded as
describing a mapping from the operand
types into the result type. The set of operators is subdivided into groups of:
1. arithmetic operators of
addition, subtraction, sign inversion, multiplication, division, and computing the remainder (mod).
2. Boolean operators of
negation (not), conjunction (and), disjunction (or) and implication
(->).
3. set operators of
union, intersection, set difference, and symmetric, difference (xor).
4. relational operators of
equality, inequality,
ordering, set membership and set
inclusion. The results of relational operations are of type Boolean.
The procedure statement
causes the execution of the designated procedure
(see below).
There
are two kinds of escape statements: an exit statement is used
to terminate a loop, and a return
statement to terminate a
routine. An escape statement may be qualified by a when clause, which
causes termination only if a Boolean expression is True.
Assignment,
procedure, and escape statements are the components or building blocks of structured statements, which specify sequential, selective, or repeated
execution of their components.
Sequential execution of statements is specified by the compound statement;
conditional or selective execution by the if statement and
the case statement; and repeated execution
by the loop statement and the for
statement. The if statement serves to
make the execution of a statement dependent on the value
of a Boolean expression, and the case statement
allows for the selection among many statements according to the value of a selector. The discriminating case statement provides a
safe way of discriminating the current variant of
a variant record. The for statement is used when a bound on the number of iterations is known beforehand, and the loop statement
is used otherwise.
A block
can be used to associate declarations with
statements. The identifiers thus declared have
significance only within the block. Hence, the block is called the scope of
these identifiers, and they are said to be local to the
block. Since a block may appear as a statement,
scopes may be nested. An if, case, for or loop statement, or a module type declaration, also defines a scope in a similar way.
A block can be named by an identifier, and be
referenced through that identifier. The block
is then called a procedure, and its declaration a procedure declaration. However,
an identifier which is not local to a given
procedure body is accessible in that body only if it is accessible in the immediately enclosing scope, and
it is pervasive in some enclosing scope or
it is
explicitly imported into the given procedure body.
A procedure has a fixed
number of parameters, each of which is denoted within the procedure by an
identifier called the formal
parameter, which is local to the
procedure body. Upon an activation of the procedure
statement, an actual quantity has to be indicated for each parameter which can be referenced from within the
procedure through the formal parameter. This
quantity is called the actual
parameter. There
are two kinds of
parameters: constant parameters and variable
parameters; routine and type parameters are not
allowed. In the first case, the actual parameter is an expression which is
evaluated once. The formal parameter represents a local
constant whose value is the result of this evaluation. In the case of a
variable parameter, the actual parameter is a variable and the formal parameter is bound to this variable. Possible
indices or pointers are evaluated before execution
of the procedure.
Functions are declared analogously to procedures; procedures and
functions are collectively called routines. The
main difference lies in the fact that a function yields a result, which may be of any assignable type and must be specified in the
function declaration. Functions may
therefore be used as constituents of expressions. Variable formal parameters
and imported variables are not permitted within function declarations; as a
consequence, functions cannot have side
effects.
Since Euclid is intended for the writing of
programs which are to be verified (either mechanically
or by hand), there are a number of explicit interactions between the language and the verifier, in addition to the many aspects of the
language which have been motivated by the
desire to ease verification. These explicit interactions fall into two main
categories:
embedding of assertions in
the program: the special symbols assert, invariant, pre and post introduce assertions. These may be written as comments
which are ignored by the compiler. Presumably
they will be used by the verifier, which can take advantage of their relationship to the structure of the
program. Alternatively, an assertion may be written
as a Boolean expression, which is compiled into a run-time check if the checked option has been specified for an
enclosing scope.
compiler-generated assertions: in cases where
the compiler needs to be able to assume
that some condition holds, but is unable to deduce that it does, the compiler
may generate an assertion (in a new listing of the program) for the verifier,
and then proceed as though confident of its truth. The
legality of the program will then depend
on the validity of the compiler-generated assertion. Each case in which such an assertion may be generated is spelled out in this
report.
3. Notation, terminology, and vocabulary
"The best words in the best
order."
Coleridge.
The syntax is described in a modification of
Backus-Naur form, in which syntactic constructs
are denoted by English words or phrases, not enclosed in any special marks. These words also suggest the nature or meaning of the
construct, and are used in the
accompanying description of semantics. Basic
symbols of the language are written in
boldface or enclosed in quote marks; e.g., begin
and ":". Possible repetition of a construct is
indicated
by enclosing the construct within metabrackets { and } . Possible
omission of a
construct is indicated by enclosing the construct
within metabrackets [ and ] The word
empty denotes the null sequence of symbols.
The grammar defining the syntax of Euclid is distributed
throughout this report; for convenient
reference, it has also been collected in Appendix A.
3.1 Vocabulary
The primitive vocabulary of Euclid consists of
basic symbols classified into letters, digits,
and special symbols. Note that this vocabulary is
not the character set. The character set is
implementation dependent, and each implementation
must define, in its character set, distinct
representations for all the basic symbols. Suggestions for doing
this in some common cases may be found in
section 13.
Each implementation must specify a single break character which can be used within an identifier.
Two identifiers are similar if they are composed of the
same sequence of characters, except for
changes from upper to lower case letters or vice versa, and for the presence or absence of break characters. The intended use of
the break character is to visually separate
an identifier into its component parts. In an implementation which can print upper and lower case letters, a transition from
lower to upper case can also be used for this
separation. It is recommended that this convention be used when possible, in preference to the explicit break character, for
implementations which have lower case letters;
obviously it cannot be used if the entire
identifier is upper case, for example. Thus, an
identifier might be represented as
alpha
Beta using
96-character ASCII, with capitalization as the break.
ALPH
A_BETA using the IBM PL/l character set,
with as the break.
ALPHA\BETA using
the Model 33 Teletype character set, with \ as the break.
All of these identifiers would be similar to the
identifier ALPHABETA. With these conventions,
it is possible to convert from one representation to another in a reasonable
way (see 13.).
Each time an identifier is used, it must be
written in exactly the same way (i.e., with the same
capitalization and use of break characters) as it was written when
it was declared.
However, another identifier which is similar according to the
above rules
may not be
declared in any scope in which the first
identifier is accessible (see 7.3).
The following capitalization convention is
generally used in this report: type and routine
identifiers begin with a capital letter (except
for certain standard types); other identifiers
begin with a small letter. This
convention is not part of the definition of Euclid, hOwever.
letter ::= "A" I "B" I
"C" I „D,. "E" I „F,. '°G" "I-I" I "I" I
"1" I "K" I "L,"
"M" I ,.N,. "0"
1 "R" I "Q" I "R" 1 "S" I ".I,,,
"U" I "V" I "W" I ,.X,.
"Y" 1
"Z"
1"a" 1 "b" 1"c" 1 "d" 1"e"
1"f" 1 "g" 1"h" I"i" I
"}" 1 "k" I "I" 1 "m" 1"u" 1 "0" I "p"
I "a" I "r"
I "s" I
IT° 1 "u" I "v" 1 "w" 1
"x" I "Y" I "z"
octalDigit ::= "0" 1 "1" I "2" I "3" 1 "4" 1 ""5.. "6" 1 "7"
digit ::= octalDigit
1 "8" I "9"
hexDigit ::= digit I "A" 1 .,B,. "C" I ,.D,. "E" I "F"
breakChar ::= <some
implementation-dependent character not a letter or digit>
specialSymbol ::=
"+" I
"" 1 "*" I "=" I "<" I ">" I "<="
1 ">=" "->"
1 "(" I ")" I
"{"
I„}„ 1 ":=” I
"" I "" ":"
I ":" 1„ ° " I "c"
I "=>"
I "<<=" I "S" I "#" I
wordSymbol
wordSymbol ::=
abstraction I aligned 1 all 1 and I any
I array I assert I at I begin I bind I bits bound
I case I checked 1 code 1 collection I const I counted I decreasing dependent I discriminating I div I else I elseif I
end I exit 1 exports I finally
for I forward 1 from 1 function I if 1
imports 1 in I include I initially 1 inline invariant 1 loop I machine I mod 1 module 1 net l
of 1 on 1 or 1 otherwise packed 1 parameter I pervasive I post I pre 1
procedure I readonly 1 record
return I returns I set I then 1 to 1 type I unknown 1 var 1 when I with I xor
The construct
"{" <any
sequence of symbols not containing "}"> "}"
may be
inserted between any two identifiers, literal constants (see 4.), or special
symbols. It is called a comment, and
may be removed from the program
text without altering its meaning.
The symbols "{" and "}" do not occur otherwise in the
language.
Any verification system
which accepts Euclid programs as input may define a convention for distinguishing comments which have special meaning for
the verifier. One reasonable convention is that every comment in which a
certain character appears as the first character after
the is
intended for the verifier.
The word routine
is used
as a synonym for the phrase "procedure or function."
3.2 Legality assertions
Throughout this report, various restrictions
will be placed on legal Euclid programs. Many of
these restrictions cannot be checked syntactically, and in some cases they
involve dynamic conditions that are
difficult (or impossible) to check statically. Nevertheless, programs that violate them are not considered to be meaningful Euclid
programs. It is the responsibility of the
compiler to verify as many of these properties as it can, and to produce
Boolean expressions called legality assertions for
those it cannot. Thus, any program whose legality assertions can all be verified is a legal Euclid program,
with well-defined semantics. Note that
legality assertions are produced only for conditions specified in this report.
If checked is specified for a scope and
not overridden by an inner not checked (see 6.2.3 and 9.2.1), all legality assertions in the block, and all
programmer-supplied assertions which are Boolean
expressions, are compiled into run-time checks, as an aid in detecting illegal programs, even before the verification process is
complete.
3.3
Lexical structure
The
text of a program is built up out of declarations and statements, collectively
called units, according to the syntax specified below. In
general units are separated by semicolons. The syntax is constructed in such a way that a unit may always be legally followed by a semicolon. In order to make it unnecessary to
write semicolons between units which appear on separate lines, a semicolon is automatically inserted at the end of a
line whenever the last token
of the line could be the end of a unit, namely one of:
identifier, literal constant, ), t, exit, return, forward, or
end possibly followed by if,
loop, case, record, or module,
and the
first token of the next line could be the beginning of a unit, namely one of:
identifier, literal constant, abstraction,
assert, begin, bind, case, const, exit, finally,
for, function, if, initially, inline, invariant, loop, machine, pervasive, procedure,
return, type, var or with.
Commas are used as separators in enumerated
types, case label lists, element lists, and parameter
lists, and within declarations in identifier lists, bind lists, and
import/export lists.
There are several kinds of brackets which are used to
group declarations and statements for various
purposes. The following list gives the unique closing bracket for each opening bracket.
|
if loop
case => begin code record module |
end
if end loop end case end
caseLabel end, or end
routineldentifier whenever the block is the body of a routine end routineldentifier. end record, or end
typeldentifier whenever the record definition is the declaration of a type identifier. end module, or end
typeldentifier whenever the module definition is the declaration of a type identifier. |
4. Identifiers,
numbers and strings
"And twenty more such names as these
Which never were nor no man ever
saw."
The Taming of the Shrew, Induction, ii, 95.
Identifiers serve to denote constants,
variables, types, and routines. Their association must be unique within their scope of validity, i.e., within the
scope in which they are declared (see 6, 7.3).
identifier
::= letter { letterOrDigit }
letterOrDigit
::= letter I digit I breakChar
The usual decimal notation is used for numbers,
which are the literal constants of the data type
integer (see 6.1.2.). Numbers may also be written in octal or hexadecimal
notation. Note that unsigned numbers are always positive;
a negative manifest constant can be written as an
expression, e.g., -14.
unsignedNumber
::= digit { digit } I
octalDigit octalDigit } "#8" I
digit { hexDigit } "#16"
Examples:
1 100 717#8 OCAD1#16
123#16
Sequences of characters enclosed by quote marks
are called literal
string constants. They are the literal constants of the standard type string (see
6.2.2). A character code, whether or not it
is in the printing character set, can also be represented in a literal string
constant as follows:
$ddd, where each d stands for a
decimal digit, represents the character code with the decimal representation ddd. Note: char.Ord($$ddd) = ddd (see
6.1.2).
For convenience, $S, $T, $N, $$, $' represent space, tab,
newline, $, and ' respectively. The $ddd construction can
be used only in a machine-dependent module (see 6.2.3).
literalString
::=" ' " extendedCharacter } " ' "
extendedCharacter
::= character I "$" extension
extension ::= digit
digit digit I "S"
I "T" I "N" I "$" I "
Examples:
1.1 'Here comes a null: $000 and there it went'
'Euclid'
• 'THIS IS A STRING"This$Sis$Sa$Sstring'
A single character preceded by a dollar sign is a
literal constant of the standard type char (see
6.1.2). The $
convention may also be used in these constants.
literalChar
::= "$" extendedCharacter
Examples:
$a $$S
{space character} $$000 {the NUL character} $" $$$
Manifest constants
"One here will constant be, Come wind, come weather."
Pilgrim's
Progress
A manifest constant is a literal constant or an
expression which can be used in place of a literal
constant. The value of a manifest constant can be computed in a straightforward
way at compile time. Thus, a manifest constant expression
may not involve any functions, except the
standard ones defined in this report. A general constant, by contrast, has a
value which is fixed during the lifetime of the scope
in which it is defined, but may be computed in an
arbitrary way at the beginning of that lifetime.
literalConstant ::=
unsignedNumber I literalString I literalChar I enumeratedValueld manifestConstant ::= literalConstant I
manifestConstantExpression manifestConstantExpression
::= expression
Examples:
-100 'Euclid' red 3*Co/or.Ord(Colorlast)
6. Data types
"What
is written without effort is in general read without pleasure."
Johnson
A data type determines the set of values which
variables and constants of that type may assume,
and the set of basic operations that may be performed on them. A type
declaration associates an identifier
with the type. The type identifier is considered to denote the same type as its definition, unless it is declared as a module
type (see 6.2.3) or is exported from a module, in which
case the identifier denotes a different type; type equivalence is discussed in detail in 6.4. Parameterized types are introduced in
6.3.
A type declaration introduces a new scope in
which the formal parameters of the type, if any,
are declared (see 6.3). If the type definition is a module type, the definition
is a second, closed scope (see 7.3), and identifiers
declared outside it are inaccessible unless imported.
If the type definition is not a module type, however, its scope is the open
scope of the declaration (or possibly a second, open
scope for a record definition), and importing is
not necessary (or possible).
An identifier must be declared before it is used. When there are mutually recursive routines or types, however, it is impossible to give the definition of every identifier before its
use. In this situation, a definition of forward may be given instead, and later
another declaration, of the form type (or procedure
P=..., or function Fr...) must appear to provide
the true definition. Between its forward and true definitions, or within its
own definition, a type may only be used as the object
type of a collection, or to declare a formal parameter.
If an identifier declared with forward has parameters, these must appear in the
forward definition and must not be repeated in the true "definition.
A type declaration may not contain variable
identifiers which are free, i.e., declared outside the type declaration, except that a module definition may
explicitly import variables. As a consequence,
an unparameterized type identifier denotes the same type throughout the scope
in which it is declared, and the type denoted by a parameterized type
identifier depends only on the values of the actual
parameters. Note, however, that the same type identifier may denote different types in different scopes, e.g., in
different instantiations of a routine or a module.
All types (except integer) automatically acquire
components when they are declared. For example,
an array type T has the component T.lndexType (see 6.2.1). Any component
of a type is automatically also a component of every
constant or variable of that type. Thus if e is a
constant or variable of type T, eindexType is the
same as T.IndexType.
This report specifies the standard representations, in terms of bits, for the values of
certain types. These specifications are given so that
machine-dependent records and machine-code procedures
can be sensibly defined, and so that the effect of an explicit type conversion
can be predicted. The following contexts are defined
as sensitive:
a variable component of a
machine-dependent record (see 6.2.4) a variable declared at a fixed address
(see 7.)
an actual parameter or
result of a machine-code routine (see 10.)
the actual parameter or
result of an explicit type conversion (see 6.5)
A type may not appear in a sensitive context
unless its standard representation is specified, either
in this report, or explicitly by the implementation. Except in these sensitive
contexts, there is no way for a Euclid program to determine
the standard representation of any value, and an
implementation is therefore free to use other representations, provided that it
converts each value to the standard
representation when it appears in a sensitive context.
type ::= simpleType I
structuredType I pointerType I parameterizedTypeReference
typeDeclaration ::= type typeldentifier formalParameterList =
preAssertion typeDefinition
typeldentifier
::= identifier
typeDefinition ::= type I forward
There are two components
implicitly declared for each type other than integer:
T.size the result, of type integer, is the number of
StorageUnits (see
6.1.2) required for the representation of a
variable of type T.
T.alignment the result, of type
integer, is the required alignment of variables
of type T, in StorageUnits.
Thus, if p :
pType is a pointer to such
a variable, then (AddressType<<=pType(p)) mod
pType.alignment=0.
There is a component implicitly declared for each variable or constant: x.itsType the type of x
6.1. Simple types
There are no type variables in Euclid. However,
a type may be a component of a module (see
6.2.3), and hence may be referenced by a field designator (see 7.2.2), as well
as by an identifier.
simpleType ::= enumeratedType I standardSimpleType I subrangeType I derivedSimpleType
derivedSimpleType ::= [ containingVariable
"." I simpleTypeldentifier simpleTypeldentifier
::= identifier
6.1.1.
Enumerated types
An enumerated type defines an ordered set of
values by enumeration of the identifiers which
denote these values. There must be at least two such identifiers. The
identifiers are declared as constants in the current scope. if the current
scope is a type declaration of type T, for which the
enumerated type is the definition, however, the identifiers are declared in the enclosing scope instead, i.e., the scope in which T is declared. In any case, the identifiers may not be used for any other purpose in the scope in
which they are declared.
The standard representation of the ith identifier
(counting from 0) in the enumeration is the
same as the representation of the unsigned integer i. Thus, if T is an enumerated
type,
T.first is represented
exactly like the integer 0.
enumeratedType ::=
"(" enumeratedValueld { "," enumeratedValueld }
")" enumeratedValueld
::= identifier
Examples:
type Color = (red, green, blue, orange, yellow, purple)
type Suit = (club,
diamond, heart, spade)
type Day = (Monday,
Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday) type SexType = (female, male)
type Classification =
(confidential, secret, topSecret)
type Device = (disk, display, keyboard, printer, tape)
Components implicitly declared for each enumerated and subrange (see
6.1.3) type T are:
the
first value (in the enumeration) the last value (in the enumeration)
the value succeeding x (in the enumeration).
T.Succ(T.last) is undefined if T is an enumerated type, or T is a subrange of an enumerated
type U and T.last=U.last.
the value preceding x (in the enumeration).
T.Pred(T.first) is undefined if T is an enumerated type, or 7' is a subrange of an enumerated type U and
T.first=U.first.
an
unsignedlnt which is the ordinal number of the value x in the enumeration of T. Thus,
T.Ord(T.first)=0.
For instance, Suit.last is spade, and Day.first is Monday.
6.1.2.
Standard simple types
The following types are standard in Euclid, and are
pervasive throughout all programs (see 7.3):
integer Its values are the positive and negative
integers, in the mathematical sense. It is not
possible to declare a variable to be of type integer. instead, variables can be declared to be of some suitable subrange type. Constants
of type integer may be declared, however, and numbers
are literal constants of type integer.
|
signedlnt, unsignedlnt |
Every
implementation has two standard types, signedlnt and unsignedlnt; they are ordinary subranges of integer which are pre-defined
for convenience. These are implementation-defined; the intention is
that they should be large subranges of integer type which can be handled
efficiently by the machine, and which contain: for signedlnt. equal numbers of positive and negative
numberS, or perhaps one more
negative number. for unsignedlnt, 0 and no
negative numbers. |
|
Boolean char |
An operation is called well-behaved if its operands are in the range signedint,
or unsignedInt, and it yields an integer result in the same range. An implementation must support the evaluation of
any expression in which all
the operations are well-behaved (see 8.1). An implementation may also support the evaluation of expressions involving larger
integers (e.g., double-precision
integers), but this is not required. It is recommended, however, that on machines which have unsignedInt.0..216-1,
at least double-precision
integers should be supported. If T is defined by
type T = m n, for any manifest constants
m, n>=0, and i is a value of
type T, the standard representation of i is the ordinary binary representation of
the integer i, filled
out on the left with any number of extra
zeros. The standard representation of a signed integer must be defined by
the implementation, but is not defined in this report. There
are two standard functions defined on any subrange of integer: Abs(x) returns an integer which is the
absolute value of x. Odd(x)
returns a Boolean which is True if and only if x is odd. Its values are the truth values denoted by the
identifiers False and True. It is defined
by type Boolean = (False, True). It is an enumerated type whose values are a set
Of characters determined by the
implementation. They are denoted by the characters themselves preceded by
a dollar sign (see 4.). There is a standard function Chr(x:
0..char.Ord(char.last)) with the property that
char.Ord(Chr(x))=x. Warning: the ordering of the values of type
char is implementation dependent. Use of this
ordering in comparisons of chars, subranges of chars, or the Chr and char.Ord functions, will in general
result in non-portable programs. Unlike most
other machine-dependent features of Euclid, this one is
not restricted to machine-dependent modules. |
StorageUnit
This is the basic unit for storage allocation (see 6.3). There are no distinguishable values of this type, and no operations
are defined on this type. Thus, a StorageUnit variable
simply serves to occupy a known amount of
space. The standard representation of a StorageUnit is not defined.
There are two standard components of the type StorageUnit:
sizeinBits, an integer constant which defines the number
of bits in a StorageUnit.
Address, a function
declared by
function Address (A:array 0..n of StorageUnit) returns
Add ressType,
which returns the machine address of A(0). It may only be
used in a machine-dependent module.
AddressType This is an unsigned subrange of
integer, large enough to hold a full machine address;
i.e., a value returned by the function StorageUnit.Address.
6.1.3.
Subrange types
A type may be defined as a subrange of another
simple type by indication of the smallest and the
largest value in the subrange. The first constant specifies the lower bound,
and the second the upper bound. If type A is a subrange of type B, and
type B is a subrange of type C, we
say that A is also a subrange of C. The Succ, Pred, first and last
components are defined for all subrange
types. If A is a subrange of B, and a is of type A and b is of type B, and a=b, and the standard representation of b is defined, then the standard representation of a is the same as the standard
representation of b.
subrangeType ::= constantSum ".."
constantSum constantSum ::= sum
Examples:
type OneToOneHundred = 1 .. 100
type SymmetricRange = -10 .. 10
type Primary = red .. blue {the
values of a Primary are red, green, and blue} type ScreenPosition = 1 .. 525 {y coordinate
for display screen}
6.2. Structured types
A structured type is characterized by the type(s)
of its components and by its structuring method.
Moreover, a structured type definition may contain an indication of the
preferred data representation: if a definition is prefixed with the symbol
packed, this is a hint to the compiler that
storage should be economized even at the price of some loss in efficiency of
access, and even if this may expand the code necessary for accessing components
of the structure.
Adding occurrences of packed
may make a legal program into an illegal one
(because of type compatibility (see 6.4) or
if a component of the structure has been renamed as an entire variable (see 7.4)), but will not otherwise change the
meaning of the program.
structuredType
::= [ packed unpackedStructuredType
I derivedStructuredType
unpackedStructuredType arrayType
I recordType I moduleType I
mdRecordType I setType I
collectionType
derivedStructuredType ::= [containingVariable
"."] structuredTypeldentifier structuredTypeldentifier
::= identifier
6.2.1.
Array types
An array type is a structure consisting of a
fixed number of components which are all of the same type, called the component type. The elements of the array are designated by indices,
values belonging to the index type. The array type definition specifies both the component
type and the index type.
The
standard representation of an unpacked array A of type array I of C is defined as follows. Let
t.C.size+(C.alignment-1), and s=t-(t mod
C.alignment). Then
successive
components
of A
occupy successive groups of s StorageUnits, with no unoccupied StorageUnits in between. A(I.first) occupies the
first s StorageUnits, i.e.,
the ones with the smallest
machine addresses, and A(I.last) occupies the last s StorageUnits, i.e., the ones with the largest
machine addresses. Each component is aligned in the same way as a variable of type C. The entire array thus occupies
max(0,e(Llast - /.first + 1)) StorageUnits. The standard representation of a packed array is not defined.
arrayType ::= array
indexType of componentType indexType ::= simpleType
componentType ::= type
There are two standard
components of an array type T:
T.IndexType the
index type
T.ComponentType the
component type
Like
other components of types, they are also components of any variable or constant
of the type (see 6.). Thus if a is a variable of type T, then aindexType is the same as T.IndexType, and likewise for ComponentType.
|
Examples: type
type type type |
ArrayO = array 1 .. 100 of signedInt Arrayl = array -10 .. 10 of 0 .. 99 Array2 array Boolean of Color NameTable = array OneToOneHundred of string(50) |
6.2.2.
Record types
A record
type is a structure consisting of a fixed number of components, possibly of different types. The record type definition
specifies for each component, called a field, its type
and an identifier which denotes it. The scope of these field identifiers
is the record definition itself. They are also accessible
within a field designator (cf. 7.2) referring to a record variable or constant of this type. Record
components may be constants or variables. Components other than variables are accessible within a field designator
referring to the type
itself, as well as in a designator referring to a variable.
For the syntax of constant and variable
declarations, see 7. If the record type appears as the definition of a type identifier, it must end with the
clause end identifier;
otherwise it must end with end record.
The size of a record containing an unpacked array
is equal to some constant value, independent
of the array size, plus the size of the array. The standard representation of a
record is not defined.
recordType
::= record fieldList endRecord endRecord end
record I end identifier
variantPart ] [ , ]
fieldList ::= , [ recordDeclaration [";"] ] [
recordDeclaration ::= pervasive recordDeclarationPart
"," per-Vasive recordDeclarationPart
}
recordDeclarationPart constantDeclaration
I variableDeclaration
pervasive
::= pervasive I empty
A record type may have several variants. In this case a constant of some simple type must be used as a selector in a case construction which
enumerates the possible variants. This constant
is called the tag,
and its value indicates which variant is assumed
by the record variable at a given time.
Each variant structure is identified by a case label which is a set of manifest constants of the type of the tag. Usually the
tag will be a formal parameter of the type
declaration in which the case appears (see 6.3). When a variable of the record
type is declared, however, the tag must be a manifest
constant, or any.
The case label lists must be disjoint.
Furthermore, the union of the lists must exhaust the enumerated type of the tag, unless there is an otherwise
variant, in which case all the tag values not
mentioned explicitly are lumped under that variant. The case label following
the end of the variant must be one of the labels specified by the
case label list.
variantPart
::= case tag of variant { ";" variant } [ otherwiseVariant ] [
";" ] end case variant caseLabelList
"=>" fieldList end caseLabel I empty
caseLabelList caseLabel ","
caseLabel }
caseLabel ::=
manifestConstant I subrangeType
tag ::= constant
otherwiseVariant ::= otherwise ".>" fieldList
Examples:
type Date = record
var day 1 .. 31
var month: (Jan, Feb,
Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec) var year 1900 .. 2100
end Date
type stream (dev: Device) = record
case dev of
display =>
var first, last:
DisplayControlBlock {not
defined in this report} var height: ScreenPosition ScreenPosition.first
var nLines: 0 .. (ScreenPosition.last)/8
end display
tape,
disk =>
var file: FileHandle {not defined in this report}
var position..
unsignedInt := 0
var buffer array 0 .. 255 of char
end tape
keyboard =>
var buffer. array 0 .. 20 of char
var bufFirst, bufLast:
buffer.IndexType :=
buffer.IndexType.first end keyboard
otherwise => {null fieldList}
end case
end stream
There are three pervasive
standard types having to do with strings, declared as follows.
type Stringlndex = 1 stringMaxLength
type StringLength = 0 stringMaxLength
type string(maxLength: StringIndex) = record var length: 0 .. maxLength 0;
var text: packed array 1 .. maxLength of char end string
Literal string constants
are of type string, with length equal to the number of
characters. The maxLength of a literal string constant is chosen to make its type
suitable for the context in which it appears; it must not be less than the length, however. Routines can readily be defined to extract substrings, do pattern matching, or
perform any other desired operations on
strings (see 10. and 11. for examples). Furthermore,
these routines might be
machine-coded for
efficiency. The value of
the pervasive manifest constant
stringMaxLength is implementation-defined.
6.2.3
Module types
A module type is a generalization of a record
type. Module components may be declared as constants,
variables, routines, or types. These declarations have the same form and the
same meaning as the declarations in a block. Thus, a module
serves as a package for a collection of
related objects. A module type is always opaque; i.e., it is not the same as
any other type (see 6.4). Thus, two different module definitions always define
different types, even if the definitions are
textually identical.
Identifiers declared in a module are not known
outside unless they are exported explicitly, so the packaging supplied by the module also provides
protection against improper use of components
which are intended to be known only within the module definition. An exported identifier x is
accessible (but only within a suitable field designator) in any scope in which the module type is accessible. A constant
identifier is always exported as a constant;
this may be specified by a binding condition of const, or the binding condition
may be omitted. A variable identifier may be
exported as a variable, using the binding condition
var, or as a readonly variable, using the binding condition
readonly or omitting the binding condition (see 7). A variable may not be
exported as a constant. No two variables in the
union of the import and the export lists of a module may overlap (see 7.4).
A constant or variable identifier may not be
exported unless its type is exported, or is accessible
in the scope S enclosing the module definition; the itsType component of an exported constant or variable is automatically exported.
Similarly, a routine identifier may not be
exported unless the types of its parameters and results are exported or
accessible in S.
When a type T is
exported from a module M, it is always opaque outside
its defining module; i.e., it is not the
same as any other type (see 6.4). Thus all operations on values of type T, other than possibly assignment and equality tests, must be
performed by routines defined in M. The export item for
a type may specify, in an appended with clause, that it is to be exported with assignment, equality or both; an
operation not specified is not exported. If
assignment is not exported, then T is not assignable;
i.e., outside M constants or constant parameters
of type T cannot be declared, and variables of type T cannot appear on the left side
of an assignment statement. Furthermore, Ts field identifiers (including
standard ones, like size) are not accessible outside 41,
unless the with clause specifies that they should be exported. If T is an array type,
subscripting is exported if and only if the IndexType
component is exported. If T is an enumerated type, its enumerated value identifiers
are not exported unless they are mentioned explicitly in
T's with clause; subrangeType is permitted in an
exportltem to facilitate this. Only an exported type identifier may have a with
clause.
Values of the module type itself may not be
assigned or compared for equality unless := or =,
respectively, appear in the export list. Furthermore, := and = may not appear
in the export list if the module imports any variables.
The identifiers which may be exported from a module
are those which are accessible in the scope between the end of the module's
declarations and the end of the module; loosely, these are just the identifiers -declared at the outermost level
in the module.
A module is a closed scope; i.e., identifiers
declared outside the module are not known inside unless
they are known in the immediately enclosing scope, and either are pervasive or
are explicitly imported into the module (by the
imports clause in the module definition, or because
they appear in the formal parameter list of the module declaration). Importing
is discussed in detail in 7.3.
The optional identifier following module in the type definition may be used within the definition to name the entire value, i.e., for
self-referencing.
A module type may be prefixed with the symbols machine dependent, which simply means that it may contain other
machine-dependent module and record declarations, fixed addresses in variable declarations (see 7.), extended
characters of the form $ddd (see 4.), type converters
involving types with implementation-dependent representations (see 6.5), uses of StorageUnit.Address (see 6.1.2), and machine-code
procedure declarations (see 10.); a module
which is not machine-dependent may not contain any of these things. Thus it is possible to tell from the heading of a module type
whether it contains any machine
dependencies or not. Note that machine-dependent
identifiers exported from a machine-dependent
module, as well as the machine-dependent module type itself, may be used in other modules which are not machine-dependent.
Since exported types can only be
manipulated by exported routines, the routines serve to encapsulate the
machine-dependencies.
The standard representation
of a module is not defined.
moduleType ::= [ machine dependent ]
module [ identifier ]
importClause exportClause
moduleBody endModule
endModule ::= end module I end identifier
importClause ::= imports "("
importltem "," importltem } ")" I
empty
importltem ::= pervasive
bindingCondition identifier I typeConverter exportClause ::= exports exportList [";"] I empty
exportList ::=
"(" exportltem { "," exportltem} ")"
exportltem ::= bindingCondition identifier [ with
exportList ] I ":=" I "=" subrangeType
bindingCondition ::= const
I readonly I var I empty
A module may include an
initial action which is executed whenever a new variable of the module type is created, and a final action which is executed whenever such a variable
is
destroyed.
Note that if several module variables are declared, as in the program fragment
begin x:Ml; y.M2; z•Ml;
... end
the order of creation is x, y, z, and
the order of destruction is z, y,
x. This follows from the fact that the three declarations start three nested
scopes, all of which end at the end (see 7.3).
A module may also specify an invariant which is supposed to be true during the lifetime of the module variable (i.e., after the execution of the
initial action and before the execution of the
final action), except perhaps when one of the procedures of the module has been
called and has not yet returned. Like other
assertions, this one may be empty or a Boolean expression.
In the former case the content of the assertion is supplied in a comment. This comment is of course ignored by the compiler, but
presumably is interpreted by the verifier.
In the latter case the assertion must be a legal Boolean expression, but it
generates code only if the checked option is enabled for
the enclosing scope.
The abstraction function, necessary for one
approach to the verification of modules [Hoare 1972,
London et al. 1977], maps the variables inside the module into a value of the
module type. The need for this mapping in verification
arises because a variable of the module type is
represented inside the module by different variables. The body of the
abstraction function may contain
constructs outside Euclid in the same way that assertions may, i.e., within comments. The abstraction function is not callable
from a Euclid program.
moduleBody ::= checkedClause declaration [ ";" initialAction
invariant finalAction
checkedClause ::= checked
I not checked I empty
declaration ::= empty I
pervasive declarationPart { ";" pervasive declarationPart }
declarationPart ::= constantDeclaration I variableDeclaration I typeDeclaration
procedureDeclaration I functionDeclaration
initialAction initially
routineDefinition ";" I empty
invariant ::= [abstraction
functionDeclaration] invariant assertion ";" I empty
assertion ::= expression I empty
finalAction ::= finally
routineDefinition
";" I empty
The following example outlines how one might
package floating-point numbers and operations
on them in a module. Examples of complete modules may be found in section 12
and Appendix B.
var
Real: machine
dependent module
exports (Add, Subtract, Times, Div, Greater, const
zero, number, Value with
(:=, =, sign, exponent, mantissa))
type
Exp = -80#16 7F#16; type Mant = 0 OFFFFFF#16 type Value = machine
dependent record
var
sign (at
0 bits 0 .. 0): 0 .. 1 := 0 var
exponent (at
0 bits 1 .. 7): Exp := 0
var mantissa (at 0 bits 8 .. 31): Mant := 0 end Value
const
zero: Value := (0,0,0)
intine function number (m: signedInt, ex: Exp) returns
num; Value =
imports (Mara)
begin
if
m < 0 then num.sign := 0 else num.sign := 1
end if
num.mantissa
:= Abs(m); num.exponent := ex
end number
function
Add (const 1, r Value) returns sum: Value =
code ... end Add
inline function Greater (const 1, r Value) returns
Boolean =
imports
(signedInt<<=Value)
begin
{use
type converters to compare I & r as signedlnts}
return True when (signedInt<<=lia/ue(0)
> (signedlnt<<=Value (r)) return False
end Greater
•••
end Real
6.2.4
Machine-dependent records
A machine-dependent
record type is a restricted kind of record type which allows the programmer to. specify
the exact position and size of each variable field. The position is specified in StorageUnits, where
the first StorageUnit of the record is numbered 0, and then in bits, where the first bit of the
specified StorageUnit is numbered 0, and the bit numbering continues to successive StorageUnits in
the obvious way. The ordering of bits in a StorageUnit is implementation-defined. If the bits
clause is omitted, the field occupies an integral number of StorageUnits, and its size is computed
from the size of its type; the value is right-justified in the field. The compiler's
responsibility is to check that fields do not overlap and that each field is at least large
enough to hold values of its type. The size of a value of machine-dependent record type is
determined by the largest position specified by the declaration, where the value of the at clause
(in StorageUnits) is added to the ending value of the bits clause (in bits). An
implementation may place restrictions on how fields overlap natural storage
boundaries. The index type specifying bits must be a subrange of integer, and must be manifest
except possibly for the last component of the record.
A machine-dependent
record may have constant components like an ordinary record. It may not have
any parameters. All its variable components must have position specifications, and they cannot
be exported or passed as variable parameters. Furthermore, they must all have types whose
standard representation is specified. Note that the standard representation of a
machine-dependent record is specified, but the standard representation of an ordinary record is not.
An alignment clause,
aligned mod a, in a machine-dependent record declaration forces a value of the record type to be
allocated so that the machine address of its first StorageUnit is 0 mod a; a must be a power of 2.
The module in which a
machine-dependent record type appears must be a machine-dependent module.
mdRecordType
::= machine dependent record [ alignmentClause ] mdDeclarationPart ";"
mdDeclarationPart } endRecord
mdDeclarationPart ::=
constantDeclaration I
var identifier "C at manifestConstant [ bits
simpleType ")" ":" typeDefinition [ initialization

6.2.5.
Set types
A set type defines the range of
values which is the powerset of its base type. Base types must be simple
types. Operators applicable to all set types are:
union
set
difference: i appears in a-b if and only if it appears in a and
not in b. xor symmetric
difference: i appears in a xor b if and only if it appears
in
exactly one of a and
b.
intersection
in membership
<=, >= set inclusion
Sets can be built up
from values of the base type as described in section 8. The standard representation of a set S : set
of B is defined if. B is a manifest type and the standard representation of values of B is
defined. It is a sequence of n significant bits, where n=(B.last-B.first+1), preceded by
any number of insignificant zero bits. If the significant bits are numbered 0, 1, ..., n-1, then bit i is one if
and only if x is in S and B.Ord(x)=i.
setType
::= set of baseType
baseType
simpleType
There is one standard component of a set
type T: T.BaseType the
base type
Examples:
type Hue = set of Color
type SubtractivePrimaries = set
of red .. green type SymSet
= set of -5 .. +5
type
EntrieslnUse = set of Arrayl.lndexType
6.2.6.
Pointer and collection types
A variable which is declared in a program (see
7.) is referred to by its identifier. The variable exists during the entire
lifetime of the scope to which it is local, and such a variable is therefore called static. In contrast, variables may also be generated dynamically, i.e., without much correlation to the structure of the
program. These dynamic
variables are generated
by the standard procedure component New described below; since they do not occur in an explicit variable declaration, they cannot be
referred to by an identifier. Instead, they may
be referred to by a pointer value which is provided by
New when the dynamic variable is generated.
A pointer type thus consists of an unbounded set of values pointing to elements of the same type. No operations are
defined on pointers except the test for
equality, the pointer following operator t which yields the variable
referred to by the pointer, and the standard
function component, Index, which converts a pointer into an integer.
The standard representation of a pointer is the same as
the standard representation of an AddressType.
A dynamic variable must be an element of a collection. A collection is not a type: it is a variable which behaves very much like an array variable.
Just as an element of an array variable A can be referenced by subscripting A with an index whose type is the index type of A (A.lndexType), so an element of a collection C can be
referenced by subscripting C with a pointer
whose type is the pointer type of C (tC). There are two differences:
No two collections have the same pointer type.
Hence the pointer alone is sufficient to
specify the collection, and we allow pt as shorthand for C(p), where p is of type tC.
There are no operations which produce pointer
results, except the standard procedure C.New
which creates a new variable (or an explicit type conversion). Hence the storage allocation strategy for collections can
be quite different from the strategy for arrays.
The reason for having collections is that two
pointers to different collections are guaranteed to
point to different variables; two pointers to the same collection are either
equal, and point to the same variable, or unequal, and point to non-overlapping
variables. Hence collections are a means by
which the programmer can express some of his knowledge about the ways in which his program is using pointers. If he
prefers not to do this, or has no knowledge
about pointers to variables of type T which can be
expressed in this way, he can simply declare a
single collection of Ts and use it everywhere.
There are no operations on collections. A
collection may not be assigned to another collection.
In fact, there is nothing to do with a collection except to subscript it, or to
pass it as an actual parameter.
Associated with every collection is a zone which provides storage for its variables. A zone is a module variable with three special components (and
possibly other components):
a variable storageBlocks which
is a collection of a record type containing a special component (and possibly
other components):
theStorage,
a StorageUnit
a procedure Allocate(size,
alignment: unsignedlnt,
var pointer tstorageBlocks) a procedure Deallocate(pointer tstorageBlocks, size: unsignedlnt)
These components must not be exported; they are
intended only for use by the standard procedures
New and Free. A zone must be a machine-dependent module variable. Note that a collection es zone must be imported as a variable
into any scope in which C.New or C.Free
is called. If C is reference-counted, C.zone must also be imported as a
variable into any scope in which a
non-local variable of type tC is assigned to. Appendix B contains examples of module types which- implement zones.
A collection declared without a zone will get a standard
zone called SystemZone. This zone is not
pervasive (since it is a variable), but must be imported where it is needed.
A collection can be reference-counted, in which case a variable in the collection will be freed automatically when no pointers to it remain. The
optional manifest constant is an integer
which gives the maximum reference count which should be maintained; any
variable to which more than this number of pointers ever exists at one time may
never be freed. The Free procedure does not
exist for a reference-counted collection.
collectionType ::= [ counted [ manifestConstant ]
] collection of objectType [ in zone ]
objectType ::= type
zone ::= variable
pointerType ::= "t" collectionVariable
collectionVariable ::=
variable
There are six standard
components of a collection variable C:
a pointer which points to no variable at all.
the
object type. If there are any unknowns in the collection definition, C.ObjectType is a parameterized type
with formal parameters
corresponding to the unknowns, in the same order (see 6.3).
the zone (see above).
a function which takes a
pointer to C and returns an integer.
This function has only one defined property: it is one-to-one.
allocates a new variable v of type T in collection C and assigns
the pointer to v to
the pointer variable p. C.New
imports C as a variable. This procedure works by calling Allocate for the pointer's zone with T.size and
T.alignment as parameters. It
gets back a tstorageBlocks, and
uses the theStorage component
in this block as the first StorageUnit for the newly created variable. It is up to the verifier of the zone to ensure that a sequence of at least n free StorageUnits begins there if Allocate(n, i, p) was called, and that the storage allocated does not overlap with that of
any other variable. Any
initialization specified by the type of v