This document was made by OCR from a scan of the technical report. It has not been edited or proofread and is not meant for human consumption, but only for search engines. To see the scanned original, replace OCR.htm with Abstract.htm or Abstract.html in the URL that got you here.


Text Box: SIGPLAN
Notices
Text Box: Volume 12, Number 2, February 1977Text Box: Contents:Text Box: Report On The Programming Language EuclidText Box: by B. W. Lampson, J. J. Horning, R. L. London, J. G. Mitchell, and G. L. Popek


Report On The Programming Language Euclid

by B. W. Lampsonl, J. J. Hornin                                               and

g2, R. L. London3, J. G. Mitchell1,      G. J. Popek4

This report describes the Euclid language, intended for the expression of system programs which are to be verified.

Authors' addresses and support:

1.   Xerox Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304

2.   Computer Systems Research Group, University of Toronto, Toronto, Canada M5S 1A4 Supported in part by a Research Leave Grant from the University of Toronto.

3.   USC Information Sciences Institute, 4676 Admiralty Way, Marina del Rey, CA 90291 Supported by the Advanced Research Projects Agency under contract DAHC-15-72-C-0308.

4. 3532 Boelter Hall, Computer Science Department, University of California, Los Angeles, CA 90024 Supported in part by the Advanced Research Projects Agency under contract DAHC-73-C-0368.

The views expressed are those of the authors.


Table of contents

Preface                                                                                                                                     1

Acknowledgements                                                                                                                                                                                                                                    1

1. Introduction                                                                                                                           2

2. Summary of the language                                                                                                      4

3. Notation, terminology and vocabulary                                                                                   9

3.1 Vocabulary                                                                                                                          9

3.2 Legality assertions                                                                                                               10

3.3 Lexical structure                                                                                                                  11

4. Identifiers, numbers and strings                                                                                             12

5. Manifest constants                                                                                                                 13

6. Data type declarations                                                                                                            14

6.1 Simple types                                                                                                                        15

6.1.1 Enumerated types                                                                                                                                                             15

6.1.2 Standard simple types                                                                                                                                                       16

6.1.3 Subrange types                                                                                                                                                                 18

6.2 Structured types                                                                                                                   18

6.2.1 Array types                                                                                                                                                                       18

6.2.2 Record types                                                                                                                                                                     19

6.2.3 Module types                                                                                                                                                                    21

6.2.4 Machine-dependent records                                                                                                                                              24

6.2.5 Set types                                                                                                                                                                           25

6.2.6 Pointer and collection types                                                                                                                                              26

6.3 Parameterized types                                                                                                             29

6.4 Type compatibility                                                                                                              31

6.5 Explicit type conversion                                                                                                      32

7. Declarations and denotations of constants and variables                                                       34

7.1 Entire variables                                                                                                                    36

7.2 Component variables                                                                                                           36

7.2.1 Indexed variables                                                                                                                                                              36

7.2.2 Field designators                                                                                                                                                              37

7.2.3 Referenced variables                                                                                                                                                         37

7.3 Scope rules                                                                                                                          38

7.4 Binding                                                                                                                                40


8.   Expressions                                                                                                                       39

8.1 Operators                                                                                                                         43

8.1.1 Multiplying operators                                                                                                                                                   44

8.1.2 Adding operators                                                                                                                                                         44

8.1.3 Relational operators                                                                                                                                                     44

8.1.4 Other operators                                                                                                                     45

8.2 Function designators                                                                                  45

9. Statements                                                                                                                            46

9.1 Simple statements                                                                                                            46

9.1.1 Assignment statements                                                                                                                                                46

9.1.2 Procedure statements                                                                                                                                                   46

9.1.3 Escape statements                                                                                                                                                        47

9.1.4 Assert statements                                                                                                                                                          48

9.2 Structured statements                                                                                                      48

9.2.1 Compound statements and blocks                                                                                                                             48

9.2.2 Conditional statements                                                                                                                                                49

9.2.2.1 If statements                                                                                                                                                               49

9.2.2.2 Case statements                                                                                                                                                         50

9.2.3 Repetitive statements                                                                                                                                                   51

9.2.3.1 Loop statements                                                                                                                                                        52

9.2.3.2 For statements                                                                                                                                                           52

9.2.4 Other uses of binding                                                                                                                                                  53

10.  Procedure declarations                                                                                                    55

11.  Function declarations                                                                                                      58

12.  Programs                                                                                                                          60

13.   A standard for implementation and program interchange                                            63

13.1 Representation of basic symbols                                                                                 64

13.2 Standard format for programs                                                                                     64

13.3. Annotation                                                                                                                    65

14. Implementation notes                                                                                                          66

14.1 Identifiers                                                                                                                           66

14.2 Parsing                                                                                                                               66

14.3 One-pass translation                                                                                                           66

14.4 Routine parameters                                                                                                         66

14.5 Routines in modules                                                                                                      67

14.6 Constant components of records and modules                                                         67

14.7 Finalization                                                                                                                     67

14.8 Inline code                                                                                                                     67

14.9 Reference counts                                                                                                           68

14.10 Representation of pointers                                                                                                                      68

14.11 Parameterized types                                                                                                                                      68

References                                                                                                                              70

Appendix A. Collected syntax                                                                                                                                                                                            71

Appendix B. Zone Example                                                                                                                                                                                                  78


Preface

This report describes a new programming language called Euclid, intended for the expression of system programs which are to be verified. Euclid draws heavily on Pascal for its structure and many of its features. In order to reflect this relationship as clearly as possible, the Euclid report has been written as a heavily edited version of the revised Pascal report.

Proof rules for Euclid appear in a separate report [London et al 1977]. A Euclid implementation is under development by the System Development Corporation, 2500 Colorado Avenue, Santa Monica, California; information may be obtained from Mr. Hugh C. Lauer.

No implementation of the language has yet been completed, and no Euclid programs of any size have been written. As a result of experience in implementing and using it, changes in the language or its defining report may be made.

This is the third version of the Euclid report; earlier versions appeared in May 1976 and August 1976.

Acknowledgements

Obviously, we are greatly indebted to Wirth, both for the aspects pf the language which are copied from Pascal, and for the structure and much of the wording of the report. We are also much in debt to Hoare's work in the areas of programming language design, axiomatic methods, and program verification. In particular, we have tried to follow his suggestion that "the language designer should b_e,,,familiar with many alternative features designed by others, and should have excellent judgement in choosing the best... One thing he should not do is to include untried ideas of his own. His task is consolidation, not innovation." [Hoare 1973].

We have consciously borrowed ideas and features from Alphard [Wulf, London, and Shaw 1976], BCPL [Richards 1969], CLU [Liskov 1976], Gypsy [Ambler et al. 1976], LIS [Ichbiah et al. 1974], Mesa [Geschke and Mitchell 1975], Modula [Wirth 1976], and the SUE System Language [Clark and Horning 1973, Clark and Ham 1974]; other languages and suggestions for language features have also undoubtedly influenced our thinking. We have benefitted greatly from comments and criticisms by numerous individual colleagues on previous versions of the language and report, and from the comments of the implementors, especially Lauer.

We are grateful to Prof. Wirth, and to Springer-Verlag, for permission to use portions of the Pascal report [Wirth 1971, Jensen and Wirth 1975] in this report.

Our work has been significantly aided by the Arpanet, which allowed us to maintain effective and rapid communication in stating and resolving problems, in spite of the wide geographical distribution of the authors.


1. Introduction

"There is no royal road to geometry."

Proclus, Comment on Euclid, Prol. G. 20.

The programming language Euclid has been designed to facilitate the construction of verifiable system programs. By a verifiable program we mean one written in such a way that existing formal techniques for proving certain properties of programs can be readily applied; the proofs might be either manual or automatic, and we believe that similar considerations apply in both cases. By system we mean that the programs of interest are part of the basic software of the machine on which they run; such a program might be an operating system kernel, the core of a data base management system, or a compiler.

An important consequence of this goal is that Euclid is not intended to be a general-purpose programming language. Furthermore, its design does not specifically address the problems of constructing very large programs; we believe most of the programs written in Euclid will be modest in size. While there is some experience suggesting that verifiability supports other desired goals, we assume the user is willing, if necessary, to obtain verifiability by giving up some run-time efficiency, and by tolerating some inconvenience in the writing of his programs.

We see Euclid as a (perhaps somewhat eccentric) advance along one of the main lines of current programming language development: transferring more and more of the work of producing a correct program, and verifying its correctness, from the programmer and the verifier (human or mechanical) to the language and its compiler.

The main changes relative to Pascal take the form of restrictions, which allow stronger statements about the properties of the program to be made from the rather superficial, but quite reliable, analysis which the compiler can perform. In some cases new constructions have been introduced, whose meaning can be explained by expanding them in terms of existing Pascal constructions. The reason for this is that the expansion would be forbidden by the newly introduced restrictions, whereas the new construction is itself sufficiently restrictive in a different way.

The main differences between Euclid and Pascal are summarized in the following list:

Visibility: Euclid provides explicit control over the visibility of identifiers, by requiring .the program to list all the identifiers imported into a routine or module, or .exported from a module.

Variables: The language guarantees that two identifiers in the same scope can never refer to the same or overlapping variables. There is a uniform mechanism for binding an identifier to a variable in a procedure call, on block entry (replacing the Pascal with statement), or in a variant record discrimination. The variables referenced or modified by a routine (i.e., procedure or function) must be accessible in every scope from which the routine is called.

Pointers: This idea is extended to pointers, by allowing dynamic variables to be assigned to collections, and guaranteeing that two pointers into different collections can never refer to the same variable.

Storage allocation: The program can control the allocation of storage for dynamic variables explicitly, in a way which confines the opportunity for making a type error very narrowly. It is also possible to declare that some


dynamic variables should be reference-counted, and automatically deallocated when no pointers to them remain.

Types: Types have been generalized to allow formal parameters, so that arrays can have bounds which are fixed only when they are created, and variant records can be handled in a type-safe manner. Records are generalized to include constant components.

Modules: A new kind of record, called a module, can contain routine and type components, and thus provides a facility for modularization. The module can include initialization and finalization statements which are executed whenever a module variable is created or destroyed.

Constants: Euclid defines a constant to be a literal, or an identifier whose value is fixed throughout the scope in which it is declared.

For statement: A generator can be declared as a module type, and used in a for statement to enumerate a sequence of values.

Loopholes: features of the underlying machine can be accessed, and the type-checking can be overridden, in a controlled way. Except for the explicit loopholes, Euclid is designed to be type-safe.

Assertions: the syntax allows assertions to be supplied at convenient points.

Deletions: A number of Pascal features have been Omitted from Euclid: input-output, seals, multi-dimensional arrays, labels and gotos, and functions and procedures as parameters.

The only new features in the list which can make it hard to convert a Euclid program into a legal Pascal program by straightforward rewriting are parameterized types, storage allocation, finalization, and some of the loopholes.

There are a number of other considerations which influenced the design of Euclid:

It is based on current knowledge of programming languages and compilers; concepts which are not fairly well understood, and features whose implementation is unclear, have been omitted.

Although program portability is not a major goal of the language design, it is necessary to have compilers which generate code for a number of different machines, including mini-computers.

The object code must be reasonably efficient, and the language must not require a highly optimizing compiler to achieve an acceptable level of efficiency in the object program.

Since the total size of a program is modest, separate compilation is not required (although it is certainly not ruled out).

The required run time support must be minimal, since it presents a serious problem for verification.


2. Summary of the language

"Be sure of it; give me the ocular proof."

Othello III, iii, 361.

This section contains a summary of Euclid. The information here is intended to be consistent with the remainder of the report, but in case of conflict the body of the report (sections 3-12) governs. Because it is a summary, many details are omitted, and some general statements are made without the qualifications which may be found in the body of the report.

An algorithm or computer program consists of two essential parts, a description of actions which are to be performed, and a description of the data which are manipulated by these actions. Actions are described by statements, and data are described by type definitions. A data type essentially defines a set of values and the actions which may be performed on elements of that set.

The data are represented by values. A value may be constant, or it may be the value of a variable. A value occurring in a statement may be represented by a literal constant, an identifier which has been declared to be constant, an identifier which has been declared as a variable, or an expression containing values. Every identifier occurring in the program must be introduced by a declaration. A constant or variable declaration associates with an identifier a data type, and either a value or a variable.

In general, a definition specifies a fixed value, type, or routine, and a declaration introduces an identifier and associates some properties with it. A data type may either be directly described in the constant or variable declaration, or it may be referenced by a type identifier, in which case this identifier must be introduced by an explicit type declaration.

A constant declaration associates an identifier with a value; the association cannot be changed within the scope of the declaration. If the value can be determined at compile-time, the constant is said to be manifest; the expression defining a manifest

constant must contain only                        literal constants, other manifest constants, and built-in
operations.

An enumercited type definition indicates an ordered set of values, i.e., introduces identifiers standing for each value in the set. The simple data types are the enumerated types, the subrange types, and the four standard simple types: Boolean, integer, char and StorageUnit. The real type has been omitted. For the first three, there is a way of writing literal constants of that type: True and False for Boolean, numbers for integers, and

quotations for characters.                          Numbers and quotations are syntactically distinct from

identifiers. The set of values of type char is the character set available in a particular implementation. The type StorageUnit has values which occupy the minimum unit in which storage allocation is done; this may of course differ from one implementation to another. Since no operations are defined on StorageUnit values, nothing more need he said about them.

A type may also be defined as a subrange of a simple type by indicating the smallest and the largest value of the subrange.


Structured types are defined by describing the types of their components, and indicating a structuring method. The various structuring methods differ in the selection mechanism serving to select the components of a variable of the structured type. In Euclid, there are five basic structuring methods available: array, record, module, set, and collection.

In an array structure, all components are of the same type. A component is selected by an array selector, or computable index. The index type, which must be simple, is indicated in the array type declaration. It is usually a programmer-defined enumerated type, or a subrange of the type integer. Given a value of the index type, an array selector yields a variable or constant of the component type. Every array structure can therefore be regarded as a mapping of the index type into the component type.

In a record structure, the components (called fields) are not necessarily of the same type. In order that the type of a selected component be evident from the program text (without executing the program), a record selector is not a computable value, but instead is an identifier uniquely denoting the component to be selected. These field identifiers are declared in the record type definition. Records may include constant as well as variable components; manifest constant components, of course, do not need to be stored in each record instance.

A record type may be specified as consisting of several variants. This implies that different record values, although declared to be of the same type, may assume structures which differ in a certain manner. The difference may consist of a different number and different types of components. The variant which is assumed by a record value is indicated by a constant of some simple type which is called the tag.

A module structure is much like a record, but may include routines and types as components. In this way, the operations which are defined on a data structure can be conveniently packaged with the structure. Module components cannot be accessed outside the module body unless they are explicitly exported. Thus in a properly written program it is evident from the lexical structure how the state of a module can be altered.

A set structure defines the set of values which is the powerset of its base type, i.e., the set of all subsets of values of the base type. The base type must be a simple type.

Variables declared in explicit declarations are called static. The declaration associates an identifier with the variable, and the identifier is used to refer to the variable. The language guarantees that two identifiers which can legally be used in the same scope cannot refer to the same variable, or to overlapping variables. Thus, an assignment to an identifier cannot change the value of any other identifier accessible in the same scope.

In contrast, variables may be generated by an executable statement. Such a dynamic generation yields a pointer value (a substitute for an explicit identifier) which subsequently serves to refer to the variable. This pointer may be assigned to other variables, namely variables of type pointer. Each pointer variable may assume values pointing to variables in a single collection C, all of whose members are of the same type. It may, however, also assume the value C.nil, which points to no variable. Because pointer variables may also occur as components' of structured variables, which are themselves dynamically generated, the use of pointers permits the representation of finite graphs in full generality. Although the language cannot guarantee in general that two pointer variables do not refer to the same variable, it can make this guarantee for two pointers in different collections.


A zone can be associated with each collection to provide procedures for allocating and deallocating the storage required by variables in that collection; if the zone is omitted, a standard system zone is used. The program may free a dynamic variable explicitly, in which case the program is responsible for ensuring that there will be no further references to the non-existent variable. .Alternatively, the collection may be reference-counted, in which case each variable is automatically freed when no pointers to it remain. The main advantage of reference-counted variables, as compared with explicit deallocation, is that the correctness of the deallocation does not have to be verified.

Throughout this report, the word variable means a container which can hold a value of a specific type. A variable may or may not be associated with an identifier. A constant, by contrast, is simply a value of a specific type. The fundamental difference is that assignment to a variable is possible.

A type declaration may have formal parameters; such a parameterized declaration represents a set of types, one of which is specified each time the type is referenced and actual parameters are supplied for the formals.

Two types are the same if their definitions are identical after any type identifiers which are not opaque have been replaced by their definitions, and any actual parameters and any identifiers declared outside the type have been replaced by their values. A type identifier is opaque if it is a module type, or is exported from a module.

The most fundamental statement is the             assignment statement. It specifies that a newly
computed value be assigned to a variable (or a component of a variable). The value is

obtained by evaluating an expression.                Expressions consist of variables, constants, sets,
operators and functions operating on the denoted quantities and producing new values.

Variables, constants, and functions are either declared in the program or are standard entities. Euclid defines a fixed set of operators, each of which can be regarded as describing a mapping from the operand types into the result type. The set of operators is subdivided into groups of:

1.     arithmetic operators of addition, subtraction, sign inversion, multiplication, division, and computing the remainder (mod).

2.     Boolean operators of negation (not), conjunction (and), disjunction (or) and implication (->).

3.     set operators of union, intersection, set difference, and symmetric, difference (xor).

4.     relational operators of equality,   inequality, ordering, set membership and set
inclusion. The results of relational operations are of type Boolean.

The procedure statement causes the execution of the designated procedure (see below).

There are two kinds of escape statements: an exit statement is used to terminate a loop, and a return statement to terminate a routine. An escape statement may be qualified by a when clause, which causes termination only if a Boolean expression is True.

Assignment, procedure, and escape statements are the components or building blocks of structured statements, which specify sequential, selective, or repeated execution of their components. Sequential execution of statements is specified by the compound statement;


conditional or selective execution by the if statement and the case statement; and repeated execution by the loop statement and the for statement. The if statement serves to make the execution of a statement dependent on the value of a Boolean expression, and the case statement allows for the selection among many statements according to the value of a selector. The discriminating case statement provides a safe way of discriminating the current variant of a variant record. The for statement is used when a bound on the number of iterations is known beforehand, and the loop statement is used otherwise.

A block can be used to associate declarations with statements. The identifiers thus declared have significance only within the block. Hence, the block is called the scope of these identifiers, and they are said to be local to the block. Since a block may appear as a statement, scopes may be nested. An if, case, for or loop statement, or a module type declaration, also defines a scope in a similar way.

A block can be named by an identifier, and be referenced through that identifier. The block is then called a procedure, and its declaration a procedure declaration. However, an identifier which is not local to a given procedure body is accessible in that body only if it is accessible in the immediately enclosing scope, and

it is pervasive in some enclosing scope or

it is explicitly imported into the given procedure body.

A procedure has a fixed number of parameters, each of which is denoted within the procedure by an identifier called the formal parameter, which is local to the procedure body. Upon an activation of the procedure statement, an actual quantity has to be indicated for each parameter which can be referenced from within the procedure through the formal parameter.       This quantity is called the actual parameter.                              There are two kinds of

parameters: constant parameters and variable parameters; routine and type parameters are not allowed. In the first case, the actual parameter is an expression which is evaluated once. The formal parameter represents a local constant whose value is the result of this evaluation. In the case of a variable parameter, the actual parameter is a variable and the formal parameter is bound to this variable. Possible indices or pointers are evaluated before execution of the procedure.

Functions are declared analogously to procedures; procedures and functions are collectively called routines. The main difference lies in the fact that a function yields a result, which may be of any assignable type and must be specified in the function declaration. Functions may therefore be used as constituents of expressions. Variable formal parameters and imported variables are not permitted within function declarations; as a consequence, functions cannot have side effects.

Since Euclid is intended for the writing of programs which are to be verified (either mechanically or by hand), there are a number of explicit interactions between the language and the verifier, in addition to the many aspects of the language which have been motivated by the desire to ease verification. These explicit interactions fall into two main categories:

embedding of assertions in the program: the special symbols assert, invariant, pre and post introduce assertions. These may be written as comments which are ignored by the compiler. Presumably they will be used by the verifier, which can take advantage of their relationship to the structure of the program. Alternatively, an assertion may be written as a Boolean expression, which is compiled into a run-time check if the checked option has been specified for an enclosing scope.


compiler-generated assertions: in cases where the compiler needs to be able to assume that some condition holds, but is unable to deduce that it does, the compiler may generate an assertion (in a new listing of the program) for the verifier, and then proceed as though confident of its truth. The legality of the program will then depend on the validity of the compiler-generated assertion. Each case in which such an assertion may be generated is spelled out in this report.


3. Notation, terminology, and vocabulary

"The best words     in the best order."

Coleridge.

The syntax is described in a modification of Backus-Naur form, in which syntactic constructs are denoted by English words or phrases, not enclosed in any special marks. These words also suggest the nature or meaning of the construct, and are used in the

accompanying description of semantics.                  Basic symbols of the language are written in
boldface or enclosed in quote marks; e.g., begin and ":". Possible repetition of a construct is

indicated by enclosing the construct within metabrackets { and } .     Possible omission of a

construct is indicated by enclosing the construct within metabrackets [ and ]      The word
empty denotes the null sequence of symbols.

The grammar defining the syntax of Euclid is distributed throughout this report; for convenient reference, it has also been collected in Appendix A.

3.1 Vocabulary

The primitive vocabulary of Euclid consists of basic symbols classified    into letters, digits,
and special symbols. Note that this vocabulary is not the character set. The character set is

implementation dependent, and each implementation must define, in its character set, distinct representations for all the basic symbols. Suggestions for doing this in some common cases may be found in section 13.

Each implementation must specify a single break character which can be used within an identifier. Two identifiers are similar if they are composed of the same sequence of characters, except for changes from upper to lower case letters or vice versa, and for the presence or absence of break characters. The intended use of the break character is to visually separate an identifier into its component parts. In an implementation which can print upper and lower case letters, a transition from lower to upper case can also be used for this separation. It is recommended that this convention be used when possible, in preference to the explicit break character, for implementations which have lower case letters;

obviously it cannot be used if the entire identifier is upper case, for example.       Thus, an
identifier might be represented as

alpha Beta           using 96-character ASCII, with capitalization as the break.

ALPH A_BETA   using the IBM PL/l character set, with    as the break.

ALPHA\BETA    using the Model 33 Teletype character set, with \ as the break.

All of these identifiers would be similar to the identifier ALPHABETA. With these conventions, it is possible to convert from one representation to another in a reasonable way (see 13.).

Each time an identifier is used, it must be written in exactly the same way (i.e., with the same capitalization and use of break characters) as it was written                   when it was declared.

However, another  identifier which is similar according to the above           rules may not be
declared in any scope in which the first identifier is accessible (see 7.3).

The following capitalization convention is generally used in this report:      type and routine
identifiers begin with a capital letter (except for certain standard types); other identifiers

begin with a small letter.              This convention is not part of the definition of Euclid, hOwever.


letter ::=     "A" I "B" I "C" I „D,.          "E" I „F,.             '°G" "I-I" I "I" I "1" I "K" I "L,"
"M" I ,.N,. "0" 1 "R" I "Q" I "R" 1 "S" I ".I,,, "U" I "V" I "W" I ,.X,. "Y" 1

"Z" 1"a" 1 "b" 1"c" 1 "d" 1"e" 1"f" 1 "g" 1"h" I"i" I "}" 1 "k" I "I" 1 "m" 1"u" 1 "0" I "p" I "a" I "r" I "s"                                           I IT° 1 "u" I "v" 1 "w"  1 "x" I "Y" I "z"

octalDigit ::=    "0" 1 "1" I "2" I "3" 1 "4" 1 ""5..          "6" 1 "7"

digit ::=      octalDigit 1 "8" I "9"

hexDigit ::=      digit I "A" 1 .,B,.           "C" I ,.D,.           "E" I "F"

breakChar ::= <some implementation-dependent character not a letter or digit>

specialSymbol ::=

"+" I "" 1 "*" I "=" I "<" I            ">" I "<=" 1 ">="       "->" 1 "(" I ")" I

"{" I„}„ 1     ":=” I "" I ""            ":" I ":" 1 ° " I "c" I  "=>" I "<<=" I "S" I "#" I
wordSymbol

wordSymbol ::=

abstraction I aligned 1 all 1 and I any I array I assert I at I begin I bind I bits bound I case I checked 1 code 1 collection I const I counted I decreasing dependent I discriminating I div I else I elseif I end I exit 1 exports I finally

for I forward 1 from 1 function I if 1 imports 1 in I include I initially 1 inline invariant 1 loop I machine I mod 1 module 1 net l of 1 on 1 or 1 otherwise packed 1 parameter I pervasive I post I pre 1 procedure I readonly 1 record

return            I returns I set I then 1 to 1 type I unknown 1 var 1 when I with I xor

The construct

"{" <any sequence of symbols not containing "}"> "}"

may be inserted between any two identifiers, literal constants (see 4.), or special symbols. It is called a comment, and may be removed from the program text without altering its meaning. The symbols "{" and "}" do not occur otherwise in the language.

Any verification system which accepts Euclid programs as input may define a convention for distinguishing comments which have special meaning for the verifier. One reasonable convention is that every comment in which a certain character appears as the first character after the  is intended for the verifier.

The word routine is used as a synonym for the phrase "procedure or function."

3.2 Legality assertions

Throughout this report, various restrictions will be placed on legal Euclid programs. Many of these restrictions cannot be checked syntactically, and in some cases they involve dynamic conditions that are difficult (or impossible) to check statically. Nevertheless, programs that violate them are not considered to be meaningful Euclid programs. It is the responsibility of the compiler to verify as many of these properties as it can, and to produce Boolean expressions called legality assertions for those it cannot. Thus, any program whose legality assertions can all be verified is a legal Euclid program, with well-defined semantics. Note that legality assertions are produced only for conditions specified in this report.


If checked is specified for a scope and not overridden by an inner not checked (see 6.2.3 and 9.2.1), all legality assertions in the block, and all programmer-supplied assertions which are Boolean expressions, are compiled into run-time checks, as an aid in detecting illegal programs, even before the verification process is complete.

3.3 Lexical structure

The text of a program is built up out of declarations and statements, collectively called units, according to the syntax specified below. In general units are separated by semicolons. The syntax is constructed in such a way that a unit may always be legally followed by a semicolon. In order to make it unnecessary to write semicolons between units which appear on separate lines, a semicolon is automatically inserted at the end of a line whenever the last token of the line could be the end of a unit, namely one of:

identifier, literal constant, ), t, exit, return, forward, or end possibly followed by if, loop, case, record, or module,

and the first token of the next line could be the beginning of a unit, namely one of:

identifier, literal constant, abstraction, assert, begin, bind, case, const, exit, finally, for, function, if, initially, inline, invariant, loop, machine, pervasive, procedure, return, type, var or with.

Commas are used as separators in enumerated types, case label lists, element lists, and parameter lists, and within declarations in identifier lists, bind lists, and import/export lists.

There are several kinds of brackets which are used to group declarations and statements for various purposes. The following list gives the unique closing bracket for each opening bracket.

 

if

loop case =>

begin

code record

module

end if

end loop end case

end caseLabel

end, or end routineldentifier whenever the block is the body of a routine

end routineldentifier.

end record, or end typeldentifier whenever the record definition is the declaration of a type identifier.

end module, or end typeldentifier whenever the module definition is the declaration of a type identifier.


4. Identifiers, numbers and strings

"And twenty more such names as these

Which never were nor no man ever saw."

The Taming of the Shrew, Induction, ii, 95.

Identifiers serve to denote constants, variables, types, and routines. Their association must be unique within their scope of validity, i.e., within the scope in which they are declared (see 6, 7.3).

identifier ::=   letter {  letterOrDigit }

letterOrDigit ::=   letter I digit I breakChar

The usual decimal notation is used for numbers, which are the literal constants of the data type integer (see 6.1.2.). Numbers may also be written in octal or hexadecimal notation. Note that unsigned numbers are always positive; a negative manifest constant can be written as an expression, e.g., -14.

unsignedNumber ::=  digit { digit } I

octalDigit     octalDigit } "#8" I
digit { hexDigit } "#16"

Examples:

1       100                717#8 OCAD1#16 123#16

Sequences of characters enclosed by quote marks are called literal string constants. They are the literal constants of the standard type string (see 6.2.2). A character code, whether or not it is in the printing character set, can also be represented in a literal string constant as follows:

$ddd, where each d stands for a decimal digit, represents the character code with the decimal representation ddd. Note: char.Ord($$ddd) = ddd (see 6.1.2).

For convenience, $S, $T, $N, $$, $' represent space, tab, newline, $, and ' respectively. The $ddd construction can be used only in a machine-dependent module (see 6.2.3).

literalString ::=" ' "    extendedCharacter } " ' "

extendedCharacter ::=         character I "$" extension

extension ::=   digit digit digit I        "S" I "T" I "N" I "$" I "
Examples:

1.1                                      'Here comes a null: $000 and there it went'

'Euclid'         'THIS IS A STRING"This$Sis$Sa$Sstring'

A single character preceded by a dollar sign is a literal constant of the standard type char (see 6.1.2).       The $ convention may also be used in these constants.

literalChar ::=  "$" extendedCharacter

Examples:

$a $$S {space character} $$000 {the NUL character} $"        $$$


Manifest constants

"One here will constant be, Come wind, come weather."

Pilgrim's Progress

A manifest constant is a literal constant or an expression which can be used in place of a literal constant. The value of a manifest constant can be computed in a straightforward way at compile time. Thus, a manifest constant expression may not involve any functions, except the standard ones defined in this report. A general constant, by contrast, has a value which is fixed during the lifetime of the scope in which it is defined, but may be computed in an arbitrary way at the beginning of that lifetime.

literalConstant ::= unsignedNumber I literalString I literalChar I enumeratedValueld manifestConstant ::= literalConstant I manifestConstantExpression manifestConstantExpression ::= expression

Examples:

-100                  'Euclid'         red              3*Co/or.Ord(Colorlast)


6. Data types

"What is written without effort is in general read without pleasure."

Johnson

A data type determines the set of values which variables and constants of that type may assume, and the set of basic operations that may be performed on them. A type declaration associates an identifier with the type. The type identifier is considered to denote the same type as its definition, unless it is declared as a module type (see 6.2.3) or is exported from a module, in which case the identifier denotes a different type; type equivalence is discussed in detail in 6.4. Parameterized types are introduced in 6.3.

A type declaration introduces a new scope in which the formal parameters of the type, if any, are declared (see 6.3). If the type definition is a module type, the definition is a second, closed scope (see 7.3), and identifiers declared outside it are inaccessible unless imported. If the type definition is not a module type, however, its scope is the open scope of the declaration (or possibly a second, open scope for a record definition), and importing is not necessary (or possible).

An identifier must be declared before it is used. When there are mutually recursive routines or types, however, it is impossible to give the definition of every identifier before its use. In this situation, a definition of forward may be given instead, and later another declaration, of the form type (or procedure P=..., or function Fr...) must appear to provide the true definition. Between its forward and true definitions, or within its own definition, a type may only be used as the object type of a collection, or to declare a formal parameter. If an identifier declared with forward has parameters, these must appear in the forward definition and must not be repeated in the true "definition.

A type declaration may not contain variable identifiers which are free, i.e., declared outside the type declaration, except that a module definition may explicitly import variables. As a consequence, an unparameterized type identifier denotes the same type throughout the scope in which it is declared, and the type denoted by a parameterized type identifier depends only on the values of the actual parameters. Note, however, that the same type identifier may denote different types in different scopes, e.g., in different instantiations of a routine or a module.

All types (except integer) automatically acquire components when they are declared. For example, an array type T has the component T.lndexType (see 6.2.1). Any component of a type is automatically also a component of every constant or variable of that type. Thus if e is a constant or variable of type T, eindexType is the same as T.IndexType.

This report specifies the standard representations, in terms of bits, for the values of certain types. These specifications are given so that machine-dependent records and machine-code procedures can be sensibly defined, and so that the effect of an explicit type conversion can be predicted. The following contexts are defined as sensitive:

a variable component of a machine-dependent record (see 6.2.4) a variable declared at a fixed address (see 7.)

an actual parameter or result of a machine-code routine (see 10.)


the actual parameter or result of an explicit type conversion (see 6.5)

A type may not appear in a sensitive context unless its standard representation is specified, either in this report, or explicitly by the implementation. Except in these sensitive contexts, there is no way for a Euclid program to determine the standard representation of any value, and an implementation is therefore free to use other representations, provided that it converts each value to the standard representation when it appears in a sensitive context.

type ::= simpleType I structuredType I pointerType I parameterizedTypeReference

typeDeclaration ::= type typeldentifier formalParameterList =
preAssertion typeDefinition

typeldentifier ::=   identifier

typeDefinition ::= type I forward

There are two components implicitly declared for each type other than integer:

T.size          the result, of type integer,   is the number of StorageUnits (see
6.1.2) required for the representation of a variable of type T.

T.alignment  the result, of type integer, is the required alignment of variables
of type T, in StorageUnits. Thus, if p : pType is a pointer to such

a      variable,       then      (AddressType<<=pType(p))                  mod
pType.alignment=0.

There is a component implicitly declared for each variable or constant: x.itsType      the type of x

6.1. Simple types

There are no type variables in Euclid. However, a type may be a component of a module (see 6.2.3), and hence may be referenced by a field designator (see 7.2.2), as well as by an identifier.

simpleType ::= enumeratedType I standardSimpleType I subrangeType I derivedSimpleType

derivedSimpleType ::= [ containingVariable "." I simpleTypeldentifier simpleTypeldentifier ::= identifier

6.1.1. Enumerated types

An enumerated type defines an ordered set of values by enumeration of the identifiers which denote these values. There must be at least two such identifiers. The identifiers are declared as constants in the current scope. if the current scope is a type declaration of type T, for which the enumerated type is the definition, however, the identifiers are declared in the enclosing scope instead, i.e., the scope in which T is declared. In any case, the identifiers may not be used for any other purpose in the scope in which they are declared.

The standard representation of the ith identifier (counting from 0) in the enumeration is the same as the representation of the unsigned integer i. Thus, if T is an enumerated type,


T.first is represented exactly like the integer 0.

enumeratedType ::= "(" enumeratedValueld { "," enumeratedValueld } ")" enumeratedValueld ::= identifier

Examples:

type Color = (red, green, blue, orange, yellow, purple)

type Suit = (club, diamond, heart, spade)

type Day = (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday) type SexType = (female, male)

type Classification = (confidential, secret, topSecret)

type Device = (disk, display, keyboard, printer, tape)

Components implicitly declared for each enumerated and subrange (see 6.1.3) type T are:


Text Box: T first T.last T.Succ(x)
T.Pred(x)
T.Ord(x)
the first value (in the enumeration) the last value (in the enumeration)

the value succeeding x (in the enumeration). T.Succ(T.last) is undefined if T is an enumerated type, or T is a subrange of an enumerated type U and T.last=U.last.

the value preceding x (in the enumeration). T.Pred(T.first) is undefined if T is an enumerated type, or 7' is a subrange of an enumerated type U and T.first=U.first.

an unsignedlnt which is the ordinal number of the value x in the enumeration of T. Thus, T.Ord(T.first)=0.


For instance, Suit.last is spade, and Day.first is Monday.

6.1.2. Standard simple types

The following types are standard in Euclid, and are pervasive throughout all programs (see 7.3):

integer Its values are the positive and negative integers, in the mathematical sense. It is not possible to declare a variable to be of type integer. instead, variables can be declared to be of some suitable subrange type. Constants of type integer may be declared, however, and numbers are literal constants of type integer.

 

signedlnt, unsignedlnt

Every implementation has two standard types, signedlnt and unsignedlnt; they are ordinary subranges of integer which are pre-defined for convenience. These are implementation-defined; the intention is that they should be large subranges of integer type which can be handled efficiently by the machine, and which contain:

for signedlnt. equal numbers of positive and negative numberS, or perhaps one more negative number.

for unsignedlnt, 0 and no negative numbers.


 

Boolean

char

An operation is called well-behaved if its operands are in the range signedint, or unsignedInt, and it yields an integer result in the same range. An implementation must support the evaluation of any expression in which all the operations are well-behaved (see 8.1). An implementation may also support the evaluation of expressions involving larger integers (e.g., double-precision integers), but this is not required. It is recommended, however, that on machines which have unsignedInt.0..216-1, at least double-precision integers should be supported.

If T is defined by type T = m n, for any manifest constants m, n>=0, and i is a value of type T, the standard representation of i is the ordinary binary representation of the integer i, filled out on the left with any number of extra zeros. The standard representation of a signed integer must be defined by the implementation, but is not defined in this report.

There are two standard functions defined on any subrange of integer: Abs(x) returns an integer which is the absolute value of x. Odd(x) returns a Boolean which is True if and only if x is odd.

Its values are the truth values denoted by the identifiers False and True. It is defined by type Boolean = (False, True).

It is an enumerated type whose values are a set Of characters determined by the implementation. They are denoted by the characters themselves preceded by a dollar sign (see 4.).

There is a standard function Chr(x: 0..char.Ord(char.last)) with the property that char.Ord(Chr(x))=x.

Warning: the ordering of the values of type char is implementation dependent. Use of this ordering in comparisons of chars, subranges of chars, or the Chr and char.Ord functions, will in general result in non-portable programs. Unlike most other machine-dependent features of Euclid, this one is not restricted to machine-dependent modules.

 

 StorageUnit This is the basic unit for storage allocation (see 6.3). There are no distinguishable values of this type, and no operations are defined on this type. Thus, a StorageUnit variable simply serves to occupy a known amount of space. The standard representation of a StorageUnit is not defined.

There are two standard components of the type StorageUnit:

sizeinBits, an integer constant which defines the number of bits in a StorageUnit.

Address, a function declared by

function Address (A:array 0..n of StorageUnit)      returns
Add ressType,

which returns the machine address of A(0). It may only be used in a machine-dependent module.


AddressType This is an unsigned subrange of integer, large enough to hold a full machine address; i.e., a value returned by the function StorageUnit.Address.

6.1.3. Subrange types

A type may be defined as a subrange of another simple type by indication of the smallest and the largest value in the subrange. The first constant specifies the lower bound, and the second the upper bound. If type A is a subrange of type B, and type B is a subrange of type C, we say that A is also a subrange of C. The Succ, Pred, first and last components are defined for all subrange types. If A is a subrange of B, and a is of type A and b is of type B, and a=b, and the standard representation of b is defined, then the standard representation of a is the same as the standard representation of b.

subrangeType ::= constantSum ".." constantSum constantSum ::= sum

Examples:

type OneToOneHundred = 1 .. 100

type SymmetricRange = -10 .. 10

type Primary = red .. blue {the values of a Primary are red, green, and blue} type ScreenPosition = 1 .. 525 {y coordinate for display screen}

6.2. Structured types

A structured type is characterized by the type(s) of its components and by its structuring method. Moreover, a structured type definition may contain an indication of the preferred data representation: if a definition is prefixed with the symbol packed, this is a hint to the compiler that storage should be economized even at the price of some loss in efficiency of access, and even if this may expand the code necessary for accessing components of the structure.

Adding occurrences of packed may make a legal program into an illegal one (because of type compatibility (see 6.4) or if a component of the structure has been renamed as an entire variable (see 7.4)), but will not otherwise change the meaning of the program.

structuredType ::= [ packed       unpackedStructuredType I derivedStructuredType

unpackedStructuredType      arrayType I recordType I moduleType I

mdRecordType I setType I collectionType

derivedStructuredType ::= [containingVariable "."] structuredTypeldentifier structuredTypeldentifier ::= identifier

6.2.1. Array types

An array type is a structure consisting of a fixed number of components which are all of the same type, called the component type.                 The elements of the array are designated by indices,

values belonging to the index type.   The array type definition specifies both the component
type and the index type.


The standard representation of an unpacked array A of type array I of C is defined as follows.          Let t.C.size+(C.alignment-1), and s=t-(t mod C.alignment).                          Then successive

components of A occupy successive groups of s StorageUnits, with no unoccupied StorageUnits in between. A(I.first) occupies the first s StorageUnits, i.e., the ones with the smallest machine addresses, and A(I.last) occupies the last s StorageUnits, i.e., the ones with the largest machine addresses. Each component is aligned in the same way as a variable of type C. The entire array thus occupies max(0,e(Llast - /.first + 1)) StorageUnits. The standard representation of a packed array is not defined.

arrayType ::= array indexType of componentType indexType ::= simpleType

componentType ::= type

There are two standard components of an array type T:

T.IndexType           the index type

T.ComponentType   the component type

Like other components of types, they are also components of any variable or constant of the type (see 6.). Thus if a is a variable of type T, then aindexType is the same as T.IndexType, and likewise for ComponentType.

Examples:

type type type type

ArrayO = array 1 .. 100 of signedInt

Arrayl = array -10 .. 10 of 0 .. 99

Array2 array Boolean of Color

NameTable = array OneToOneHundred of string(50)

 

6.2.2. Record types

A record type is a structure consisting of a fixed number of components, possibly of different types. The record type definition specifies for each component, called a field, its type and an identifier which denotes it. The scope of these field identifiers is the record definition itself. They are also accessible within a field designator (cf. 7.2) referring to a record variable or constant of this type. Record components may be constants or variables. Components other than variables are accessible within a field designator referring to the type itself, as well as in a designator referring to a variable.

For the syntax of constant and variable declarations, see 7. If the record type appears as the definition of a type identifier, it must end with the clause end identifier; otherwise it must end with end record.

The size of a record containing an unpacked array is equal to some constant value, independent of the array size, plus the size of the array. The standard representation of a record is not defined.

recordType ::= record fieldList endRecord endRecord        end record I end identifier

variantPart ] [ , ]

fieldList ::= , [ recordDeclaration [";"] ] [

recordDeclaration ::= pervasive recordDeclarationPart

"," per-Vasive recordDeclarationPart }

recordDeclarationPart    constantDeclaration I variableDeclaration

pervasive ::=  pervasive I empty


A record type may have several variants. In this case a constant of some simple type must be used as a selector in a case construction which enumerates the possible variants. This constant is called the tag, and its value indicates which variant is assumed by the record variable at a given time. Each variant structure is identified by a case label which is a set of manifest constants of the type of the tag. Usually the tag will be a formal parameter of the type declaration in which the case appears (see 6.3). When a variable of the record type is declared, however, the tag must be a manifest constant, or any.

The case label lists must be disjoint. Furthermore, the union of the lists must exhaust the enumerated type of the tag, unless there is an otherwise variant, in which case all the tag values not mentioned explicitly are lumped under that variant. The case label following the end of the variant must be one of the labels specified by the case label list.

variantPart ::= case tag of variant { ";" variant } [ otherwiseVariant ] [ ";" ] end case variant               caseLabelList "=>" fieldList end caseLabel I empty

caseLabelList      caseLabel    "," caseLabel }

caseLabel ::= manifestConstant I subrangeType

tag ::= constant

otherwiseVariant ::= otherwise ".>" fieldList

Examples:

type Date = record

var day 1 .. 31

var month: (Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec) var year 1900 .. 2100

end Date

type stream (dev: Device) = record

case dev of

display =>

var first, last: DisplayControlBlock {not defined in this report} var height: ScreenPosition                          ScreenPosition.first

var nLines: 0 .. (ScreenPosition.last)/8

end display

tape, disk =>

var file: FileHandle {not defined in this report}

var position.. unsignedInt := 0

var buffer array 0 .. 255 of char

end tape

keyboard =>

var buffer. array 0 .. 20 of char

var bufFirst, bufLast: buffer.IndexType := buffer.IndexType.first end keyboard

otherwise => {null fieldList}

end case

end stream


There are three pervasive standard types having to do with strings, declared as follows.

type Stringlndex = 1      stringMaxLength

type StringLength = 0   stringMaxLength

type string(maxLength: StringIndex) = record var length: 0 .. maxLength                                   0;

var text: packed array 1 .. maxLength of char end string

Literal string constants are of type string, with length equal to the number of characters. The maxLength of a literal string constant is chosen to make its type suitable for the context in which it appears; it must not be less than the length, however. Routines can readily be defined to extract substrings, do pattern matching, or perform any other desired operations on strings (see 10. and 11.        for   examples).                                                      Furthermore, these routines might be

machine-coded for efficiency.        The value                      of the pervasive manifest constant
stringMaxLength is implementation-defined.

6.2.3 Module types

A module type is a generalization of a record type. Module components may be declared as constants, variables, routines, or types. These declarations have the same form and the same meaning as the declarations in a block. Thus, a module serves as a package for a collection of related objects. A module type is always opaque; i.e., it is not the same as any other type (see 6.4). Thus, two different module definitions always define different types, even if the definitions are textually identical.

Identifiers declared in a module are not known outside unless they are exported explicitly, so the packaging supplied by the module also provides protection against improper use of components which are intended to be known only within the module definition. An exported identifier x is accessible (but only within a suitable field designator) in any scope in which the module type is accessible. A constant identifier is always exported as a constant; this may be specified by a binding condition of const, or the binding condition may be omitted. A variable identifier may be exported as a variable, using the binding condition var, or as a readonly variable, using the binding condition readonly or omitting the binding condition (see 7). A variable may not be exported as a constant. No two variables in the union of the import and the export lists of a module may overlap (see 7.4).

A constant or variable identifier may not be exported unless its type is exported, or is accessible in the scope S enclosing the module definition; the itsType component of an exported constant or variable is automatically exported. Similarly, a routine identifier may not be exported unless the types of its parameters and results are exported or accessible in S.

When a type T is exported from a module M, it is always opaque outside its defining module; i.e., it is not the same as any other type (see 6.4). Thus all operations on values of type T, other than possibly assignment and equality tests, must be performed by routines defined in M. The export item for a type may specify, in an appended with clause, that it is to be exported with assignment, equality or both; an operation not specified is not exported. If assignment is not exported, then T is not assignable; i.e., outside M constants or constant parameters of type T cannot be declared, and variables of type T cannot appear on the left side of an assignment statement. Furthermore, Ts field identifiers (including standard ones, like size) are not accessible outside 41, unless the with clause specifies that they should be exported. If T is an array type, subscripting is exported if and only if the IndexType


component is exported. If T is an enumerated type, its enumerated value identifiers are not exported unless they are mentioned explicitly in T's with clause; subrangeType is permitted in an exportltem to facilitate this. Only an exported type identifier may have a with clause.

Values of the module type itself may not be assigned or compared for equality unless := or =, respectively, appear in the export list. Furthermore, := and = may not appear in the export list if the module imports any variables.

The identifiers which may be exported from a module are those which are accessible in the scope between the end of the module's declarations and the end of the module; loosely, these are just the identifiers -declared at the outermost level in the module.

A module is a closed scope; i.e., identifiers declared outside the module are not known inside unless they are known in the immediately enclosing scope, and either are pervasive or are explicitly imported into the module (by the imports clause in the module definition, or because they appear in the formal parameter list of the module declaration). Importing is discussed in detail in 7.3.

The optional identifier following module in the type definition may be used within the definition to name the entire value, i.e., for self-referencing.

A module type may be prefixed with the symbols machine dependent, which simply means that it may contain other machine-dependent module and record declarations, fixed addresses in variable declarations (see 7.), extended characters of the form $ddd (see 4.), type converters involving types with implementation-dependent representations (see 6.5), uses of StorageUnit.Address (see 6.1.2), and machine-code procedure declarations (see 10.); a module which is not machine-dependent may not contain any of these things. Thus it is possible to tell from the heading of a module type whether it contains any machine

dependencies or not. Note that machine-dependent identifiers exported from a machine-dependent module, as well as the machine-dependent module type itself, may be used in other modules which are not machine-dependent. Since exported types can only be

manipulated   by     exported   routines,   the    routines   serve    to    encapsulate     the
machine-dependencies.

The standard representation of a module is not defined.

moduleType ::= [ machine dependent ] module [ identifier ]

importClause exportClause moduleBody endModule

endModule ::= end module I end identifier

importClause ::= imports "(" importltem   "," importltem } ")" I empty
importltem ::= pervasive bindingCondition identifier I typeConverter exportClause ::= exports exportList [";"] I empty

exportList ::= "(" exportltem { "," exportltem} ")"

exportltem ::= bindingCondition identifier [ with exportList ] I ":=" I "=" subrangeType

bindingCondition ::= const I readonly I var I empty

A module may include an initial action which is executed whenever a new variable of the module type is created, and a final action which is executed whenever such a variable is


destroyed. Note that if several module variables are declared, as in the program fragment begin x:Ml; y.M2; zMl; ... end

the order of creation is x, y, z, and the order of destruction is z, y, x. This follows from the fact that the three declarations start three nested scopes, all of which end at the end (see 7.3).

A module may also specify an invariant which is supposed to be true during the lifetime of the module variable (i.e., after the execution of the initial action and before the execution of the final action), except perhaps when one of the procedures of the module has been called and has not yet returned. Like other assertions, this one may be empty or a Boolean expression. In the former case the content of the assertion is supplied in a comment. This comment is of course ignored by the compiler, but presumably is interpreted by the verifier. In the latter case the assertion must be a legal Boolean expression, but it generates code only if the checked option is enabled for the enclosing scope.

The abstraction function, necessary for one approach to the verification of modules [Hoare 1972, London et al. 1977], maps the variables inside the module into a value of the module type. The need for this mapping in verification arises because a variable of the module type is represented inside the module by different variables. The body of the abstraction function may contain constructs outside Euclid in the same way that assertions may, i.e., within comments. The abstraction function is not callable from a Euclid program.

moduleBody ::= checkedClause declaration [ ";" initialAction invariant finalAction

checkedClause ::= checked I not checked I empty

declaration ::= empty I pervasive declarationPart { ";" pervasive declarationPart }

declarationPart ::= constantDeclaration I variableDeclaration I typeDeclaration procedureDeclaration I functionDeclaration

initialAction                initially routineDefinition ";" I empty

invariant ::= [abstraction functionDeclaration] invariant assertion ";" I empty

assertion ::= expression I empty

finalAction ::= finally routineDefinition ";" I empty

The following example outlines how one might package floating-point numbers and operations on them in a module. Examples of complete modules may be found in section 12 and Appendix B.

var Real: machine dependent module

exports (Add, Subtract, Times, Div, Greater, const zero, number, Value with (:=, =, sign, exponent, mantissa))

type Exp = -80#16 7F#16; type Mant = 0 OFFFFFF#16 type Value = machine dependent record

var sign (at 0 bits 0 .. 0): 0 .. 1 := 0 var exponent (at 0 bits 1 .. 7): Exp := 0

var mantissa (at 0 bits 8 .. 31): Mant := 0 end Value

const zero: Value := (0,0,0)

intine function number (m: signedInt, ex: Exp) returns num; Value =


imports (Mara)

begin

if m < 0 then num.sign := 0 else num.sign := 1 end if

num.mantissa := Abs(m); num.exponent := ex

end number

function Add (const 1, r Value) returns sum: Value =

code ... end Add

inline function Greater (const 1, r Value) returns Boolean =

imports (signedInt<<=Value)

begin {use type converters to compare I & r as signedlnts}

return True when (signedInt<<=lia/ue(0) > (signedlnt<<=Value (r)) return False

end Greater

•••

end Real

6.2.4 Machine-dependent records

A machine-dependent record type is a restricted kind of record type which allows the programmer to. specify the exact position and size of each variable field. The position is specified in StorageUnits, where the first StorageUnit of the record is numbered 0, and then in bits, where the first bit of the specified StorageUnit is numbered 0, and the bit numbering continues to successive StorageUnits in the obvious way. The ordering of bits in a StorageUnit is implementation-defined. If the bits clause is omitted, the field occupies an integral number of StorageUnits, and its size is computed from the size of its type; the value is right-justified in the field. The compiler's responsibility is to check that fields do not overlap and that each field is at least large enough to hold values of its type. The size of a value of machine-dependent record type is determined by the largest position specified by the declaration, where the value of the at clause (in StorageUnits) is added to the ending value of the bits clause (in bits). An implementation may place restrictions on how fields overlap natural storage boundaries. The index type specifying bits must be a subrange of integer, and must be manifest except possibly for the last component of the record.

A machine-dependent record may have constant components like an ordinary record. It may not have any parameters. All its variable components must have position specifications, and they cannot be exported or passed as variable parameters. Furthermore, they must all have types whose standard representation is specified. Note that the standard representation of a machine-dependent record is specified, but the standard representation of an ordinary record is not.

An alignment clause, aligned mod a, in a machine-dependent record declaration forces a value of the record type to be allocated so that the machine address of its first StorageUnit is 0 mod a; a must be a power of 2.

The module in which a machine-dependent record type appears must be a machine-dependent module.


mdRecordType ::= machine dependent record [ alignmentClause ] mdDeclarationPart  ";" mdDeclarationPart } endRecord

mdDeclarationPart ::= constantDeclaration I

var identifier "C at manifestConstant [ bits simpleType ")" ":" typeDefinition [ initialization


Text Box: alignmentClause ::= aligned mod manifestConstant Examples
type InterruptWord = machine dependent record aligned mod 8 var device (at 0 bits 0 .. 2): DeviceNumber,
var channel (at 0 bits 3 .. 5): 0 .. 7;
var stopCode (at 0 bits 6 .. 7): (finishedOk, errorStop, powerOff); var command (at 1 bits 0 wordSize): ChannelCommand end InterruptWord
 



6.2.5. Set types

A set type defines the range of values which is the powerset of its base type. Base types must be simple types. Operators applicable to all set types are:

union

set difference: i appears in a-b if and only if it appears in a and not in b. xor     symmetric difference: i appears in a xor b if and only if it appears in
exactly one of a and b.

intersection

in         membership

<=, >= set inclusion

Sets can be built up from values of the base type as described in section 8. The standard representation of a set S : set of B is defined if. B is a manifest type and the standard representation of values of B is defined. It is a sequence of n significant bits, where n=(B.last-B.first+1), preceded by any number of insignificant zero bits. If the significant bits are numbered 0, 1, ..., n-1, then bit i is one if and only if x is in S and B.Ord(x)=i.

setType ::= set of baseType

baseType simpleType

There is one standard component of a set type T: T.BaseType                    the base type

Examples:

type Hue = set of Color

type SubtractivePrimaries = set of red .. green type SymSet = set of -5 .. +5

type EntrieslnUse = set of Arrayl.lndexType


6.2.6. Pointer and collection types

A variable which is declared in a program (see 7.) is referred to by its identifier. The variable exists during the entire lifetime of the scope to which it is local, and such a variable is therefore called static. In contrast, variables may also be generated dynamically, i.e., without much correlation to the structure of the program. These dynamic variables are generated by the standard procedure component New described below; since they do not occur in an explicit variable declaration, they cannot be referred to by an identifier. Instead, they may be referred to by a pointer value which is provided by New when the dynamic variable is generated. A pointer type thus consists of an unbounded set of values pointing to elements of the same type. No operations are defined on pointers except the test for equality, the pointer following operator t which yields the variable referred to by the pointer, and the standard function component, Index, which converts a pointer into an integer.

The standard representation of a pointer is the same as the standard representation of an AddressType.

A dynamic variable must be an element of a collection. A collection is not a type: it is a variable which behaves very much like an array variable. Just as an element of an array variable A can be referenced by subscripting A with an index whose type is the index type of A (A.lndexType), so an element of a collection C can be referenced by subscripting C with a pointer whose type is the pointer type of C (tC). There are two differences:

No two collections have the same pointer type. Hence the pointer alone is sufficient to specify the collection, and we allow pt as shorthand for C(p), where p is of type tC.

There are no operations which produce pointer results, except the standard procedure C.New which creates a new variable (or an explicit type conversion). Hence the storage allocation strategy for collections can be quite different from the strategy for arrays.

The reason for having collections is that two pointers to different collections are guaranteed to point to different variables; two pointers to the same collection are either equal, and point to the same variable, or unequal, and point to non-overlapping variables. Hence collections are a means by which the programmer can express some of his knowledge about the ways in which his program is using pointers. If he prefers not to do this, or has no knowledge about pointers to variables of type T which can be expressed in this way, he can simply declare a single collection of Ts and use it everywhere.

There are no operations on collections. A collection may not be assigned to another collection. In fact, there is nothing to do with a collection except to subscript it, or to pass it as an actual parameter.

Associated with every collection is a zone which provides storage for its variables. A zone is a module variable with three special components (and possibly other components):

a variable storageBlocks which is a collection of a record type containing a special component (and possibly other components):

theStorage, a StorageUnit


a procedure Allocate(size, alignment: unsignedlnt, var pointer tstorageBlocks) a procedure Deallocate(pointer tstorageBlocks, size: unsignedlnt)

These components must not be exported; they are intended only for use by the standard procedures New and Free. A zone must be a machine-dependent module variable. Note that a collection es zone must be imported as a variable into any scope in which C.New or C.Free is called. If C is reference-counted, C.zone must also be imported as a variable into any scope in which a non-local variable of type tC is assigned to. Appendix B contains examples of module types which- implement zones.

A collection declared without a zone will get a standard zone called SystemZone. This zone is not pervasive (since it is a variable), but must be imported where it is needed.

A collection can be reference-counted, in which case a variable in the collection will be freed automatically when no pointers to it remain. The optional manifest constant is an integer which gives the maximum reference count which should be maintained; any variable to which more than this number of pointers ever exists at one time may never be freed. The Free procedure does not exist for a reference-counted collection.

collectionType ::= [ counted [ manifestConstant ] ] collection of objectType [ in zone ]

objectType ::= type

zone ::= variable

pointerType ::= "t" collectionVariable

collectionVariable ::= variable

There are six standard components of a collection variable C:


Text Box: C.nil C.ObjectTypea pointer which points to no variable at all.

the object type. If there are any unknowns in the collection definition, C.ObjectType is a parameterized type with formal parameters corresponding to the unknowns, in the same order (see 6.3).

Text Box: C.zone C.Index(obj: tC)the zone (see above).

a function which takes a pointer to C and returns an integer. This function has only one defined property: it is one-to-one.

Text Box: C.New(var p: tC)allocates a new variable v of type T in collection C and assigns the pointer to v to the pointer variable p. C.New imports C as a variable. This procedure works by calling Allocate for the pointer's zone with T.size and T.alignment as parameters. It gets back a tstorageBlocks, and uses the theStorage component in this block as the first StorageUnit for the newly created variable. It is up to the verifier of the zone to ensure that a sequence of at least n free StorageUnits begins there if Allocate(n, i, p) was called, and that the storage allocated does not overlap with that of any other variable. Any initialization specified by the type of v