This document was made by OCR from a scan of the technical report. It has not been edited or proofread and is not meant for human consumption, but only for search engines. To see the scanned original, replace OCR.htm with Abstract.htm or Abstract.html in the URL that got you here.

Analysis and Caching of Dependencies

Martin Abadi                            Butler Lampson                Jean-Jacques Luevy

Digital Systems Research Center                  Microsoft                  INRIA Rocquencourt

rna@pa . dec. corn                     blarnpson@rnicrosoft. corn

Abstract

We address the problem of dependency analysis and caching in the context of the A-calculus. The dependencies of a A-term are (roughly) the parts of the A-term that contribute to the result of evaluating it. We introduce a mechanism for keeping track of dependencies, and discuss how to use these dependencies in caching.

1 Introduction

Suppose that we have evaluated the function application I(1, 2), and that its result is 7. If we cache the equality I(1,2) = 7, we may save ourselves the work of evaluating I(1, 2) in the future. Suppose further that, in the course of evaluating I(1,2), we noticed that the first argument of I was not accessed at all. Then we can make a more general cache entry: I(ri, 2) = 7 for all ii. In call-by-name evalua­tion, we may not even care about whether n is defined or not. Later, if asked about the result of I(2,2), for example, we may match I(2,2) against our cache entry, and deduce that I(2,2) = 7 without having to compute I.

There are three parts in this caching scheme: (i) the dependency analysis (in this case, noticing that I did not use its first argument in the course of the computation); (ii) writing down dependency information, in some way, and caching it; (iii) the cache lookup. Each of the parts can be complex. However, the caching scheme is worthwhile if the computation of I is expensive and if we expect to encounter several similar inputs (e.g., I(1,2), I(2,2), . . . ).

We address the problem of dependency analysis and cach­ing in the context of the A-calculus. We introduce a mecha­nism for keeping track of dependencies, and show how to use these dependencies in caching. (However, we stop short of considering issues of cache organization, replacement policy, etc.) Our techniques apply to programs with higher-order functions, and not just to trivial first-order examples like I(1, 2). The presence of higher-order functions creates the need for sophisticated dependency propagation.

As an example, consider the higher-order function:

I c =Ax.Ay.fst(x(fst(y))(snd(y)))

where pairs are encoded as usual:

(a,b> c =Ax.x(a)(b) fst c =Ap.p(An.Az.n) snd c =Ap.p(An.Az.z)

The function I takes two arguments x and y; presumably x is a function and y is a pair. The function I applies x to the first and second components of y, and then extracts the first component of the result. A priori, it may seem that I depends on x and on all of y. Consider now the following arguments for I:

 g

=An.Az.(z,n>                           r  c

c                                      =(1,2>

g'c= An.Az.(z,(n,z>>                    r'   c = (2,2>

Both functions g and g' seem to depend on their respective arguments. However, all these a priori expectations are too coarse. After evaluating I(g)(r) to 2, we can deduce that I(g')(r') also yields 2. For this we need to express that I accesses only part of the pair that g produces, that g accesses only part of the pair that I feeds it, and that g and g' look sufficiently similar. We develop a simple way of capturing and of exploiting these fairly elaborate dependencies.

Our approach is based on a labelled A-calculus [Lév78]. Roughly, our labelled A-calculus is like a A-calculus with names for subexpressions. In the course of computation, the names propagate, and some of them end up in the result. If a reduces to v, then v will contain the names of the subex­pressions of a that contribute to producing v. Then, if we are given a' that coincides with a on those subexpressions, we may deduce that a' reduces to v.

In our example, we would proceed as follows. First, when given the expression I(g)(r), we would label some of its subexpressions. The more labels we use, the more infor­mation we obtain. In this example, which is still relatively simple, we label only components of g and r:

gA =An.Az.(e0:z,e1:n>             rA   c

c                                      =(e2:1,e3:2>

where e0, e1, e2, and e3 are distinct labels. We extend the reduction rules of the A-calculus to handle labels; in this case, I(Ag)(Ar) reduces to e0:e3:2. Stripping off all the labels, we can deduce that I(g)(r) reduces to 2. Studying the la­bels, we may notice that e1 and e2 do not appear in the result. As we will prove, this means that I(g* ) (r*) reduces to 2 for any expressions g* and r* of the forms:

g* c =An.Az.(z, >                                       r* c = ( ,2>

Obviously, g' and r' match this pattern, and hence f(g')(r') reduces to 2. As this small example suggests, our techniques for dependency analysis are effective, reasonably efficient, and hence potentially practical.

In the next section we review the background for our work and some related work. In section 3, we study de­pendency analysis and caching in the pure A-calculus. In sections 4, we extend our techniques to a more realistic lan­guage; this language includes records and has a weak oper­ational semantics based on explicit substitutions [ACCL91, Fie9O].

2 Motivation and Related Work

The motivation for this work arose in the context of a system-modelling system called Vesta [LM93, HL93]—roughly a re­placement for tools like make and rcs. In Vesta, the ana­logue of a makefile is a program written in a specialized, un­typed, higher-order, functional, lazy, interpreted language. The functional character of the language guarantees that the results of system building are predictable and reproducible.

In Vesta, the basic computation steps are expensive calls to functions like compile and link; hence it is important to avoid unnecessary recomputations. The programs can be reasonably large; it is therefore desirable to notice cache hits for large subexpressions rather than for individual calls to primitives (e.g., individual compilations). Furthermore, irrelevant changes in parameters are expected to be frequent; when there are such changes, a simple memoisation [Mic68, Hug85] does not suffice for avoiding recomputations, and a more savvy caching strategy is necessary.

This paper, however, is not about Vesta. There has been some research on caching in Vesta [HL93], and more is cur­rently in progress. Here we discuss techniques for the A-calculus; these are somewhat simpler, easier to explain, and perhaps of more general interest.

In the A-calculus, the work that seems most closely re­lated to ours is that of Field and Teitelbaum [FT9O]. They have investigated the problem of reductions of similar ex­pressions (which may not even yield the same result). Their approach is based on a A-calculus with a new "fork" primi­tive (Lx) rather than on a labelled A-calculus. For example, they can represent the two similar expressions b(a) and b' (a) as the single expression Lx(b, b')(a), with a rule for duplica­tion, namely Lx(b, b')(a) = Lx(b(a), b'(a)). Their algorithm seems particularly appropriate for dealing with pairs of ex­pressions that differ at only one or a few predictable subex­pressions.

There has been much other work on incremental com­putation, and some of it is related to ours. In particular, Pugh's dissertation concerns incremental evaluation of func­tional programs; it raises the issue of caching for functions that do not depend on all of their arguments [Pug88

, pp. 70—

71].

Dependency analysis is also similar to traditional anal­yses such as strictness analysis (e.g., [BHA86]). There is even a recent version of strictness analysis that relies on a labelled A-calculus [GVS95]. Strictness analysis is con­cerned with what parts of a program must be evaluated; in contrast, for doing cache lookups, we need to know what parts of a program may affect the result. Furthermore, we do not use approximate abstract interpretations, but rather rely on previous, actual executions of programs similar to the one being analyzed.3 Dependencies in the Pure A-calculus

In this section we consider incremental computation in the context of the pure A-calculus. This is a minimal system, but it enables us to illustrate our ideas. First we review some classical results that suggest an approach to depen­dency analysis; then we describe a labelled calculus, a basic scheme for caching, some examples, and a more sophisti­cated scheme for caching; finally, we consider call-by-value evaluation.

3.1 The A-calculus

The standard A-calculus has the following grammar for ex­pressions:

a,b,c ::==                                     terms

M            x           variable (x E V)

M           Ax.a abstraction (x E V)

M            b(a)                application

where V is a set of variables.
The /3 rule is, as usual:

(Ax.b)a -* b{a/x}

where b{a/x} is the result of replacing x with a in b. When C is a context (a term with a hole), we write C{a} for the result of filling C's hole with a (possibly with variable captures). We adopt the following congruence rule:

a-*b

C{a} -* C{b}

The reduction relation -*" is the reflexive, transitive closure of -*. A computation stops when it reaches a normal form.

We can now reformulate the problem posed in the intro­duction. Suppose that a is a term and a -*" v. When can we say that b -*" v simply by comparing a and b? In order to address this question, we recall a few known theorems.

Theorem 1 (Church-Rosser) The calculus is confluent.

Theorem 2 (Normalization) If a term can be reduced to a normal form, then its leftmost outermost reduction (which reduces the leftmost outermost redex at each step) reaches this normal form.

Clearly the leftmost outermost reduction reduces only subex­pressions necessary to get to the normal form.

A prefix is an expression possibly with several missing subexpressions:

a, b, c ::==prefixes M         hole

M            x           variable (x E V)

M           Ax.a abstraction (x E V)

Mb(a)                      application

A prefix a is prefix of another prefix (or expression) b if a matches b except in some holes; we write a -< b. For instance, we have that (x)( )(Ay. (y)) -< y(x)(x)(Ay. (y)). For the purposes of reduction, we treat like a free variable; for example, (Ax.x( ))(a) -* a( ).

The following three results concern the prefix ordering and reduction:

Proposition 1 (Maximality of terms) If b -< d and b is a term, then b = d.

Proposition 2 (Monotonicity) If a, b, and c are pre­fixes, a —*" b, and a -< c, then there exists a prefix d such

that c —*" d and b -< d.

Theorem 3 (Stability) If a is a term, v is a term in nor­mal form, and a —*" v, then there is a minimum prefix

ao -< a such that ao —*" v.

Proof The stability theorem follows from the stability of Böhm trees [Ber78]. Here we sketch a simple, alternative proof.

First we show that if a and b are compatible (have a common upper bound) in the prefix ordering, and a and b reduce to a term v in normal form, then the greatest lower bound of a and b (written a A b) also reduces to v. The proof is by induction on the lengths of the leftmost outermost reductions to v, and secondarily on the sizes of a and b. We proceed by cases on the form of a.

·  Ifa=x,thenv=xandb=x,so(aAb)=v.

·  If a = Ax.al, then b is of the form Ax.bl, with al and bl compatible. The result follows by induction hypothesis.

·  If a = x(al). .. (an), then b is of the form x(bl). .. (bn) with a, and b, compatible for each i E 1. .n. The result follows by induction hypothesis.

·  The cases where a = or a = (al) ... (an) are impos­sible, since a reduces to a term in normal form.

·  Finally, if a = (Ax.al)(a2).. . (an), then b is of the form (Ax.bl)(b2). .. (bn). Let a' = al{a2/x}(a3).. . (an) and b' = bl{b2/x}(b3)... (bn); a' and b' are compatible, and they reduce to v with shorter leftmost outermost reductions than a and b. By induction hypothesis, a' A b' reduces to v. Since a A b reduces to a' A b', we obtain that a A b reduces to v by transitivity.

Now suppose that a and v are as indicated in the state­ment of the theorem. The prefixes of a that reduce to v are compatible, since they have a as upper bound; their greatest lower bound is the prefix ao described in the statement. E

We can give a first solution to our problem, as follows. Suppose that a —*" v and v is a term in normal form. Let ao be the minimum prefix of a such that ao —*" v, given by Theorem 3. By Proposition 2, if ao is a prefix of b then b —*" v' for some v' such that v is a prefix of v'; by Proposition 1, v' is v. Therefore, if ao is a prefix of b then we can reuse the computation a —*" v and conclude that b —*" v.

It remains for us to compute ao. As we will show, this computation can be performed at the same time as we eval­uate a, and does not require much additional work. Intu­itively, we will mark every subexpression of a necessary to compute v along the leftmost outermost reduction.

3.2 A labelled A-calculus

In order to compute minimum prefixes as discussed above, we follow the underlined method of Barendregt [Bar84], gen­eralized by use of labels as in the work of Field, Levy, or Maranget [Fie90, Lév78, Mar91]. Our application of this

method gives rise to a new labelled calculus, which we de­fine next.

We consider a A-calculus with the following extended grammar for expressions:

a,b,c ::==                                     terms

M            . . .        as in section 3.1

M             e:a labelled term (e E E)

where E is a set of labels.

There is one new one-step reduction rule:

(e:b)(a) —* e:(b(a))

The essential purpose of this rule is to move labels outwards as little as possible in order to permit /3 reduction. For ex­ample, (eo:(Ax.x(x)))(el :y) reduces to eo:((Ax.x(x))(el :y)) via the new rule, and then yields eo:((el:y)(el:y)) by the /3 rule.

There are clear correspondences between the unlabelled calculus and the labelled calculus. When a' is a labelled term, let strip (a') be the unlabelled term obtained by re­moving every label in a'. We have:

Proposition 3 (Simulation) Let a, b be terms, and let a', b' be labelled terms.

·  If a' —* b', then strip (a')—*" strip(b').

·  If a = strip(a') and a —* b, then a' —*" b' for some b' such that b = strip(b').

The labelled calculus enjoys the same fundamental theo­rems as the unlabelled calculus: confluence, normalization, and stability. The confluence theorem follows from Klop's dissertation work, because the labelled calculus is a regular combinatory reduction systems [Klo80]; the labelled calcu­lus is left-linear and without critical pairs. The normal­ization theorem can also be derived from Klop's work; al­ternatively it can be obtained from results about abstract reductions systems [GLM92], via O'Donnell's notion of left systems [O'D77]. The proof of the stability theorem is sim­ilar to the one in [HL91].

3.3 Basic caching

Suppose that a —*" v, where a is a term and v is its normal form. Put a different label on every subexpression of a, obtaining a labelled term a'. By Proposition 3, a' —*" v' for some v' such that v = strip(v'). Consider all the labels in v'; to each of these labels corresponds a subterm of a' and thus of a. Let G(a) be a prefix obtained from a by replacing with each subterm whose label does not appear in v'. We can prove that G(a) is well-defined. In particular, the value of G(a) does not depend on the choice of a' or v'; and if the label for a subterm of a appears in v' then so do the labels for all subterms that contain it.

Whena —*" v, wemaycachethepair (G(a),v). Whenwe consider a new term b, it is sufficient to check that G(a) -< b in order to produce v as the result of b. As G(a) is the part of a sufficient to get v (what we called ao in section 3.1), we obtain the following theorem:

Theorem 4 If a is a term, v is a term in normal form,

a —*" v, and G(a) -< b, then b —*" v.

Theorem 4 supports a simple caching strategy. In this strategy, we maintain a cache with the following invariants:

·    the cache is a set of pairs (ao, v), consisting each of an unlabelled prefix ao and an unlabelled term v in normal form;

·  if (ao,v) is in the cache and ao -< b then b * v.

Therefore, whenever we know that v is the normal form of a, we may add to the cache the pair (G(a), v). Theorem 4 implies that this preserves the cache invariants.

Suppose that a is a term without labels. In order to evaluate a, we do:

·  if there is a cache entry (ao, v) such that ao -< a, then return v;

·  otherwise:

-    let a' be the result of adding distinct labels to a, at every subexpression;

-    suppose that, by reduction, we find that a' * v' for v' in normal form;

-    let v = strip(v') and ao = G(a);

-    optionally, add the entry (ao, v) to the cache;

-    return v.

Both cases preserve the cache invariants. In both, the v returned is such that a * v.

In a refinement of this scheme, we may put labels at only some subexpressions of a. In this case, we replace with a subexpression of a only if this subexpression was labelled before reduction. The more labels we use, the more general the prefix obtained; this results in better cache entries, at a moderate cost. However, in examples, we prefer to use few labels in order to enhance readability.

Another refinement of the scheme consists in caching pairs of labelled prefixes and results. The advantage of not stripping the labels is that the cache records the precise de­pendencies of results on prefixes. We return to this subject in section 3.5.

3.4 Examples

The machinery that we have developed so far handles the example of the introduction (the term I(g)(r)). We leave the step-by-step calculation for that example as an exercise to the reader. As that example illustrates, pairing behaves nicely, in the sense that fst(a, b> depends only on a, as one would expect.

As a second example, we show that the Church booleans behave nicely too. The encoding of booleans is as usual:

c

true =Ax.Ay.x false c = Ax.Ay.y if a then b else c c =a(b)(c)

In the setting of the labelled A-calculus, we obtain as a de­rived rule that:

if (e:a) then b else c * e:(if a then b else c) It follows from this rule that, for example,

(Ax. if eo:x then ei:y else e2:z)(e3:true) * eo:e3:ei:y

We obtain the unlabelled prefix:

(Ax. if x then y else )(true)

and we can deduce that any expression that matches this prefix reduces to y.

Similar examples arise in the context of Vesta (see sec­tion 2 and [HL93]). A simple one is the term:

(if isC(file) then Ccompile else M3compile)(file)

where isC(f) returns true whenever I is a C source file, and file is either a C source file or an M3 source file. If isC(file) returns true, then that term yields Ccompile(file). Using labels, we can easily discover that this result does not depend on the value of M3compile, and hence that it need not be recomputed when that value changes. In fact, even isC(file) and the conditional need not be reevaluated.

In a higher-order variant of this example, the conditional compilation function is passed as an argument:

(Ax. x(file))

(Ay. if isC(y) then Ccompile(y) else M3compile(y))

Our analysis is not disturbed by the higher-order abstrac­tion, and yields the same information.

3.5 Limitations of the basic caching scheme

The basic caching scheme of section 3.3 has some limitations, illustrated by the following two concrete examples. Suppose that we have the cache entry:

((Ax. (snd(x), fst(x)> )((true, false>), (false, true>) Suppose further that we wish to evaluate the term: fst((Ax.(snd(x),fst(x)>)((true, false>))

Immediately the cache entry enables us to reduce this term to fst((false, true>)), and eventually we obtain false. How­ever, in the course of this computation, we have not learned how the result depends on the input. We are unable to make an interesting cache entry for the term we have evaluated. Given the new, similar term

fst((Ax.(snd(x),fst(x)>)((false, false>))

we cannot immediately tell that it yields the same result. As a second example, suppose that we have the cache entry:

(if true then true else ,true)

and that we wish to evaluate the term: not(if true then true else true)

In our basic caching scheme, we would initially label this term, for example as:

not(if true then eo:true else ei:true)

Then we would have to reduce this term, and as part of that task we would have to reduce the subterm

(if true then eo:true else ei:true)

At this point our cache entry would tell us that the sub-term yields true, modulo some labels. We can complete the

reduction, obtaining false, and we can make a trivial cache entry:

(not(if true then true else true), false)

However, we have lost track of which prefix of the input determines the result, and we cannot make the better cache entry:

(not(if true then true else ), false)

The moral from these examples is that cache entries should contain dependency information that indicates how each part of the result depends on each part of the input. One obvious possibility is not to strip the labels of prefixes and results before making cache entries; after all, these la­bels encode the desired dependency information. We have developed a refinement of the basic caching scheme that does precisely that, but we omit its detailed description in this paper. Next we give another solution to the limitations of the basic caching scheme.

3.6 A more sophisticated caching scheme

In this section we describe another caching scheme. This scheme does not rely directly on the labelled A-calculus, but it can be understood or even implemented in terms of that calculus.

With each reduction a —*" v of a term a, we associate a function d from prefixes of v to prefixes of a. Very roughly, if vO is a prefix of a then d(vO) is the prefix of a that yields vO in the reduction a —*" v. We write a —*"G v to indicate the function d. This annotated reduction relation is defined by the following rules.

·  Reflexivity:

a —* " Ld a

where id is the identity function on prefixes.

·  Transitivity:

a—*" G b b—*" G,c

a—*" ~G, ~G~ c

where d'; d is the function composition of d' and d.

·    Congruence: Given a function d from prefixes of b to prefixes of a, we define a function C{d} from prefixes of C{b} to prefixes of C{a}. If CO -< C then C{d}(cO) = CO; otherwise, there exists a unique bO -< b such that CO = C{bO}, and we let C{d}(CO) = C{d(bO)}. We

obtain the rule:

a —* " G b

C{a}—*" C{d} C{b}

·  /3: (Ax.b)(a)—*"Gn b{a/x}

where d3( ) = and, for CO =and ~ CO -< b{a/x}, d3(CO) = (Ax.bO)(aO) where aO and bO are the least prefixes such that aO -< a, bO -< b, and CO -< bO{aO/x}.

These rules are an augmentation of the reduction rules of section 3.1, in the following sense:

Proposition 4 If a —* " G b then a —*" b.

·  If a —*" b then a —* " G b for some d.

The rules may seem a little mysterious, but they can be understood in terms of labels. Imagine that every subex­pression of a is labelled (with an invisible label), that a —*" G v, and that vO -< v; then d(vO) is the least prefix of a that contains all of the labels that end up in vO.

As an example, consider the term (Ax.x(x))(a), where a is arbitrary. By /3, we have

(Ax.x(x))(a)—*"Gn a(a)

where d3 is such that, for instance, d3( ) is , d3(a(a)) is the entire (Ax.x(x))(a), and d3(a( )) is (Ax.x( ))(a). If we had labelled the initial term (Ax.x(x))(a) before reduction, then the labels that would decorate the result prefix a( ) would be all those of the initial term except for the label of the argument occurrence of x; this justifies that d3 (a( )) be (Ax.x( ))(a).

We obtain:

Theorem 5 If a is a term, a —* " G v, and d(v) -< b, then

b —* " G v.

This theorem gives rise to a new caching scheme. The cache entries in this scheme consist of judgements a —* " G v, where a and v are terms and d is a function from prefixes of v to prefixes of a. The representation of d can be its graph (i.e., a set of pairs of prefixes) or a formal expression (written in terms of id, d3, etc.); it can even be the pair of a labelling of a and a corresponding labelling of v. According to the theorem, whenever we encounter a term b such that d(v) -< b, we may deduce that b —* " G v.

This caching scheme does not suffer from the limitations of the basic one. In particular, each cache entry contains dependency information for every part of the result, rather than for the whole result. Moreover, the rules of inference provide a way of combining dependency information for sub-computations; therefore, we can make an interesting cache entry whenever we do an evaluation, even if we used the cache in the course of the evaluation.

3.7 Call-by-value

So far, we have considered only call-by-name evaluation. Here we define a call-by-value version of the labelled A-calculus, showing that we can adapt our approach to call­by-value evaluation. The move from a call-by-name to a call-by-value labelled A-calculus does not affect the basic caching scheme of section 3.3, which remains sound.

The syntax of the call-by-value labelled A-calculus is that given in section 3.2. The /3 rule is restricted to:

(Ax.b)v —* b{v/x}

where v ranges over terms of the form x, x(al) ... (an), or Ax.a; such terms are called values. As in section 3.2, we have a rule for moving labels outwards from the left-hand side of applications:

(e:b)(a) —* e:(b(a))

We have an additional rule for the right-hand side of appli­cations:

(Ax.b)(e:a) —* e:((Ax.b)(a))

One might be tempted to adopt a stronger rule, namely b(e:a) —* e:(b(a)), but this rule creates critical pairs.

With call-by-name evaluation, the term (Ax.b)(a) does not depend on a if x does not occur free in b. In particular, any term that has (Ax.b)( ) as prefix yields the same result. With call-by-value evaluation, on the other hand, the be­havior of (Ax.b)(a) depends on a: it depends on whether a can be reduced to a value or not. Therefore, a term may match the prefix (Ax.b)( ) but not yield the same result as (Ax.b)(a).

The treatment of labels in our rules for call-by-value takes into account that (Ax.b)(a) depends on a even if x does not occur free in b. Suppose that a can be reduced to a value v. To see how (Ax.b)(a) depends on a, we add a label in front of a, obtaining (Ax.b)(e:a). Labelled reduc­tion yields e:((Ax.b)(a)), then e:((Ax.b)(v)), and finally e:b. The label e does not disappear, as it would in call-by-name evaluation.

For instance, let us consider the case where a is the value y(z). From a labelled reduction we obtain the pre­fix (Ax.b)( ( )). Any term that matches this prefix reduces to the same result as (Ax.b)(y(z)). In fact, the evaluation of (Ax.b)(y(z)) reveals that this term depends on the "val­ueness" of y(z), but does not depend on y(z) in any other way. The prefix (Ax.b)( ( )) does not express this informa­tion with full accuracy, but approximates it.

4 Dependencies in a Weak A-calculus with Records

The techniques developed in the previous section are not limited to the pure A-calculus. In this section, we demon­strate their applicability to a more realistic language, with primitive booleans, primitive records, and explicit substitu­tions. The operational semantics of this language is weak (so function and record closures are not reduced).

4.1 A weak calculus with records

We consider an extended A-calculus with the grammar for terms and for substitutions given in Figure 1. In the gram­mar, L is a set of names (field names for records) and else is a keyword used for "else" clauses in records and in substi­tutions. As we show below, these "else" clauses are useful in dependency analysis. We typically think of the "else" clauses as corresponding to run-time errors (missing fields, unbound variables). The term a +l in an "else" clause can be arbitrary; a term that represents a run-time error will do.

We use the following notation for extending substitu­tions. Let s be xl = al,.. . , x = a , else = a +l; then (x = a) .s is x= a,xl= al,... ,x = a , else = a +l if x is not among the variables xl,. . . , x , and it is x = a, xl =

al,. . . ,xi_l = aj_l,xL+l = aL+l,. . . ,x = a , else = a +l

if x is x',.

The one-step reduction rules now use explicit substitu­tions. They are given in Figure 2. In particular, the ana­logue for the /3 rule is ((Ax.b)[s])a —* b[(x = a) . s], which extends an explicit substitution; the replacement of a for x happens gradually, through other rules which push the substitution inwards in b.

An active context is a context generated by the grammar of Figure 3. We adopt the following congruence rule: for any active context C,

a—*b

C{a} —* C{b}

Notice that this rule allows us to compute inside substitu­tions, but not under A, inside records, or in the term part of

closures (since Ax.C, (. . . ,l- = C,. . .>, (. . . , else = C>, and C[s] are not listed as active contexts). The relation —*" is the reflexive, transitive closure of —*.

The prefix ordering for this language is interesting. Let s be the substitution xl = al,. . . x = a , else = a +l, let r be the record (ll = al,. . . , l = a , else = a +l>. We associate with s and r the following functions from variables or field names to prefixes:

[s](x) =   { aL               if x = xj for some i

a +l                             otherwise

aL        if l = li for some i

{

[r](l) =        a +l                         otherwise

Intuitively, [s](x) is the image of x through s, and [r](l) is the image of l through r. The prefix ordering is as before except for substitutions and records where s -< s'if [s](x) -< [s'](x) for all x E V, and r -< r' if [r](l) -< [r'](l) for all l E L.

According to this definition, the order of the components of substitutions and records does not matter. In addition, we obtain that, if the "else" clause has a hole, then any other holes can be collapsed into it; for example, the prefixes

(ll = a,l2 = ,else = >, (l3 = ,ll = a, else = >, and (ll = a, else = > are all equivalent.

This A-calculus enjoys the same theorems as the pure A-calculus of section 3.1 (modulo that now -< is actually a pre-order, not an order). These theorems should not be taken for granted, however. Their proofs are less easy, but they can be done by using results on abstract reduction sys­tems [GLM92]. The stability theorem ensures that there is a minimum prefix for obtaining any result; moreover, the maximality and monotonicity propositions are the basis for a caching mechanism.

Finally, we should note that, in this calculus, closures may contain irrelevant bindings. For example, consider the function closure (Ay.y)[x = z, else = w], where z is a vari­able and w is an arbitrary normal form. This closure reduces only to itself; the irrelevant substitution does not disappear. In this case, we will consider that the result depends on the substitution. We could add rules for minimizing substitu­tions but, for the sake of simplicity, we do not.

4.2 A weak labelled calculus with records

Following the same approach as in section 3, we define a labelled calculus:

a,b,c ::==                                    terms

M            . . .       as in section 4.1

M                        e:a labelled term (e E E)

There are new one-step reduction rules in addition to those of section 4.1:

(e:b)(a) —* e:(b(a)) (e:b)[s] —* e:(b[s])

(e:b).l —* e:(b.l)
if (e:a) then b else c —* e:(if a then b else c)

The grammar for active contexts is extended with one clause:

C ::==active contexts M                     . . .         as in section 4.1

M        e:C labelled context (e E E)

a,b,c        ::==                                                                             terms

I         x                                                        variable (x E V)

I         Ax.a                                              abstraction (x E V)

I         b(a)                                                           application

Ia[s]                                                                                        closure
(l1 = a1, . .. ,l = a , else = a +1) record (l'. EL, distinct)

a.l                                                                             selection (l E L)

true                                                                                            true

false                                                                                         false

if a then b else c                                                                conditional

s               ::==x1 = a1,. .. , x = a , else = a +1                                                                                                                                          substitutions

(x'. E V, distinct)

Figure 1. Grammar for the weak A-calculus.

x[x1 = a1, . .. ,x = a , else = a +1] -* a'.                                  (x = x'.)

x[x1 = a1, . .. ,x = a , else = a +1] -* a +1                                                                                                                                   (x=all ~x'.)

(b(a))[s] -* b[s](a[s])

((Ax.b)[s])a -* b[(x = a)      s]

(b.l)[s] -* (b[s]).l

((l1 = a1, ... , l = a , else = a +1))[s].l -* a'.[s]                                     (l = l'.)

((l1 = a1, . . . ,l = a , else = a +1))[s].l -* a +1[s]                                              (l=all ~l'.)

true[s] -* true false[s] -* false

(if a then b else c)[s] -* if a[s] then b[s] else c[s]

if true then b else c -* b if false then b else c -* c

Figure 2. One-step reduction rules for the weak A-calculus.

C ::==                                                                          active contexts

I                                                                                                                        hole

IC(a)                                                                application (left)

I        b(C)                                                        application (right)

Ia[S]                                                                                 closure

I        C.l                                                              selection (l E L)

I        if C then b else c                                      conditional (guard)

I        if a then C else c                                        conditional (then)

I        if a then b else C                                         conditional (else)

S ::==           x1 = a1,. . . , x'. = C'.,. . . x = a , else = a +1                                                                                           substitutions

As usual, the congruence rule permits reduction in any ac‑

tive context:

a—*b

C{a} —* C{b}

for any active context C.

4.3 Dependency analysis and caching (by example)

The labelled calculus provides a basis for dependency anal­ysis and caching. The sequence of definitions and results would be much as in section 3. We do not go through it, but rather give one instructive example.

We consider the term:

((Ax.x.li)(li =yi,l2 =y2,else= w)) [yi = zi,y2 = z2, else = w]

This term yields zi. We label the term, obtaining:

((Ax. x.li)(li = (ei:yi), l2 = (e2:y2), else = (e3:w)))

[yi = (e4:zi),y2 = (e5:z2), else = (e6:w)]

This labelled term yields ei :e4 :zi, so we immediately con­clude that the following prefix also yields zi:

((Ax. x.li)(li = yi,l2 = ,else =))

[yi = zi, y2 =, else =]

Thanks to our definition of the prefix ordering, this prefix is equivalent to:

((Ax.x.li)(li =yi,else= ))[yi =zi,else= ]

Suppose that, in our cache, we record this prefix with the associated result zi; and suppose that later we are given the term:

((Ax.x.li)(li =yi,l3 =yi7(yi7),else= w')) [yi7 = zi,yi = zi, else = w']

This term matches the prefix in the cache entry, so we im­mediately deduce that it reduces to zi.

As this example illustrates, the labelled reductions help us identify irrelevant components of both substitutions and records. The prefix ordering and the use of else then allow us to delete those irrelevant components and to add new irrelevant components.

In some applications, irrelevant components may be com­mon. For example, in the context of Vesta, a large record may bundle compiler switches, environment variables, etc.; for many computations, most of these components are irrel­evant. In such situations, the ability to detect and to ignore irrelevant components is quite useful—it means more cache hits.

5 Conclusions

We have developed techniques for caching in higher-order functional languages. Our approach relies on using depen­dency information from previous executions in addition to the outputs of those executions. This dependency informa­tion is readily available and easy to exploit (once the proper tools are in place); it yields results that could be difficult to obtain completely statically. The techniques are based on a labelled A-calculus and, despite their pragmatic simplicity, benefit from a substantial body of theory.

Acknowledgements

We thank Yanhong Liu and Tim Teitelbaum for discussions about related work; Allan Heydon, Jim Horning, Roy Levin, and Yuan Yu for discussions about Vesta; Alain Deutsch and Simon Peyton Jones for comments on strictness analysis; and Georges Gonthier for assorted remarks on a draft of this paper.

References

[ACCL91] M. Abadi, L. Cardelli, P.-L. Curien, and J.-J. Levy. Explicit substitutions. Journal of Func­tional Programming, 1(4):375-416, 1991.

[Bar84]    Henk P. Barendregt. The Lambda Calculus.
North Holland, Revised edition, 1984.

[Ber78] G. Berry. Stable models of typed lambda-calculi. In Proc. 5th Coll. on Automata, Languages and Programming, Lectures Notes in Computer Sci­ence, pages 72-89. Springer-Verlag, 1978.

[BHA86] G. L. Burn, C. Hankin, and S. Abramsky. Strict­ness analysis for higher-order functions. Science of Computer Programming, 7:249-278, 1986.

[Fie90] John Field. On lariiness and optimality in lambda interpreters: Tools for specification and analysis. In Proceedings of the Seventeenth Annual ACM Symposium on Principles of Programming Lan­guages, pages 1-15, 1990.

[FT90] John Field and Tim Teitelbaum. Incremental re­duction in the lambda calculus. In Proceedings of the 1990 ACM Conference on LISP and Func­tional Programming, pages 307-322. ACM, 1990.

[GLM92] Georges Gonthier, Jean-Jacques Levy, and Paul­André Mellites. An abstract standardisation the­orem. In Seventh Annual IEEE Symposium on Logic in Computer Science, 1992.

[GVS95] Milind Gandhe, G. Venkatesh, and Amitabha Sanyal. Labeled A-calculus and a generalised no­tion of strictness. In Asian Computing Science Conference, Lecture Notes in Computer Science. Springer-Verlag, December 1995.

[HL91] Gerard Huet and Jean-Jacques Levy. Compu­tations in Orthogonal Term Rewriting Systems. MIT Press, 1991.

[HL93] Chris Hanna and Roy Levin. The Vesta language for configuration management. Research Re­port 107, Digital Equipment Corporation, Sys­tems Research Center, June 1993. Available from http://www.research.digital.com/SRC.

[Hug85] John Hughes. Lariy memo-functions. In Jean-Pierre Jouannaud, editor, Functional Program­ming Languages and Computer Architecture, pages 129-146, September 1985.

[Klo80]    Jan Willem Klop. Combinatory Reduction Sys‑
tems. PhD thesis, CWI, 1980.

[Lév78] Jean-Jacques Levy. Rueductions Correctes et Op­timales dans le Lambda Calcul. PhD thesis, Uni­versity of Paris 7, 1978.

[LM93] Roy Levin and Paul R. McJones. The Vesta approach to precise configuration of large software systems. Research Report 105, Digital Equipment Corporation, Systems Re­search Center, June 1993. Available from http://www.research.digital.com/SRC.

[Mar91] Luc Maranget. Optimal derivations in weak lambda-calculi and in orthogonal term rewriting systems. In Proceedings of the Eighteenth Annual ACM Symposium on Principles of Programming Languages, 1991.

[Mic68]     D. Michie. 'Memo' functions and machine learn‑
ing. Nature, 218:19-22, 1968.

[O'D77] Michael O'Donnell. Computing in Systems de­scribed by Equations. PhD thesis, Cornell Uni­versity, 1977.

[Pug88] William Pugh. Incremental Computation and the Incremental Evaluation of Functional Programs. PhD thesis, Cornell University, 1988.