CEK Machine

Last updated

A CEK Machine is an abstract machine invented by Matthias Felleisen and Daniel P. Friedman that implements left-to-right call by value. [1] It is generally implemented as an interpreter for functional programming languages, but can also be used to implement simple imperative programming languages. A state in a CEK machine includes a control statement, environment and continuation. The control statement is the term being evaluated at that moment, the environment is (usually) a map from variable names to values, and the continuation stores another state, or a special halt case. It is a simplified form of another abstract machine called the SECD machine. [2] [3] [4]

Contents

The CEK machine builds on the SECD machine by replacing the dump (call stack) with the more advanced continuation, and putting parameters directly into the environment, rather than pushing them on to the parameter stack first. Other modifications can be made which creates a whole family of related machines. For example, the CESK machine has the environment map variables to a pointer on the store, which is effectively a heap. This allows it to model mutable state better than the ordinary CEK machine. The CK machine has no environment, and can be used for simple calculi without variables. [5]

Description

A CEK machine can be created for any programming language so the term is often used vaguely. For example, a CEK machine could be created to interpret the lambda calculus. Its environment maps variables to closures and the continuations are either a halt, a continuation to evaluate an argument (ar), or a continuation to evaluate an application after evaluating a function (ap): [4] [6]

TransitionFromTo
Variablex, E, Kt, E', K where closure(t,E') = E[x]
Application(f e), E, Kf, E, ar(e, E, K)
Abstraction while evaluating functionAbs, E, ar(t, E', K)t, E', ap(Abs, E, K)
Abstraction while evaluating argumentAbs, E, ap(λx.Exp, E', K)Exp, E'[x=closure(Abs,E)], K

Representation of components

Each component of the CEK machine has various representations. The control string is usually a term being evaluated, or sometimes, a line number. For example, a CEK machine evaluating the lambda calculus would use a lambda expression as a control string. The environment is almost always a map from variables to values, or in the case of CESK machines, variables to addresses in the store. The representation of the continuation varies. It often contains another environment as well as a continuation type, for example argument or application. It is sometimes a call stack, where each frame is the rest of the state, i.e. a control statement and an environment.

There are some other machines closely linked to the CEK machine.

CESK machine

The CESK machine is another machine closely related to the CEK machine. The environment in a CESK machine maps variables to pointers, on a "store" (heap) hence the name "CESK". It can be used to model mutable state, for example the Λσ calculus described in the original paper. This makes it much more useful for interpreting imperative programming languages, rather than functional ones. [5]

CS machine

The CS machine contains just a control statement and a store. It is also described by the original paper. In an application, instead of putting variables into an environment it substitutes them with an address on the store and putting the value of the variable in that address. The continuation is not needed because it is lazily evaluated; it does not need to remember to evaluate an argument. [5]

SECD machine

The SECD machine was the machine that CEK machine was based on. It has a stack, environment, control statement and dump. The dump is a call stack, and is used instead of a continuation. The stack is used for passing parameters to functions. The control statement was written in postfix notation, and the machine had its own "programming language". A lambda calculus statement like this:

(M N)

would be written like this:

N:M:ap

where ap is a function that applies two abstractions together. [7] [8]

Origins

On page 196 of "Control Operators, the SECD Machine, and the -Calculus", [9] and on page 4 of the technical report with the same name, [10] Matthias Felleisen and Daniel P. Friedman wrote "The [CEK] machine is derived from Reynolds' extended interpreter IV.", referring to John Reynolds's Interpreter III in "Definitional Interpreters for Higher-Order Programming Languages". [11] [12]

To wit, here is an implementation of the CEK machine in OCaml, representing lambda terms with de Bruijn indices:

typeterm=INDofint(* de Bruijn index *)|ABSofterm|APPofterm*term

Values are closures, as invented by Peter Landin:

typevalue=CLOofterm*valuelisttypecont=C2ofterm*valuelist*cont|C1ofvalue*cont|C0letreccontinue(c:cont)(v:value):value=matchc,vwithC2(t1,e,k),v0->evalt1e(C1(v0,k))|C1(v0,k),v1->applyv0v1k|C0,v->vandeval(t:term)(e:valuelist)(k:cont):value=matchtwithINDn->continuek(List.nthen)|ABSt'->continuek(CLO(t',e))|APP(t0,t1)->evalt0e(C2(t1,e,k))andapply(v0:value)(v1:value)(k:cont)=let(CLO(t,e))=v0inevalt(v1::e)kletmain(t:term):value=evalt[]C0

This implementation is in defunctionalized form, with cont and continue as the first-order representation of a continuation. Here is its refunctionalized counterpart:

letreceval(t:term)(e:valuelist)(k:value->'a):'a=matchtwithINDn->k(List.nthen)|ABSt'->k(CLO(t',e))|APP(t0,t1)->evalt0e(funv0->evalt1e(funv1->applyv0v1k))andapply(v0:value)(v1:value)(k:value->'a):'a=let(CLO(t,e))=v0inevalt(v1::e)kletmain(t:term):value=evalt[](funv->v)

This implementation is in left-to-right continuation-passing style, where the domain of answers is polymorphic, i.e., is implemented with a type variable. [13] This continuation-passing implementation is mapped back to direct style as follows:

letreceval(t:term)(e:valuelist):value=matchtwithINDn->List.nthen|ABSt'->CLO(t',e)|APP(t0,t1)->letv0=evalt0eandv1=evalt1einapplyv0v1andapply(v0:value)(v1:value):value=let(CLO(t,e))=v0inevalt(v1::e)letmain(t:term):value=evalt[]

This direct-style implementation is also in defunctionalized form, or more precisely in closure-converted form. Here is the result of closure-unconverting it:

typevalue=FUNof(value->value)letreceval(t:term)(e:valuelist):value=matchtwithINDn->List.nthen|ABSt'->FUN(funv->evalt'(v::e))|APP(t0,t1)->letv0=evalt0eandv1=evalt1einapplyv0v1andapply(v0:value)(v1:value):value=let(FUNf)=v0infv1letmain(t:term):value=evalt[]

The resulting implementation is compositional. It is the usual Scott-Tarski definitional self-interpreter where the domain of values is reflexive (Scott) and where syntactic functions are defined as semantic functions and syntactic applications are defined as semantic applications (Tarski).

This derivation mimics Danvy's rational deconstruction of Landin's SECD machine. [14] The converse derivation (closure conversion, CPS transformation, and defunctionalization) is documented in John Reynolds's article "Definitional Interpreters for Higher-Order Programming Languages", which is the origin of the CEK machine and was subsequently identified as a blueprint for transforming compositional evaluators into abstract machines as well as vice versa. [11] [15]

Modern times

The CEK machine, like the Krivine machine, does not only functionally correspond to a meta-circular evaluator (via a left-to-right call-by-value CPS transformation), [15] it also syntactically corresponds to the calculus -- a calculus that uses explicit substitution -- with a left-to-right applicative-order reduction strategy, [16] [17] and likewise for the SECD machine (via a right-to-left call-by-value CPS transformation). [18]

Related Research Articles

Lambda calculus is a formal system in mathematical logic for expressing computation based on function abstraction and application using variable binding and substitution. Untyped lambda calculus, the topic of this article, is a universal model of computation that can be used to simulate any Turing machine. It was introduced by the mathematician Alonzo Church in the 1930s as part of his research into the foundations of mathematics. In 1936, Church found a formulation which was logically consistent, and documented it in 1940.

<span class="mw-page-title-main">Scheme (programming language)</span> Dialect of Lisp

Scheme is a dialect of the Lisp family of programming languages. Scheme was created during the 1970s at the MIT Computer Science and Artificial Intelligence Laboratory and released by its developers, Guy L. Steele and Gerald Jay Sussman, via a series of memos now known as the Lambda Papers. It was the first dialect of Lisp to choose lexical scope and the first to require implementations to perform tail-call optimization, giving stronger support for functional programming and associated techniques such as recursive algorithms. It was also one of the first programming languages to support first-class continuations. It had a significant influence on the effort that led to the development of Common Lisp.

The SECD machine is a highly influential virtual machine and abstract machine intended as a target for functional programming language compilers. The letters stand for Stack, Environment, Control, Dump—the internal registers of the machine. The registers Stack, Control, and Dump point to stacks, and Environment points to an associative array.

In programming languages, a closure, also lexical closure or function closure, is a technique for implementing lexically scoped name binding in a language with first-class functions. Operationally, a closure is a record storing a function together with an environment. The environment is a mapping associating each free variable of the function with the value or reference to which the name was bound when the closure was created. Unlike a plain function, a closure allows the function to access those captured variables through the closure's copies of their values or references, even when the function is invoked outside their scope.

Unlambda is a minimal, "nearly pure" functional programming language invented by David Madore. It is based on combinatory logic, an expression system without the lambda operator or free variables. It relies mainly on two built-in functions and an apply operator. These alone make it Turing-complete, but there are also some input/output (I/O) functions to enable interacting with the user, some shortcut functions, and a lazy evaluation function. Variables are unsupported.

In combinatory logic for computer science, a fixed-point combinator, is a higher-order function that returns some fixed point of its argument function, if one exists.

Operational semantics is a category of formal programming language semantics in which certain desired properties of a program, such as correctness, safety or security, are verified by constructing proofs from logical statements about its execution and procedures, rather than by attaching mathematical meanings to its terms. Operational semantics are classified in two categories: structural operational semantics formally describe how the individual steps of a computation take place in a computer-based system; by opposition natural semantics describe how the overall results of the executions are obtained. Other approaches to providing a formal semantics of programming languages include axiomatic semantics and denotational semantics.

In computer science, a continuation is an abstract representation of the control state of a computer program. A continuation implements (reifies) the program control state, i.e. the continuation is a data structure that represents the computational process at a given point in the process's execution; the created data structure can be accessed by the programming language, instead of being hidden in the runtime environment. Continuations are useful for encoding other control mechanisms in programming languages such as exceptions, generators, coroutines, and so on.

In some programming languages, eval, short for the English evaluate, is a function which evaluates a string as though it were an expression in the language, and returns a result; in others, it executes multiple lines of code as though they had been included instead of the line including the eval. The input to eval is not necessarily a string; it may be structured representation of code, such as an abstract syntax tree, or of special type such as code. The analog for a statement is exec, which executes a string as if it were a statement; in some languages, such as Python, both are present, while in other languages only one of either eval or exec is.

In functional programming, continuation-passing style (CPS) is a style of programming in which control is passed explicitly in the form of a continuation. This is contrasted with direct style, which is the usual style of programming. Gerald Jay Sussman and Guy L. Steele, Jr. coined the phrase in AI Memo 349 (1975), which sets out the first version of the Scheme programming language. John C. Reynolds gives a detailed account of the numerous discoveries of continuations.

In the Scheme computer programming language, the procedure call-with-current-continuation, abbreviated call/cc, is used as a control flow operator. It has been adopted by several other programming languages.

In computing, a meta-circular evaluator (MCE) or meta-circular interpreter (MCI) is an interpreter which defines each feature of the interpreted language using a similar facility of the interpreter's host language. For example, interpreting a lambda application may be implemented using function application. Meta-circular evaluation is most prominent in the context of Lisp. A self-interpreter is a meta-circular interpreter where the interpreted language is nearly identical to the host language; the two terms are often used synonymously.

In computer science, A-normal form is an intermediate representation of programs in functional programming language compilers. In ANF, all arguments to a function must be trivial. That is, evaluation of each argument must halt immediately.

In computer science, Peter Landin's J operator is a programming construct that post-composes a lambda expression with the continuation to the current lambda-context. The resulting “function” is first-class and can be passed on to subsequent functions, where if applied it will return its result to the continuation of the function in which it was created.

In programming languages, a delimited continuation, composable continuation or partial continuation, is a "slice" of a continuation frame that has been reified into a function. Unlike regular continuations, delimited continuations return a value, and thus may be reused and composed. Control delimiters, the basis of delimited continuations, were introduced by Matthias Felleisen in 1988 though early allusions to composable and delimited continuations can be found in Carolyn Talcott's Stanford 1984 dissertation, Felleisen et al., Felleisen's 1987 dissertation, and algorithms for functional backtracking, e.g., for pattern matching, for parsing, in the Algebraic Logic Functional programming language, and in the functional implementations of Prolog where the failure continuation is often kept implicit and the reason of being for the success continuation is that it is composable.

In programming language semantics, normalisation by evaluation (NBE) is a method of obtaining the normal form of terms in the λ-calculus by appealing to their denotational semantics. A term is first interpreted into a denotational model of the λ-term structure, and then a canonical (β-normal and η-long) representative is extracted by reifying the denotation. Such an essentially semantic, reduction-free, approach differs from the more traditional syntactic, reduction-based, description of normalisation as reductions in a term rewrite system where β-reductions are allowed deep inside λ-terms.

In mathematical logic, a term denotes a mathematical object while a formula denotes a mathematical fact. In particular, terms appear as components of a formula. This is analogous to natural language, where a noun phrase refers to an object and a whole sentence refers to a fact.

<span class="mw-page-title-main">Krivine machine</span> Theoretical model of computation

In theoretical computer science, the Krivine machine is an abstract machine. As an abstract machine, it shares features with Turing machines and the SECD machine. The Krivine machine explains how to compute a recursive function. More specifically it aims to define rigorously head normal form reduction of a lambda term using call-by-name reduction. Thanks to its formalism, it tells in details how a kind of reduction works and sets the theoretical foundation of the operational semantics of functional programming languages. On the other hand, Krivine machine implements call-by-name because it evaluates the body of a β-redex before it applies the body to its parameter. In other words, in an expression u it evaluates first λx. t before applying it to u. In functional programming, this would mean that in order to evaluate a function applied to a parameter, it evaluates first the function before applying it to the parameter.

In computer science, refocusing is a program transformation used to implement a reduction semantics—i.e., a small-step operational semantics with an explicit representation of the reduction context—more efficiently. It is a step towards implementing a deterministic semantics as a deterministic abstract machine.

References

    1. Accattoli, Beniamino; Barenbaum, Pablo; Mazza, Damiano (19 August 2014), "Distilling abstract machines", Proceedings of the 19th ACM SIGPLAN international conference on Functional programming, ACM, pp. 363–376, doi:10.1145/2628136.2628154, ISBN   9781450328739, They differ on how they behave with respect to applications: the CEK implements left-to-right call-by-value, i.e. it first evaluates the function part, the LAM gives instead precedence to arguments, realizing right-to-left call-by-value.
    2. Jens Palsberg (28 August 2009). Semantics and Algebraic Specification: Essays Dedicated to Peter D. Mosses on the Occasion of His 60th Birthday. Springer Science & Business Media. pp. 162–. ISBN   978-3-642-04163-1.
    3. Felleisen, Matthias; Findler, Robert Bruce; Flatt, Matthew (10 July 2009). Semantics Engineering with PLT Redex. MIT Press. pp. 113–. ISBN   978-0-262-25817-3.
    4. 1 2 Thielecke, Hayo (December 9, 2015). "Implementing functional languages with abstract machines" (PDF). Archived from the original (PDF) on 2021-07-17. Retrieved September 9, 2020.
    5. 1 2 3 Felleisen, Matthias; Friedman, Daniel P. (October 1986). "A Calculus for Assignments in Higher-Order Languages" (PDF).
    6. "A refresher on the CEK machine". CMSC 330, Summer 2015. Retrieved 2020-09-19.
    7. "The SECD Virtual Machine" (PDF).
    8. "secd". www.cs.bath.ac.uk. Retrieved 2020-09-23.
    9. Felleisen, Matthias; Friedman, Daniel (1986). Control Operators, the SECD Machine, and the -Calculus. Formal Description of Programming Concepts III, Elsevier Science Publishers B.V. (North-Holland). pp. 193–217. doi:10.1007/978-3-319-14125-1_13.
    10. Felleisen, Matthias; Friedman, Daniel P. (1986). Control Operators, the SECD Machine, and the -Calculus (PDF) (Technical report). Department of Computer Science, Indiana University. 197.
    11. 1 2 Reynolds, John C. (1972). "Definitional Interpreters for Higher-Order Programming Languages". Proceedings of the ACM annual conference on - ACM '72. Vol. 2. Proceedings of 25th ACM National Conference. pp. 717–740. doi:10.1145/800194.805852.
    12. Reynolds, John C. (1998). "Definitional Interpreters Revisited". Higher-Order and Symbolic Computation. 11 (4): 355–361. doi:10.1023/A:1010075320153. S2CID   34126862.
    13. Thielecke, Hayo (2004). "Answer Type Polymorphism in Call-by-Name Continuation Passing". Programming Languages and Systems. Lecture Notes in Computer Science. Vol. 2986. Programming Languages and Systems, 13th European Symposium on Programming, ESOP 2004, LNCS 2986, Springer. pp. 279–293. doi: 10.1007/978-3-540-24725-8_20 . ISBN   978-3-540-21313-0.
    14. Danvy, Olivier (2004). A Rational Deconstruction of Landin's SECD Machine (PDF). Implementation and Application of Functional Languages, 16th International Workshop, IFL 2004, Revised Selected Papers, Lecture Notes in Computer Science 3474, Springer. pp. 52–71. ISSN   0909-0878.
    15. 1 2 Ager, Mads Sig; Biernacki, Dariusz; Danvy, Olivier; Midtgaard, Jan (2003). "A Functional Correspondence between Evaluators and Abstract Machines". Brics Report Series. 10 (13). 5th International ACM SIGPLAN Conference on Principles and Practice of Declarative Programming (PPDP'03): 8–19. doi: 10.7146/brics.v10i13.21783 .
    16. Biernacka, Małgorzata; Danvy, Olivier (2007). "A Concrete Framework for Environment Machines". ACM Transactions on Computational Logic. 9 (1). Article #6: 1–30. doi: 10.7146/brics.v13i3.21909 .
    17. Rozowski, Wojciech (2021). Formally verified derivation of an executable and terminating CEK machine from call-by-value lambda-p-hat-calculus (PDF) (Thesis). University of Southampton.
    18. Danvy, Olivier; Millikin, Kevin (2008). "A rational deconstruction of Landin's SECD machine with the J operator". Logical Methods in Computer Science. 4 (4): 1–67. arXiv: 0811.3231 . doi:10.2168/LMCS-4(4:12)2008. S2CID   7926360.