In computer science, Scott encoding is a way to represent algebraic data types in the lambda calculus, following their syntactic definition without regard whether they are recursive or not. This is unlike Church encoding which treats recursive data types specially, representing them with right folds. The data and operators form a mathematical structure which is embedded in the lambda calculus.
Mogensen–Scott encoding extends and slightly modifies Scott encoding by applying the encoding to Metaprogramming [ citation needed ]. This encoding allows the representation of lambda calculus terms, as data, to be operated on by a meta program.
Scott encoding of numbers follows the Peano definition of natural numbers as a sum of two cases, the zero case and the successor case,
Correspondingly, Scott numerals are functions which expect two arguments, two handlers, each receiving the corresponding case's data from the number:
The zero case has no data. The successor case data is its Scott numeral, which is served as the argument to the corresponding handler.
When a Scott numeral is supplied with two handlers, it calls the appropriate one with its corresponding data. Scott encoded values embody a choice between the sum data type's cases.
Recursive operations on Scott numerals require explicit use of recursion, e.g. using combinator:
Church numerals, on the other hand, already embody the primitive recursion and perform the folding / looping on their own:
The key difference is that Scott's handler's argument is the number's own predecessor Scott numeral, unprocessed, whereas Church's folding / looping function's argument is the result of folding / looping over its predecessor.
Scott encoding of lists follows their definition as a sum of two cases, the empty list case and the cons case,
Correspondingly, Scott lists are functions which expect two arguments, two handlers, each receiving the corresponding data from the list:
The empty list case has no data. The cons case data are the head element and the list's tail, which are served as the arguments to the corresponding handler.
When a Scott list is supplied with two handlers, it calls the appropriate one with its corresponding data. Scott encoded values embody a choice between the sum data type's cases.
Recursive operations on Scott lists require explicit use of recursion, e.g. using combinator:
Church lists, on the other hand, already embody the primitive recursion and perform the folding on their own:
The key difference is that Scott's handler's second argument is the list's own tail, unprocessed, which is also a Scott list, whereas Church's folding function's second argument is the result of folding over its tail.
This is the argument that corresponds to the recursive datum in the data type definition. There are no differences between the two encodings in the other, non-recursive data fields, except possibly in the corresponding arguments' order.
Scott encoding appears first in a set of unpublished lecture notes by Dana Scott [1] whose first citation occurs in the book Combinatorial Logic, Volume II. [2] Michel Parigot gave a logical interpretation of and strongly normalizing recursor for Scott-encoded numerals, [3] referring to them as the "Stack type" representation of numbers. Torben Mogensen later extended Scott encoding for the encoding of Lambda terms as data. [4]
Partially applying a lambda term results in data received as arguments corresponding to its parameters to be "stored" in the resulting term. A term
can be regarded as a lambda calculus representation of an nary tuple . Thus, the term
can be seen as an nary tuple constructor. In fact, for it coincides with the Church encoding for pairs, or 2-tuples.
Tuples are natural representation of products. In algebraic data types discipline of seeing types as sums of products, each alternative is a product, and can be represented as a tuple of corresponding arity.
Such data are processed, or "handled", by applying them to functions of matching arity, that will thus receive the original individual constituents of a tuple as arguments:
Now suppose there are N alternatives in the sum, N product constructors, each with arguments;
All these alternatives are used in the same way, by applying to the same set of handler functions , so all of them together constitute the sum data type's representation as a set of lambda terms.
Each constructor selects a different handling function from the function parameters . This provides branching in the process flow, based on the constructor, akin to type-based pattern matching. Each constructor may have a different arity (number of parameters). If a constructor's arity , the corresponding handler receives no arguments. For example a "maybe pair or nothing", in other words, "a possibly empty cons pair", can be represented by the set of two lambda terms,
corresponding to the two possible alternatives. Whether either of the pair constituents is also a pair is disregarded, just that there are two of them is what's addressed by this encoding. This type consists of two alternatives: one contains no data, the other contains two.
If all the constructors have no parameters then the set of such constructors acts like an enum, a sum type of 1-tuples, or a type with a fixed number of values. If the constructors have parameters, whether they correspond recursively to the same data type's alternatives is ignored. Recursion can be explicitly performed as usual in lambda calculus through the use of fixed points or self-application.
Let D be a datatype with N constructors, , such that constructor has arity .
The Scott encoding of constructor of the data type D is
Mogensen extends Scott encoding to encode the lambda term data type. This allows a lambda term to be represented as data, within lambda calculus itself. The meta function mse converts a lambda term into the corresponding Scott representation of it as a datum. A lambda term data type consists of three alternatives – variable, application, and abstraction:
The "lambda term" is represented as a tagged union with three cases:
For example,
Recursion is ignored in Scott but is directly addressed in Church encodings which already are foldings of the data type on the recursive component(s). Thus, for any non-recursive type Scott encoding coincides with its Church encoding, up to the arguments order, but recursive types differ, since Scott encodes only the surface syntax of the data type definition.
In Scott any constituent can be of any type, whether the same as the type itself, or not
Church cons-pairs with the second field assumed to also be a cons-pair, treat it differently and perform folding on it:
Thus Scott encodings have direct access to every component of the data type, e.g. predecessor for numerals and list tail for lists. But Church encodings must rebuild any recursive component by clever folding processing which usually has a cost with regards to efficiency.
Concerning the practicality of using either the Church or Scott encoding for programming, there is a symmetric trade-off: [5] Church-encoded numerals support a constant-time addition operation and have no better than a linear-time predecessor operation; Scott-encoded numerals support a constant-time predecessor operation and have no better than a linear-time addition operation.
Church-encoded data and operations on them are typable in system F, as are Scott-encoded data and operations. However, the encoding is significantly more complicated. [6]
The type of the Scott encoding of the natural numbers is the positive recursive type:
Full recursive types are not part of System F, but positive recursive types are expressible in System F via the encoding:
Combining these two facts yields the System F type of the Scott encoding:
This can be contrasted with the type of the Church encoding:
The Church encoding is a second-order type, but the Scott encoding is fourth-order!