Recursive data type

Last updated January 10, 2024

In computer programming languages, a recursive data type (also known as a recursively-defined, inductively-defined or inductive data type) is a data type for values that may contain other values of the same type. Data of recursive types are usually viewed as directed graphs ^{[ citation needed ]}.

An important application of recursion in computer science is in defining dynamic data structures such as Lists and Trees. Recursive data structures can dynamically grow to an arbitrarily large size in response to runtime requirements; in contrast, a static array's size requirements must be set at compile time.

Sometimes the term "inductive data type" is used for algebraic data types which are not necessarily recursive.

Example

An example is the list type, in Haskell:

dataLista=Nil|Consa(Lista)

This indicates that a list of a's is either an empty list or a cons cell containing an 'a' (the "head" of the list) and another list (the "tail").

Another example is a similar singly linked type in Java:

classList<E>{Evalue;List<E>next;}

This indicates that non-empty list of type E contains a data member of type E, and a reference to another List object for the rest of the list (or a null reference to indicate that this is the end of the list).

Mutually recursive data types

Data types can also be defined by mutual recursion. The most important basic example of this is a tree, which can be defined mutually recursively in terms of a forest (a list of trees). Symbolically:

f: [t[1], ..., t[k]] t: v f

A forest f consists of a list of trees, while a tree t consists of a pair of a value v and a forest f (its children). This definition is elegant and easy to work with abstractly (such as when proving theorems about properties of trees), as it expresses a tree in simple terms: a list of one type, and a pair of two types.

This mutually recursive definition can be converted to a singly recursive definition by inlining the definition of a forest:

t: v [t[1], ..., t[k]]

A tree t consists of a pair of a value v and a list of trees (its children). This definition is more compact, but somewhat messier: a tree consists of a pair of one type and a list another, which require disentangling to prove results about.

In Standard ML, the tree and forest data types can be mutually recursively defined as follows, allowing empty trees:^[1]

datatype'atree=Empty|Nodeof'a*'aforestand'aforest=Nil|Consof'atree*'aforest

In Haskell, the tree and forest data types can be defined similarly:

dataTreea=Empty|Node(a,Foresta)dataForesta=Nil|Cons(Treea)(Foresta)

Theory

In type theory, a recursive type has the general form μα.T where the type variable α may appear in the type T and stands for the entire type itself.

For example, the natural numbers (see Peano arithmetic) may be defined by the Haskell datatype:

dataNat=Zero|SuccNat

In type theory, we would say: $nat=\mu \alpha .1+\alpha$ where the two arms of the sum type represent the Zero and Succ data constructors. Zero takes no arguments (thus represented by the unit type) and Succ takes another Nat (thus another element of $\mu \alpha .1+\alpha$ ).

There are two forms of recursive types: the so-called isorecursive types, and equirecursive types. The two forms differ in how terms of a recursive type are introduced and eliminated.

Isorecursive types

With isorecursive types, the recursive type $\mu \alpha .T$ and its expansion (or unrolling) $T[\mu \alpha .T/\alpha ]$ (where the notation $X[Y/Z]$ indicates that all instances of Z are replaced with Y in X) are distinct (and disjoint) types with special term constructs, usually called roll and unroll, that form an isomorphism between them. To be precise: $roll:T[\mu \alpha .T/\alpha ]\to \mu \alpha .T$ and $unroll:\mu \alpha .T\to T[\mu \alpha .T/\alpha ]$ , and these two are inverse functions.

Equirecursive types

Under equirecursive rules, a recursive type $\mu \alpha .T$ and its unrolling $T[\mu \alpha .T/\alpha ]$ are equal – that is, those two type expressions are understood to denote the same type. In fact, most theories of equirecursive types go further and essentially specify that any two type expressions with the same "infinite expansion" are equivalent. As a result of these rules, equirecursive types contribute significantly more complexity to a type system than isorecursive types do. Algorithmic problems such as type checking and type inference are more difficult for equirecursive types as well. Since direct comparison does not make sense on an equirecursive type, they can be converted into a canonical form in O(n log n) time, which can easily be compared.^[2]

Isorecursive types capture the form of self-referential (or mutually referential) type definitions seen in nominal object-oriented programming languages, and also arise in type-theoretic semantics of objects and classes. In functional programming languages, isorecursive types (in the guise of datatypes) are common too.^[3]

Recursive type synonyms

In TypeScript, recursion is allowed in type aliases.^[4] Thus, the following example is allowed.

typeTree=number|Tree[];lettree:Tree=[1,[2,3]];

However, recursion is not allowed in type synonyms in Miranda, OCaml (unless -rectypes flag is used or it's a record or variant), or Haskell; so, for example the following Haskell types are illegal:

typeBad=(Int,Bad)typeEvil=Bool->Evil

Instead, they must be wrapped inside an algebraic data type (even if they only has one constructor):

dataGood=PairIntGooddataFine=Fun(Bool->Fine)

This is because type synonyms, like typedefs in C, are replaced with their definition at compile time. (Type synonyms are not "real" types; they are just "aliases" for convenience of the programmer.) But if this is attempted with a recursive type, it will loop infinitely because no matter how many times the alias is substituted, it still refers to itself, e.g. "Bad" will grow indefinitely: Bad → (Int, Bad) → (Int, (Int, Bad)) → ... .

Another way to see it is that a level of indirection (the algebraic data type) is required to allow the isorecursive type system to figure out when to roll and unroll.

Related Research Articles

In mathematics and computer science, mutual recursion is a form of recursion where two mathematical or computational objects, such as functions or datatypes, are defined in terms of each other. Mutual recursion is very common in functional programming and in some problem domains, such as recursive descent parsers, where the datatypes are naturally mutually recursive.

In mathematics, fuzzy sets are sets whose elements have degrees of membership. Fuzzy sets were introduced independently by Lotfi A. Zadeh in 1965 as an extension of the classical notion of set. At the same time, Salii (1965) defined a more general kind of structure called an L-relation, which he studied in an abstract algebraic context. Fuzzy relations, which are now used throughout fuzzy mathematics and have applications in areas such as linguistics, decision-making, and clustering, are special cases of L-relations when L is the unit interval [0, 1].

Mu is the twelfth letter of the Greek alphabet, representing the voiced bilabial nasal IPA:[m]. In the system of Greek numerals it has a value of 40. Mu was derived from the Egyptian hieroglyphic symbol for water, which had been simplified by the Phoenicians and named after their word for water, to become 𐤌 (mem). Letters that derive from mu include the Roman M and the Cyrillic М, though the lowercase resembles a small Latin U (μ).

In computer programming, especially functional programming and type theory, an algebraic data type (ADT) is a kind of composite type, i.e., a type formed by combining other types.

System F is a typed lambda calculus that introduces, to simply typed lambda calculus, a mechanism of universal quantification over types. System F formalizes parametric polymorphism in programming languages, thus forming a theoretical basis for languages such as Haskell and ML. It was discovered independently by logician Jean-Yves Girard (1972) and computer scientist John C. Reynolds.

In mathematics, mixing is an abstract concept originating from physics: the attempt to describe the irreversible thermodynamic process of mixing in the everyday world: e.g. mixing paint, mixing drinks, industrial mixing.

In computer science, corecursion is a type of operation that is dual to recursion. Whereas recursion works analytically, starting on data further from a base case and breaking it down into smaller data and repeating until one reaches a base case, corecursion works synthetically, starting from a base case and building it up, iteratively producing data further removed from a base case. Put simply, corecursive algorithms use the data that they themselves produce, bit by bit, as they become available, and needed, to produce further bits of data. A similar but distinct concept is generative recursion which may lack a definite "direction" inherent in corecursion and recursion.

In mathematics, specifically in category theory, F-algebras generalize the notion of algebraic structure. Rewriting the algebraic laws in terms of morphisms eliminates all references to quantified elements from the axioms, and these algebraic laws may then be glued together in terms of a single functor F, the signature.

In programming languages and type theory, parametric polymorphism allows a single piece of code to be given a "generic" type, using variables in place of actual types, and then instantiated with particular types as needed. Parametrically polymorphic functions and data types are sometimes called generic functions and generic datatypes, respectively, and they form the basis of generic programming.

<span class="mw-page-title-main">Recursion (computer science)</span> Use of functions that call themselves

In computer science, recursion is a method of solving a computational problem where the solution depends on solutions to smaller instances of the same problem. Recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

In mathematics, an initial algebra is an initial object in the category of $F$ -algebras for a given endofunctor $F$ . This initiality provides a general framework for induction and recursion.

In category theory, the concept of catamorphism denotes the unique homomorphism from an initial algebra into some other algebra.

In functional programming, a generalized algebraic data type is a generalization of parametric algebraic data types.

In computer programming, an anamorphism is a function that generates a sequence by repeated application of the function to its previous result. You begin with some value A and apply a function f to it to get B. Then you apply f to B to get C, and so on until some terminating condition is reached. The anamorphism is the function that generates the list of A, B, C, etc. You can think of the anamorphism as unfolding the initial value into a sequence.

In recursion theory, α recursion theory is a generalisation of recursion theory to subsets of admissible ordinals $. An admissible set is closed under functions, where denotes a rank of Godel's constructible hierarchy. is an admissible ordinal if is a model of Kripke-Platek set theory. In what follows is considered to be fixed.$

In mathematics, the Butcher group, named after the New Zealand mathematician John C. Butcher by Hairer & Wanner (1974), is an infinite-dimensional Lie group first introduced in numerical analysis to study solutions of non-linear ordinary differential equations by the Runge–Kutta method. It arose from an algebraic formalism involving rooted trees that provides formal power series solutions of the differential equation modeling the flow of a vector field. It was Cayley (1857), prompted by the work of Sylvester on change of variables in differential calculus, who first noted that the derivatives of a composition of functions can be conveniently expressed in terms of rooted trees and their combinatorics.

In computer science, polymorphic recursion refers to a recursive parametrically polymorphic function where the type parameter changes with each recursive invocation made, instead of staying constant. Type inference for polymorphic recursion is equivalent to semi-unification and therefore undecidable and requires the use of a semi-algorithm or programmer-supplied type annotations.

In computability theory, computational complexity theory and proof theory, the Hardy hierarchy, named after G. H. Hardy, is a hierarchy of sets of numerical functions generated from an ordinal-indexed family of functions h_α: N → N called Hardy functions. It is related to the fast-growing hierarchy and slow-growing hierarchy.

In set theory and logic, Buchholz's ID hierarchy is a hierarchy of subsystems of first-order arithmetic. The systems/theories $are referred to as "the formal theories of ν-times iterated inductive definitions". ID ν extends PA by ν iterated least fixed points of monotone operators.$

In theoretical physics, more specifically in quantum field theory and supersymmetry, supersymmetric Yang–Mills, also known as super Yang–Mills and abbreviated to SYM, is a supersymmetric generalization of Yang–Mills theory, which is a gauge theory that plays an important part in the mathematical formulation of forces in particle physics.

References

↑ Harper 1998.
↑ "Numbering Matters: First-Order Canonical Forms for Second-Order Recursive Types". CiteSeerX 10.1.1.4.2276 .
↑ Revisiting iso-recursive subtyping | Proceedings of the ACM on Programming Languages
↑ (More) Recursive Type Aliases - Announcing TypeScript 3.7 - TypeScript

Sources

Harper, Robert (1998), Datatype Declarations, archived from the original on 1999-10-01

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[FOOTNOTEHarper1998-1] Harper 1998.

[canonicalize-2] "Numbering Matters: First-Order Canonical Forms for Second-Order Recursive Types". CiteSeerX 10.1.1.4.2276 .

[3] Revisiting iso-recursive subtyping | Proceedings of the ACM on Programming Languages

[4] (More) Recursive Type Aliases - Announcing TypeScript 3.7 - TypeScript

[1]

[2]

[3]

[4]

v t e Data types
Uninterpreted	Bit Byte Trit Tryte Word Bit array
Numeric	Arbitrary-precision or bignum Complex Decimal Fixed point Floating point Reduced precision Minifloat Half precision bfloat16 Single precision Double precision Quadruple precision Octuple precision Extended precision Long double Integer signedness Interval Rational
Pointer	Address physical virtual Reference
Text	Character String null-terminated
Composite	Algebraic data type generalized Array Associative array Class Dependent Equality Inductive Intersection List Object metaobject Option type Product Record or Struct Refinement Set Union tagged
Other	Boolean Bottom type Collection Enumerated type Exception Function type Opaque data type Recursive data type Semaphore Stream Strongly typed identifier Top type Type class Empty type Unit type Void
Related topics	Abstract data type Boxing Data structure Generic Kind metaclass Parametric polymorphism Primitive data type Interface Subtyping Type constructor Type conversion Type system Type theory Variable