Recursive data type

Last updated

In computer programming languages, a recursive data type (also known as a recursively-defined, inductively-defined or inductive data type) is a data type for values that may contain other values of the same type. Data of recursive types are usually viewed as directed graphs.


An important application of recursion in computer science is in defining dynamic data structures such as Lists and Trees. Recursive data structures can dynamically grow to an arbitrarily large size in response to runtime requirements; in contrast, a static array's size requirements must be set at compile time.

Sometimes the term "inductive data type" is used for algebraic data types which are not necessarily recursive.


An example is the list type, in Haskell:


This indicates that a list of a's is either an empty list or a cons cell containing an 'a' (the "head" of the list) and another list (the "tail").

Another example is a similar singly linked type in Java:


This indicates that non-empty list of type E contains a data member of type E, and a reference to another List object for the rest of the list (or a null reference to indicate that this is the end of the list).

Mutually recursive data types

Data types can also be defined by mutual recursion. The most important basic example of this is a tree, which can be defined mutually recursively in terms of a forest (a list of trees). Symbolically:

f: [t[1], ..., t[k]] t: v f

A forest f consists of a list of trees, while a tree t consists of a pair of a value v and a forest f (its children). This definition is elegant and easy to work with abstractly (such as when proving theorems about properties of trees), as it expresses a tree in simple terms: a list of one type, and a pair of two types.

This mutually recursive definition can be converted to a singly recursive definition by inlining the definition of a forest:

t: v [t[1], ..., t[k]]

A tree t consists of a pair of a value v and a list of trees (its children). This definition is more compact, but somewhat messier: a tree consists of a pair of one type and a list another, which require disentangling to prove results about.

In Standard ML, the tree and forest data types can be mutually recursively defined as follows, allowing empty trees: [1]


In Haskell, the tree and forest data types can be defined similarly:



In type theory, a recursive type has the general form μα.T where the type variable α may appear in the type T and stands for the entire type itself.

For example, the natural numbers (see Peano arithmetic) may be defined by the Haskell datatype:


In type theory, we would say: where the two arms of the sum type represent the Zero and Succ data constructors. Zero takes no arguments (thus represented by the unit type) and Succ takes another Nat (thus another element of ).

There are two forms of recursive types: the so-called isorecursive types, and equirecursive types. The two forms differ in how terms of a recursive type are introduced and eliminated.

Isorecursive types

With isorecursive types, the recursive type and its expansion (or unrolling) (Where the notation indicates that all instances of Z are replaced with Y in X) are distinct (and disjoint) types with special term constructs, usually called roll and unroll, that form an isomorphism between them. To be precise: and , and these two are inverse functions.

Equirecursive types

Under equirecursive rules, a recursive type and its unrolling are equal -- that is, those two type expressions are understood to denote the same type. In fact, most theories of equirecursive types go further and essentially specify that any two type expressions with the same "infinite expansion" are equivalent. As a result of these rules, equirecursive types contribute significantly more complexity to a type system than isorecursive types do. Algorithmic problems such as type checking and type inference are more difficult for equirecursive types as well. Since direct comparison does not make sense on an equirecursive type, they can be converted into a canonical form in O(n log n) time, which can easily be compared. [2]

Equirecursive types capture the form of self-referential (or mutually referential) type definitions seen in procedural and object-oriented programming languages, and also arise in type-theoretic semantics of objects and classes. In functional programming languages, isorecursive types (in the guise of datatypes) are far more common.

In type synonyms

Recursion is not allowed in type synonyms in Miranda, OCaml (unless -rectypes flag is used or it's a record or variant), and Haskell; so for example the following Haskell types are illegal:


Instead, it must be wrapped inside an algebraic data type (even if it only has one constructor):


This is because type synonyms, like typedefs in C, are replaced with their definition at compile time. (Type synonyms are not "real" types; they are just "aliases" for convenience of the programmer.) But if this is attempted with a recursive type, it will loop infinitely because no matter how many times the alias is substituted, it still refers to itself, e.g. "Bad" will grow indefinitely: Bad(Int, Bad)(Int, (Int, Bad))... .

Another way to see it is that a level of indirection (the algebraic data type) is required to allow the isorecursive type system to figure out when to roll and unroll.

See also


  1. Harper 2000, "Data Types".
  2. "Numbering Matters: First-Order Canonical Forms for Second-Order Recursive Types". CiteSeerX .Cite journal requires |journal= (help)

This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.

Related Research Articles

In mathematics and computer science, mutual recursion is a form of recursion where two mathematical or computational objects, such as functions or datatypes, are defined in terms of each other. Mutual recursion is very common in functional programming and in some problem domains, such as recursive descent parsers, where the datatypes are naturally mutually recursive.

ML is a general-purpose functional programming language. It is known for its use of the polymorphic Hindley–Milner type system, which automatically assigns the types of most expressions without requiring explicit type annotations, and ensures type safety – there is a formal proof that a well-typed ML program does not cause runtime type errors. ML provides pattern matching for function arguments, garbage collection, imperative programming, call-by-value and currying. It is used heavily in programming language research and is one of the few languages to be completely specified and verified using formal semantics. Its types and pattern matching make it well-suited and commonly used to operate on other formal languages, such as in compiler writing, automated theorem proving, and formal verification.

In mathematical logic and computer science, a general recursive function, partial recursive function, or μ-recursive function is a partial function from natural numbers to natural numbers that is "computable" in an intuitive sense. If the function is total, it is also called a total recursive function. In computability theory, it is shown that the μ-recursive functions are precisely the functions that can be computed by Turing machines. The μ-recursive functions are closely related to primitive recursive functions, and their inductive definition (below) builds upon that of the primitive recursive functions. However, not every total recursive function is a primitive recursive function—the most famous example is the Ackermann function.

In mathematics, fuzzy sets are somewhat like sets whose elements have degrees of membership. Fuzzy sets were introduced independently by Lotfi A. Zadeh and Dieter Klaua in 1965 as an extension of the classical notion of set. At the same time, Salii (1965) defined a more general kind of structure called an L-relation, which he studied in an abstract algebraic context. Fuzzy relations, which are now used throughout fuzzy mathematics and have applications in areas such as linguistics, decision-making, and clustering, are special cases of L-relations when L is the unit interval [0, 1].

Mu or my is the 12th letter of the Greek alphabet. In the system of Greek numerals it has a value of 40. Mu was derived from the Egyptian hieroglyphic symbol for water, which had been simplified by the Phoenicians and named after their word for water, to become 𐤌 (mem). Letters that derive from mu include the Roman M and the Cyrillic М.

In computer programming, especially functional programming and type theory, an algebraic data type is a kind of composite type, i.e., a type formed by combining other types.

System F, also known as the (Girard–Reynolds) polymorphic lambda calculus or the second-order lambda calculus, is a typed lambda calculus that differs from the simply typed lambda calculus by the introduction of a mechanism of universal quantification over types. System F thus formalizes the notion of parametric polymorphism in programming languages, and forms a theoretical basis for languages such as Haskell and ML. System F was discovered independently by logician Jean-Yves Girard (1972) and computer scientist John C. Reynolds (1974).

Mixing (mathematics)

In mathematics, mixing is an abstract concept originating from physics: the attempt to describe the irreversible thermodynamic process of mixing in the everyday world: mixing paint, mixing drinks, industrial mixing, etc.

In computer science, corecursion is a type of operation that is dual to recursion. Whereas recursion works analytically, starting on data further from a base case and breaking it down into smaller data and repeating until one reaches a base case, corecursion works synthetically, starting from a base case and building it up, iteratively producing data further removed from a base case. Put simply, corecursive algorithms use the data that they themselves produce, bit by bit, as they become available, and needed, to produce further bits of data. A similar but distinct concept is generative recursion which may lack a definite "direction" inherent in corecursion and recursion.

In mathematics, Grönwall's inequality allows one to bound a function that is known to satisfy a certain differential or integral inequality by the solution of the corresponding differential or integral equation. There are two forms of the lemma, a differential form and an integral form. For the latter there are several variants.


In mathematics, specifically in category theory, F-algebras generalize the notion of algebraic structure. Rewriting the algebraic laws in terms of morphisms eliminates all references to quantified elements from the axioms, and these algebraic laws may then be glued together in terms of a single functor F, the signature.

Recursion (computer science) Use of functions that call themselves

In computer science, recursion is a method of solving a problem where the solution depends on solutions to smaller instances of the same problem. Such problems can generally be solved by iteration, but this needs to identify and index the smaller instances at programming time. Recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

In mathematics, an initial algebra is an initial object in the category of F-algebras for a given endofunctor F. This initiality provides a general framework for induction and recursion.

In category theory, the concept of catamorphism denotes the unique homomorphism from an initial algebra into some other algebra.

This article discusses how information theory is related to measure theory.

In functional programming, a generalized algebraic data type is a generalization of parametric algebraic data types.

In computer programming, an anamorphism is a function that generates a sequence by repeated application of the function to its previous result. You begin with some value A and apply a function f to it to get B. Then you apply f to B to get C, and so on until some terminating condition is reached. The anamorphism is the function that generates the list of A, B, C, etc. You can think of the anamorphism as unfolding the initial value into a sequence.

In type theory, a system has inductive types if it has facilities for creating a new type from constants and functions that create terms of that type. The feature serves a role similar to data structures in a programming language and allows a type theory to add concepts like numbers, relations, and trees. As the name suggests, inductive types can be self-referential, but usually only in a way that permits structural recursion.

In mathematics, the Butcher group, named after the New Zealand mathematician John C. Butcher by Hairer & Wanner (1974), is an infinite-dimensional Lie group first introduced in numerical analysis to study solutions of non-linear ordinary differential equations by the Runge–Kutta method. It arose from an algebraic formalism involving rooted trees that provides formal power series solutions of the differential equation modeling the flow of a vector field. It was Cayley (1857), prompted by the work of Sylvester on change of variables in differential calculus, who first noted that the derivatives of a composition of functions can be conveniently expressed in terms of rooted trees and their combinatorics.

In computer science, polymorphic recursion refers to a recursive parametrically polymorphic function where the type parameter changes with each recursive invocation made, instead of staying constant. Type inference for polymorphic recursion is equivalent to semi-unification and therefore undecidable and requires the use of a semi-algorithm or programmer supplied type annotations.