Generalized algebraic data type

Last updated December 24, 2024

In functional programming, a generalized algebraic data type (GADT, also first-class phantom type,^[1] guarded recursive datatype,^[2] or equality-qualified type^[3]) is a generalization of a parametric algebraic data type (ADT).

Overview

In a GADT, each product constructor (called a data constructor in Haskell) can declare an explicit instantiation of the ADT's type parameters in its return type. Pattern matching on such a constructor refines the type inside the matching branch, which allows functions with more precise types to be defined. By contrast, for a Haskell 2010 data constructor, the return type always has the instantiation implied by the instantiation of the ADT parameters at the constructor's application.

    -- A parametric ADT that is not a GADT
    data List a = Nil | Cons a (List a)

    integers :: List Int
    integers = Cons 12 (Cons 107 Nil)

    strings :: List String
    strings = Cons "boat" (Cons "dock" Nil)

    -- A GADT
    data Expr a where
        EBool  :: Bool     -> Expr Bool
        EInt   :: Int      -> Expr Int
        EEqual :: Expr Int -> Expr Int -> Expr Bool

    eval :: Expr a -> a
    eval e = case e of
        EBool a    -> a
        EInt a     -> a
        EEqual a b -> (eval a) == (eval b)

    expr1 :: Expr Bool
    expr1 = EEqual (EInt 2) (EInt 3)

    ret = eval expr1 -- False

GADTs are currently implemented in the Glasgow Haskell Compiler (GHC) as a non-standard extension, used by, among others, Pugs and Darcs. OCaml has supported GADTs natively since version 4.00.^[4]

The GHC implementation provides support for existentially quantified type parameters and for local constraints.
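These two features can be illustrated with a minimal sketch, assuming GHC with the GADTs extension enabled. The names Showable, MkShowable, describe, Rep, and defaultOf are hypothetical and chosen only for this example; they do not come from GHC or its libraries.

```haskell
{-# LANGUAGE GADTs #-}

-- Existential quantification: 'a' does not occur in the result type,
-- so it is hidden inside the value. The 'Show a' constraint is stored
-- alongside the payload.
data Showable where
  MkShowable :: Show a => a -> Showable

-- Matching on MkShowable brings the stored 'Show a' instance into scope.
describe :: Showable -> String
describe (MkShowable x) = show x

-- A list whose elements hide different payload types.
items :: [Showable]
items = [MkShowable (1 :: Int), MkShowable True, MkShowable "dock"]

descriptions :: [String]
descriptions = map describe items

-- Local constraints: each constructor fixes the type parameter.
data Rep a where
  RInt  :: Rep Int
  RBool :: Rep Bool

-- In the RInt branch the checker knows a ~ Int, in the RBool branch
-- a ~ Bool, so each branch may return a value of a different type.
defaultOf :: Rep a -> a
defaultOf RInt  = 0
defaultOf RBool = False
```

The type signature on defaultOf is needed here: the branches return values of different types, and only the annotation tells the checker how they are related.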

History

An early version of generalized algebraic data types was described by Augustsson & Petersson (1994) and based on pattern matching in ALF.

Generalized algebraic data types were introduced independently by Cheney & Hinze (2003) and, earlier, by Xi, Chen & Chen (2003) as extensions to the algebraic data types of ML and Haskell.^[5] The two formulations are essentially equivalent. They are similar to the inductive families of data types (or inductive datatypes) found in Coq's Calculus of Inductive Constructions and other dependently typed languages, apart from the dependent types and except that inductive families carry an additional positivity restriction which is not enforced in GADTs.^[6]

Sulzmann, Wazny & Stuckey (2006) introduced extended algebraic data types, which combine GADTs with existential data types and type class constraints.

Type inference in the absence of any programmer-supplied type annotations is undecidable,^[7] and functions defined over GADTs do not admit principal types in general.^[8] Type reconstruction requires several design trade-offs and is an area of active research (Peyton Jones, Washburn & Weirich 2004; Peyton Jones et al. 2006).

In spring 2021, Scala 3.0 was released.^[9] This major update of Scala made it possible to write GADTs^[10] with the same syntax as algebraic data types, which according to Martin Odersky is not the case in other programming languages.^[11]

Applications

Applications of GADTs include generic programming, modelling programming languages (higher-order abstract syntax), maintaining invariants in data structures, expressing constraints in embedded domain-specific languages, and modelling objects.^[12]
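As one illustration of maintaining invariants in data structures, the following sketch indexes a list type by its length so that taking the head of an empty list is a compile-time error. It assumes GHC with the GADTs extension. The names Zero, Succ, Vec, vhead, vtail, and vtoList are hypothetical and used only for this example.

```haskell
{-# LANGUAGE GADTs #-}

-- Type-level tags for natural numbers. They have no constructors and
-- serve only as indices for Vec.
data Zero
data Succ n

-- A list whose length is recorded in its type.
data Vec n a where
  VNil  :: Vec Zero a
  VCons :: a -> Vec n a -> Vec (Succ n) a

-- The argument type rules out VNil, so a single equation is exhaustive.
vhead :: Vec (Succ n) a -> a
vhead (VCons x _) = x

-- The result is one element shorter, and the type says so.
vtail :: Vec (Succ n) a -> Vec n a
vtail (VCons _ xs) = xs

-- Forget the length index.
vtoList :: Vec n a -> [a]
vtoList VNil         = []
vtoList (VCons x xs) = x : vtoList xs

sample :: Vec (Succ (Succ Zero)) Int
sample = VCons 12 (VCons 107 VNil)
```

With these definitions, vhead sample evaluates to 12, while an application such as vhead VNil is rejected by the type checker because Zero does not match Succ n.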

Higher-order abstract syntax

An important application of GADTs is to embed higher-order abstract syntax in a type-safe fashion. Here is an embedding of the simply typed lambda calculus with an arbitrary collection of base types, product types (tuples) and a fixed point combinator:

    data Lam :: * -> * where
      Lift :: a                     -> Lam a        -- ^ lifted value
      Pair :: Lam a -> Lam b        -> Lam (a, b)   -- ^ product
      Lam  :: (Lam a -> Lam b)      -> Lam (a -> b) -- ^ lambda abstraction
      App  :: Lam (a -> b) -> Lam a -> Lam b        -- ^ function application
      Fix  :: Lam (a -> a)          -> Lam a        -- ^ fixed point

And a type safe evaluation function:

    eval :: Lam t -> t
    eval (Lift v)   = v
    eval (Pair l r) = (eval l, eval r)
    eval (Lam f)    = \x -> eval (f (Lift x))
    eval (App f x)  = (eval f) (eval x)
    eval (Fix f)    = (eval f) (eval (Fix f))

The factorial function can now be written as:

    fact = Fix (Lam (\f -> Lam (\y -> Lift (if eval y == 0 then 1 else eval y * (eval f) (eval y - 1)))))

    eval(fact)(10)

Regular algebraic data types would have caused problems here. Dropping the type parameter would have made the lifted base types existentially quantified, making it impossible to write the evaluator. With a type parameter but without GADT-style constructors, the embedding would still be restricted to one base type. Further, ill-formed expressions such as App (Lam (\x -> Lam (\y -> App x y))) (Lift True) could have been constructed, whereas with the GADT they are rejected as ill-typed. A well-formed analogue is App (Lam (\x -> Lam (\y -> App x y))) (Lift (\z -> True)). This is because the type of x is Lam (a -> b), inferred from the type of the Lam data constructor.
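The contrast can be sketched with an untyped counterpart written as ordinary algebraic data types. This is an illustration under stated assumptions, not part of the embedding above: the names Value, Term, evalU, illFormed, and wellFormed are hypothetical, and the evaluator returns Maybe so that tag mismatches surface as Nothing at run time.

```haskell
-- Run-time values carry a tag, because the types no longer track
-- what an expression evaluates to.
data Value
  = VBool Bool
  | VInt Int
  | VFun (Value -> Maybe Value)

-- Untyped higher-order abstract syntax as a regular ADT.
data Term
  = Lit Value          -- lifted value
  | Abs (Term -> Term) -- lambda abstraction
  | Ap Term Term       -- function application

-- The evaluator has to inspect tags and can fail.
evalU :: Term -> Maybe Value
evalU (Lit v)  = Just v
evalU (Abs f)  = Just (VFun (\v -> evalU (f (Lit v))))
evalU (Ap f x) =
  case evalU f of
    Just (VFun g) -> evalU x >>= g
    _             -> Nothing -- applying a non-function

-- Accepted by the type checker even though x is bound to a Bool
-- and is later used as a function.
illFormed :: Term
illFormed = Ap (Abs (\x -> Abs (\y -> Ap x y))) (Lit (VBool True))

-- The analogue in which x is bound to a function.
wellFormed :: Term
wellFormed =
  Ap (Abs (\x -> Abs (\y -> Ap x y)))
     (Lit (VFun (\_ -> Just (VBool True))))
```

In this sketch illFormed type-checks and even evaluates to a function value. The mistake only shows up once that function is applied, for example in evalU (Ap illFormed (Lit (VInt 1))), which yields Nothing. The GADT version reports the same mistake at compile time.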

Notes

[1] Cheney & Hinze 2003.
[2] Xi, Chen & Chen 2003.
[3] Sheard & Pasalic 2004.
[4] "OCaml 4.00.1". ocaml.org.
[5] Cheney & Hinze 2003, p. 25.
[6] Cheney & Hinze 2003, pp. 25–26.
[7] Peyton Jones, Washburn & Weirich 2004, p. 7.
[8] Schrijvers et al. 2009, p. 1.
[9] Kmetiuk, Anatolii. "Scala 3 Is Here!". scala-lang.org. École Polytechnique Fédérale Lausanne (EPFL) Lausanne, Switzerland. Retrieved 19 May 2021.
[10] "Scala 3 – Book Algebraic Data Types". scala-lang.org. École Polytechnique Fédérale Lausanne (EPFL) Lausanne, Switzerland. Retrieved 19 May 2021.
[11] Odersky, Martin. "A Tour of Scala 3 – Martin Odersky". youtube.com. Scala Days Conferences. Archived from the original on 2021-12-19. Retrieved 19 May 2021.
[12] Peyton Jones, Washburn & Weirich 2004, p. 3.

External links

Generalised Algebraic Datatype Page on the Haskell wiki
Generalised Algebraic Data Types in the GHC Users' Guide
Generalized Algebraic Data Types and Object-Oriented Programming
GADTs – Haskell Prime – Trac
Papers about type inference for GADTs, bibliography by Simon Peyton Jones
Type inference with constraints, bibliography by Simon Peyton Jones
Emulating GADTs in Java via the Yoneda lemma


This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
