Generalized algebraic data type

In functional programming, a generalized algebraic data type (GADT, also first-class phantom type,[1] guarded recursive datatype,[2] or equality-qualified type[3]) is a generalization of parametric algebraic data types.

Overview

In a GADT, the product constructors (called data constructors in Haskell) can provide an explicit instantiation of the ADT as the type instantiation of their return value. This allows functions to be defined with more refined type behaviour. By contrast, for a data constructor in Haskell 2010, the type instantiation of the return value is always the one implied by the instantiation of the ADT parameters at the constructor's application.

    {-# LANGUAGE GADTs #-}

    -- A parametric ADT that is not a GADT
    data List a = Nil | Cons a (List a)

    integers :: List Int
    integers = Cons 12 (Cons 107 Nil)

    strings :: List String
    strings = Cons "boat" (Cons "dock" Nil)

    -- A GADT
    data Expr a where
        EBool  :: Bool     -> Expr Bool
        EInt   :: Int      -> Expr Int
        EEqual :: Expr Int -> Expr Int -> Expr Bool

    eval :: Expr a -> a
    eval e = case e of
        EBool a    -> a
        EInt a     -> a
        EEqual a b -> (eval a) == (eval b)

    expr1 :: Expr Bool
    expr1 = EEqual (EInt 2) (EInt 3)

    ret = eval expr1 -- False

GADTs are currently implemented in GHC as a non-standard extension, used by, among others, Pugs and Darcs. OCaml has supported GADTs natively since version 4.00. [4]

The GHC implementation provides support for existentially quantified type parameters and for local constraints.
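Both features can be illustrated with a minimal sketch. The types Showable and Value and the functions render and describe are hypothetical names chosen for this example; they do not come from GHC or its libraries.

```haskell
{-# LANGUAGE GADTs #-}

module Main where

-- Hypothetical type for illustration only.
-- The variable 'a' does not appear in the result type, so it is
-- existentially quantified; the 'Show a' constraint is stored in the value.
data Showable where
  MkShowable :: Show a => a -> Showable

-- Pattern matching brings the stored 'Show a' constraint back into scope.
render :: Showable -> String
render (MkShowable x) = show x

-- Hypothetical GADT: matching on a constructor introduces a local type
-- equality (a ~ Int or a ~ Bool) that holds only inside that branch.
data Value a where
  IntV  :: Int  -> Value Int
  BoolV :: Bool -> Value Bool

describe :: Value a -> String
describe (IntV n)  = "int " ++ show (n + 1)    -- here a ~ Int
describe (BoolV b) = "bool " ++ show (not b)   -- here a ~ Bool

main :: IO ()
main = do
  -- Values of different underlying types share the single type Showable.
  mapM_ (putStrLn . render) [MkShowable (1 :: Int), MkShowable "dock", MkShowable True]
  putStrLn (describe (IntV 41))
  putStrLn (describe (BoolV False))
```

In this sketch the list passed to mapM_ is homogeneous at type Showable even though each element wraps a different type, which is what existential quantification makes possible.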

History

An early version of generalized algebraic data types was described by Augustsson & Petersson (1994), based on pattern matching in ALF.

Generalized algebraic data types were introduced independently by Cheney & Hinze (2003) and, earlier, by Xi, Chen & Chen (2003) as extensions to the algebraic data types of ML and Haskell. [5] The two formulations are essentially equivalent. They are similar to the inductive families of data types (or inductive datatypes) found in Coq's Calculus of Inductive Constructions and other dependently typed languages, modulo the dependent types, except that the latter have an additional positivity restriction which is not enforced in GADTs. [6]

Sulzmann, Wazny & Stuckey (2006) introduced extended algebraic data types, which combine GADTs with existential data types and type class constraints.

Type inference in the absence of any programmer-supplied type annotations is undecidable, [7] and functions defined over GADTs do not admit principal types in general. [8] Type reconstruction requires several design trade-offs and is an area of active research (Peyton Jones, Washburn & Weirich 2004; Peyton Jones et al. 2006).
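The lack of principal types can be seen in a small sketch. The GADT T and the functions testBool and testPoly below are hypothetical names used only for illustration. The two functions have identical equations but are given two different signatures, neither of which is an instance of the other, so there is no single most general type for an inference algorithm to pick.

```haskell
{-# LANGUAGE GADTs #-}

module Main where

-- Hypothetical GADT for illustration only.
data T a where
  T1 :: Int -> T Bool
  T2 :: T a

-- First valid typing: the result is always Bool.
testBool :: T a -> Bool -> Bool
testBool (T1 n) _ = n > 0
testBool T2     r = r

-- Second valid typing for the same equations: the result type is a.
-- The first branch is accepted because matching T1 reveals a ~ Bool.
testPoly :: T a -> a -> a
testPoly (T1 n) _ = n > 0
testPoly T2     r = r

main :: IO ()
main = do
  print (testBool (T1 3) False)  -- True
  print (testBool T2 True)       -- True
  print (testPoly T2 'x')        -- 'x'
  print (testPoly (T1 0) True)   -- False
```

The call testPoly T2 'x' is rejected under the first signature, while testBool accepts a T Int argument together with a Bool, which the second signature would not allow. This is why a signature has to be supplied by the programmer in this sketch.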

Scala 3.0 was released in spring 2021. [9] This major update of Scala introduced the possibility of writing GADTs [10] with the same syntax as ADTs, which, according to Martin Odersky, is not the case in other programming languages. [11]

Applications

Applications of GADTs include generic programming, modelling programming languages (higher-order abstract syntax), maintaining invariants in data structures, expressing constraints in embedded domain-specific languages, and modelling objects. [12]
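As one illustration of maintaining invariants in data structures, the following sketch defines a list type whose length is tracked in its type, so that taking the head of an empty list is rejected at compile time. All names (Zero, Succ, Vec, vhead, vtail, toList) are hypothetical and chosen for this example.

```haskell
{-# LANGUAGE GADTs #-}

module Main where

-- Type-level natural numbers; these types have no values and are used
-- only as indices. Hypothetical names for illustration.
data Zero
data Succ n

-- A list whose length n is recorded in its type.
data Vec n a where
  VNil  :: Vec Zero a
  VCons :: a -> Vec n a -> Vec (Succ n) a

-- Total functions: the argument type rules out the empty vector,
-- so no VNil case is needed.
vhead :: Vec (Succ n) a -> a
vhead (VCons x _) = x

vtail :: Vec (Succ n) a -> Vec n a
vtail (VCons _ xs) = xs

-- Forget the length index and return an ordinary list.
toList :: Vec n a -> [a]
toList VNil         = []
toList (VCons x xs) = x : toList xs

main :: IO ()
main = do
  let v = VCons 1 (VCons 2 (VCons 3 VNil)) :: Vec (Succ (Succ (Succ Zero))) Int
  print (vhead v)           -- 1
  print (toList (vtail v))  -- [2,3]
  -- print (vhead VNil)     -- would be rejected by the type checker
```

The invariant here is that vhead and vtail can only be applied to vectors whose type proves they are non-empty, so no runtime check or partial function is required.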

Higher-order abstract syntax

An important application of GADTs is embedding higher-order abstract syntax in a type-safe fashion. Here is an embedding of the simply typed lambda calculus with an arbitrary collection of base types, tuples and a fixed point combinator:

    {-# LANGUAGE GADTs, KindSignatures #-}

    data Lam :: * -> * where
      Lift :: a                     -> Lam a        -- ^ lifted value
      Pair :: Lam a -> Lam b        -> Lam (a, b)   -- ^ product
      Lam  :: (Lam a -> Lam b)      -> Lam (a -> b) -- ^ lambda abstraction
      App  :: Lam (a -> b) -> Lam a -> Lam b        -- ^ function application
      Fix  :: Lam (a -> a)          -> Lam a        -- ^ fixed point

And a type-safe evaluation function:

    eval :: Lam t -> t
    eval (Lift v)   = v
    eval (Pair l r) = (eval l, eval r)
    eval (Lam f)    = \x -> eval (f (Lift x))
    eval (App f x)  = (eval f) (eval x)
    eval (Fix f)    = (eval f) (eval (Fix f))

The factorial function can now be written as:

    fact = Fix (Lam (\f -> Lam (\y -> Lift (if eval y == 0 then 1 else eval y * (eval f) (eval y - 1)))))

    eval fact 10

Regular algebraic data types would have caused problems here. Dropping the type parameter would have made the lifted base types existentially quantified, making it impossible to write the evaluator. Keeping a type parameter without GADTs would still restrict us to a single base type. Furthermore, ill-formed expressions such as App (Lam (\x -> Lam (\y -> App x y))) (Lift True) could have been constructed, whereas the GADT rejects them as type-incorrect. A well-formed analogue is App (Lam (\x -> Lam (\y -> App x y))) (Lift (\z -> True)). The difference arises because the type of x must be Lam (a -> b), as required by its use as the function in App x y: Lift True has type Lam Bool, while Lift (\z -> True) has the required function type.
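The contrast between the two expressions can be checked with a self-contained sketch that repeats the Lam type and its evaluator. The names wellFormed and illFormed are hypothetical, and Type from Data.Kind is used here in place of the equivalent * kind.

```haskell
{-# LANGUAGE GADTs, KindSignatures #-}

module Main where

import Data.Kind (Type)

-- The embedding from the text, with Type written instead of *.
data Lam :: Type -> Type where
  Lift :: a                     -> Lam a
  Pair :: Lam a -> Lam b        -> Lam (a, b)
  Lam  :: (Lam a -> Lam b)      -> Lam (a -> b)
  App  :: Lam (a -> b) -> Lam a -> Lam b
  Fix  :: Lam (a -> a)          -> Lam a

eval :: Lam t -> t
eval (Lift v)   = v
eval (Pair l r) = (eval l, eval r)
eval (Lam f)    = \x -> eval (f (Lift x))
eval (App f x)  = (eval f) (eval x)
eval (Fix f)    = (eval f) (eval (Fix f))

-- Accepted: the argument is a lifted function, so x :: Lam (c -> Bool).
wellFormed :: Lam (c -> Bool)
wellFormed = App (Lam (\x -> Lam (\y -> App x y))) (Lift (\z -> True))

-- Rejected by the type checker if uncommented: Lift True :: Lam Bool,
-- but App x y requires x to have a function type.
-- illFormed = App (Lam (\x -> Lam (\y -> App x y))) (Lift True)

main :: IO ()
main = print (eval wellFormed ())  -- True
```

Uncommenting illFormed makes the module fail to compile, which is the static guarantee that the plain algebraic data type encoding cannot provide.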

Notes

  1. Cheney & Hinze 2003.
  2. Xi, Chen & Chen 2003.
  3. Sheard & Pasalic 2004.
  4. "OCaml 4.00.1". ocaml.org.
  5. Cheney & Hinze 2003, p. 25.
  6. Cheney & Hinze 2003, pp. 25–26.
  7. Peyton Jones, Washburn & Weirich 2004, p. 7.
  8. Schrijvers et al. 2009, p. 1.
  9. Kmetiuk, Anatolii. "SCALA 3 IS HERE!🎉🎉🎉". scala-lang.org. École Polytechnique Fédérale Lausanne (EPFL) Lausanne, Switzerland. Retrieved 19 May 2021.
  10. "SCALA 3 — BOOK ALGEBRAIC DATA TYPES". scala-lang.org. École Polytechnique Fédérale Lausanne (EPFL) Lausanne, Switzerland. Retrieved 19 May 2021.
  11. Odersky, Martin. "A Tour of Scala 3 - Martin Odersky". youtube.com. Scala Days Conferences. Archived from the original on 2021-12-19. Retrieved 19 May 2021.
  12. Peyton Jones, Washburn & Weirich 2004, p. 3.
