Generalized algebraic data type

Last updated

In functional programming, a generalized algebraic data type (GADT, also first-class phantom type, [1] guarded recursive datatype, [2] or equality-qualified type [3] ) is a generalization of parametric algebraic data types.

Contents

Overview

In a GADT, the product constructors (called data constructors in Haskell) can provide an explicit instantiation of the ADT as the type instantiation of their return value. This allows defining functions with a more advanced type behaviour. For a data constructor of Haskell 2010, the return value has the type instantiation implied by the instantiation of the ADT parameters at the constructor's application.

-- A parametric ADT that is not a GADTdataLista=Nil|Consa(Lista)integers=Cons12(Cons107Nil)-- the type of integers is List Intstrings=Cons"boat"(Cons"dock"Nil)-- the type of strings is List String-- A GADTdataExprawhereEBool::Bool->ExprBoolEInt::Int->ExprIntEEqual::ExprInt->ExprInt->ExprBooleval::Expra->aevale=caseeofEBoola->aEInta->aEEqualab->(evala)==(evalb)expr1=EEqual(EInt2)(EInt3)-- the type of expr1 is Expr Boolret=evalexpr1-- ret is False

They are currently implemented in the GHC compiler as a non-standard extension, used by, among others, Pugs and Darcs. OCaml supports GADT natively since version 4.00. [4]

The GHC implementation provides support for existentially quantified type parameters and for local constraints.

History

An early version of generalized algebraic data types were described by Augustsson & Petersson (1994) and based on pattern matching in ALF.

Generalized algebraic data types were introduced independently by Cheney & Hinze (2003) and prior by Xi, Chen & Chen (2003) as extensions to ML's and Haskell's algebraic data types. [5] Both are essentially equivalent to each other. They are similar to the inductive families of data types (or inductive datatypes) found in Coq's Calculus of Inductive Constructions and other dependently typed languages, modulo the dependent types and except that the latter have an additional positivity restriction which is not enforced in GADTs. [6]

Sulzmann, Wazny & Stuckey (2006) introduced extended algebraic data types which combine GADTs together with the existential data types and type class constraints introduced by Perry (1991) , Läufer & Odersky (1994) and Läufer (1996) .

Type inference in the absence of any programmer supplied type annotations is undecidable [7] and functions defined over GADTs do not admit principal types in general. [8] Type reconstruction requires several design trade-offs and is an area of active research (Peyton Jones, Washburn & Weirich 2004; Peyton Jones et al. 2006; Pottier & Régis-Gianas 2006 ; Sulzmann, Schrijvers & Stuckey 2006; Simonet & Pottier 2007 ; Schrijvers et al. 2009; Lin & Sheard 2010a ; Lin & Sheard 2010b ; Vytiniotis, Peyton Jones & Schrijvers 2010 ; Vytiniotis et al. 2011 ).

In spring 2021, Scala 3.0 is released. [9] This major update of Scala introduce the possibility to write GADTs [10] with the same syntax as ADTs, which is not the case in other programming languages according to Martin Odersky. [11]

Applications

Applications of GADTs include generic programming, modelling programming languages (higher-order abstract syntax), maintaining invariants in data structures, expressing constraints in embedded domain-specific languages, and modelling objects. [12]

Higher-order abstract syntax

An important application of GADTs is to embed higher-order abstract syntax in a type safe fashion. Here is an embedding of the simply typed lambda calculus with an arbitrary collection of base types, tuples and a fixed point combinator:

dataLam::*->*whereLift::a->Lama-- ^ lifted valuePair::Lama->Lamb->Lam(a,b)-- ^ productLam::(Lama->Lamb)->Lam(a->b)-- ^ lambda abstractionApp::Lam(a->b)->Lama->Lamb-- ^ function applicationFix::Lam(a->a)->Lama-- ^ fixed point

And a type safe evaluation function:

eval::Lamt->teval(Liftv)=veval(Pairlr)=(evall,evalr)eval(Lamf)=\x->eval(f(Liftx))eval(Appfx)=(evalf)(evalx)eval(Fixf)=(evalf)(eval(Fixf))

The factorial function can now be written as:

fact=Fix(Lam(\f->Lam(\y->Lift(ifevaly==0then1elseevaly*(evalf)(evaly-1)))))eval(fact)(10)

We would have run into problems using regular algebraic data types. Dropping the type parameter would have made the lifted base types existentially quantified, making it impossible to write the evaluator. With a type parameter we would still be restricted to a single base type. Furthermore, ill-formed expressions such as App (Lam (\x -> Lam (\y -> App x y))) (Lift True) would have been possible to construct, while they are type incorrect using the GADT. A well-formed analogue is App (Lam (\x -> Lam (\y -> App x y))) (Lift (\z -> True)). This is because the type of x is Lam (a -> b), inferred from the type of the Lam data constructor.

See also

Notes

  1. Cheney & Hinze 2003.
  2. Xi, Chen & Chen 2003.
  3. Sheard & Pasalic 2004.
  4. "OCaml 4.00.1". ocaml.org.
  5. Cheney & Hinze 2003, p. 25.
  6. Cheney & Hinze 2003, pp. 25–26.
  7. Peyton Jones, Washburn & Weirich 2004, p. 7.
  8. Schrijvers et al. 2009, p. 1.
  9. Kmetiuk, Anatolii. "SCALA 3 IS HERE!🎉🎉🎉". scala-lang.org. École Polytechnique Fédérale Lausanne (EPFL) Lausanne, Switzerland. Retrieved 19 May 2021.
  10. "SCALA 3 — BOOK ALGEBRAIC DATA TYPES". scala-lang.org. École Polytechnique Fédérale Lausanne (EPFL) Lausanne, Switzerland. Retrieved 19 May 2021.
  11. Odersky, Martin. "A Tour of Scala 3 - Martin Odersky". youtube.com. Scala Days Conferences. Retrieved 19 May 2021.
  12. Peyton Jones, Washburn & Weirich 2004, p. 3.

Further reading

Applications
Semantics
Type reconstruction
Other

Related Research Articles

In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that map values to other values, rather than a sequence of imperative statements which update the running state of the program.

Standard ML (SML) is a general-purpose modular functional programming language with compile-time type checking and type inference. It is popular among compiler writers and programming language researchers, as well as in the development of theorem provers.

Generic programming is a style of computer programming in which algorithms are written in terms of types to-be-specified-later that are then instantiated when needed for specific types provided as parameters. This approach, pioneered by the ML programming language in 1973, permits writing common functions or types that differ only in the set of types on which they operate when used, thus reducing duplication. Such software entities are known as generics in Ada, C#, Delphi, Eiffel, F#, Java, Nim, Python, Rust, Swift, TypeScript and Visual Basic .NET. They are known as parametric polymorphism in ML, Scala, Julia, and Haskell ; templates in C++ and D; and parameterized types in the influential 1994 book Design Patterns.

In computer programming, especially functional programming and type theory, an algebraic data type is a kind of composite type, i.e., a type formed by combining other types.

In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union, disjoint union, sum type or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time, and a tag field explicitly indicates which one is in use. It can be thought of as a type that has several "cases", each of which should be handled correctly when that type is manipulated. This is critical in defining recursive datatypes, in which some component of a value may have the same type as the value itself, for example in defining a type for representing trees, where it is necessary to distinguish multi-node subtrees and leaves. Like ordinary unions, tagged unions can save storage by overlapping storage areas for each type, since only one is in use at a time.

The Glasgow Haskell Compiler (GHC) is an open-source native code compiler for the functional programming language Haskell. It provides a cross-platform environment for the writing and testing of Haskell code and it supports numerous extensions, libraries, and optimisations that streamline the process of generating and executing code. GHC is the most commonly used Haskell compiler. The lead developers are Simon Peyton Jones and Simon Marlow.

Hope is a small functional programming language developed in the 1970s at the University of Edinburgh. It predates Miranda and Haskell and is contemporaneous with ML, also developed at the University. Hope was derived from NPL, a simple functional language developed by Rod Burstall and John Darlington in their work on program transformation. NPL and Hope are notable for being the first languages with call-by-pattern evaluation and algebraic data types.

Many programming language type systems support subtyping. For instance, if the type Cat is a subtype of Animal, then an expression of type Cat should be substitutable wherever an expression of type Animal is used.

SIGPLAN is the Association for Computing Machinery's Special Interest Group on programming languages.

In computer programming languages, a recursive data type is a data type for values that may contain other values of the same type. Data of recursive types are usually viewed as directed graphs.

Scala (programming language) General-purpose programming language

Scala is a strong statically typed general-purpose programming language which supports both object-oriented programming and functional programming. Designed to be concise, many of Scala's design decisions are aimed to address criticisms of Java.

In programming languages and type theory, parametric polymorphism is a way to make a language more expressive, while still maintaining full static type-safety. Using parametric polymorphism, a function or a data type can be written generically so that it can handle values identically without depending on their type. Such functions and data types are called generic functions and generic datatypes respectively and form the basis of generic programming.

In computer science, a type class is a type system construct that supports ad hoc polymorphism. This is achieved by adding constraints to type variables in parametrically polymorphic types. Such a constraint typically involves a type class T and a type variable a, and means that a can only be instantiated to a type whose members support the overloaded operations associated with T.

Haxe is an open source high-level cross-platform programming language and compiler that can produce applications and source code, for many different computing platforms from one code-base. It is free and open-source software, released under the MIT License. The compiler, written in OCaml, is released under the GNU General Public License (GPL) version 2.

The expression problem is a term used in discussing strengths and weaknesses of various programming paradigms and programming languages.

In computer science, polymorphic recursion refers to a recursive parametrically polymorphic function where the type parameter changes with each recursive invocation made, instead of staying constant. Type inference for polymorphic recursion is equivalent to semi-unification and therefore undecidable and requires the use of a semi-algorithm or programmer supplied type annotations.

Haskell is a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial application, Haskell has pioneered a number of advanced programming language features such as type classes, which enable type-safe operator overloading. Haskell's main implementation is the Glasgow Haskell Compiler (GHC). It is named after logician Haskell Curry.

In computer science, a type family associates data types with other data types, using a type-level function defined by an open-ended collection of valid instances of input types and the corresponding output types.

Flix is a functional, imperative, and logic programming language developed at Aarhus University, with funding from the Independent Research Fund Denmark, and by a community of open source contributors. The Flix language supports algebraic data types, pattern matching, parametric polymorphism, currying, higher-order functions, extensible records, channel and process-based concurrency, and tail call elimination. Two notable features of Flix are its type and effect system and its support for first-class Datalog constraints.