Parametric polymorphism

In programming languages and type theory, parametric polymorphism is a way to make a language more expressive, while still maintaining full static type-safety. Using parametric polymorphism, a function or a data type can be written generically so that it can handle values identically without depending on their type. [1] Such functions and data types are called generic functions and generic datatypes respectively and form the basis of generic programming.

For example, a function append that joins two lists can be constructed so that it does not care about the type of elements: it can append lists of integers, lists of real numbers, lists of strings, and so on. Let the type variable a denote the type of elements in the lists. Then append can be typed

forall a. [a] × [a] -> [a]

where [a] denotes the type of lists with elements of type a. We say that the type of append is parameterized by a for all values of a. (Note that since there is only one type variable, the function cannot be applied to just any pair of lists: the pair, as well as the result list, must consist of the same type of elements.) For each place where append is applied, a value is decided for a.
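
As an illustrative sketch, such an append might be written in Haskell as follows (curried, as is idiomatic in Haskell, rather than taking a pair); the definition never inspects the list elements, so a single implementation serves every element type:

append :: [a] -> [a] -> [a]          -- one definition for every element type a
append []     ys = ys
append (x:xs) ys = x : append xs ys

-- append [1, 2] [3]    evaluates to [1, 2, 3]    (a is an integer type)
-- append "foo" "bar"   evaluates to "foobar"     (a = Char)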

Following Christopher Strachey, [2] parametric polymorphism may be contrasted with ad hoc polymorphism, in which a single polymorphic function can have a number of distinct and potentially heterogeneous implementations depending on the type of argument(s) to which it is applied. Thus, ad hoc polymorphism can generally only support a limited number of such distinct types, since a separate implementation has to be provided for each type.
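
The contrast can be sketched in Haskell. The parametric function below has one implementation that never examines its arguments, while the ad hoc one needs a separate instance for every supported type (the Describe class and its instances are hypothetical, used purely for illustration):

-- Parametric: a single, type-agnostic implementation.
swap :: (a, b) -> (b, a)
swap (x, y) = (y, x)

-- Ad hoc: one implementation per type, via a (hypothetical) type class.
class Describe t where
  describe :: t -> String

instance Describe Bool where
  describe b = if b then "yes" else "no"

instance Describe Int where
  describe n = "the number " ++ show n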

History

Parametric polymorphism was first introduced to programming languages in ML in 1975. [3] Today it exists in Standard ML, OCaml, F#, Ada, Haskell, Mercury, Visual Prolog, Scala, Julia, Python, TypeScript, C++ and others. Java, C#, Visual Basic .NET and Delphi have each introduced "generics" for parametric polymorphism. Some implementations of type polymorphism are superficially similar to parametric polymorphism while also introducing ad hoc aspects. One example is C++ template specialization.

The most general form of polymorphism is "higher-rank impredicative polymorphism". Two popular restrictions of this form are restricted rank polymorphism (for example, rank-1 or prenex polymorphism) and predicative polymorphism. Together, these restrictions give "predicative prenex polymorphism", which is essentially the form of polymorphism found in ML and early versions of Haskell.

Higher-ranked polymorphism

Rank-1 (prenex) polymorphism

In a prenex polymorphic system, type variables may not be instantiated with polymorphic types. [4] This is very similar to what is called "ML-style" or "Let-polymorphism" (technically ML's Let-polymorphism has a few other syntactic restrictions). This restriction makes the distinction between polymorphic and non-polymorphic types very important; thus in predicative systems polymorphic types are sometimes referred to as type schemas to distinguish them from ordinary (monomorphic) types, which are sometimes called monotypes. A consequence is that all types can be written in a form that places all quantifiers at the outermost (prenex) position. For example, consider the append function described above, which has type

forall a. [a] × [a] -> [a]

In order to apply this function to a pair of lists, a type must be substituted for the variable a in the type of the function such that the type of the arguments matches up with the resulting function type. In an impredicative system, the type being substituted may be any type whatsoever, including a type that is itself polymorphic; thus append can be applied to pairs of lists with elements of any type—even to lists of polymorphic functions such as append itself. Polymorphism in the language ML is predicative.[citation needed] This is because predicativity, together with other restrictions, makes the type system simple enough that full type inference is always possible.
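
As a small Haskell sketch of the prenex (let-polymorphism) discipline, with identifiers chosen for illustration: a let-bound definition is generalised to a type scheme and may be instantiated at a different monotype at each use, whereas a lambda-bound argument receives only a monotype unless a higher-rank annotation is supplied.

pairOfUses :: (Int, Bool)
pairOfUses =
  let identity x = x                 -- generalised to the scheme: forall a. a -> a
  in (identity 3, identity True)     -- instantiated at Int, then at Bool

-- A lambda-bound parameter is monomorphic in a rank-1 system, so the
-- analogous definition is rejected without a rank-2 type annotation:
-- broken f = (f 3, f True)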

As a practical example, OCaml (a descendant or dialect of ML) performs type inference and supports impredicative polymorphism, but in some cases when impredicative polymorphism is used, the system's type inference is incomplete unless some explicit type annotations are provided by the programmer.

Rank-k polymorphism

For some fixed value k, rank-k polymorphism is a system in which a quantifier may not appear to the left of k or more arrows (when the type is drawn as a tree). [1]

Type inference for rank-2 polymorphism is decidable, but reconstruction for rank-3 and above is not. [5]
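
A rank-2 type can be sketched in GHC Haskell with the RankNTypes extension (the function name is illustrative): the quantifier appears to the left of a single arrow, so the argument must itself be polymorphic and can be used at two different types in the body.

{-# LANGUAGE RankNTypes #-}

applyToPair :: (forall a. a -> a) -> (Int, Bool) -> (Int, Bool)
applyToPair f (n, b) = (f n, f b)    -- f is used at Int and at Bool

-- applyToPair id (1, True)   evaluates to (1, True)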

Rank-n ("higher-rank") polymorphism

Rank-n polymorphism is polymorphism in which quantifiers may appear to the left of arbitrarily many arrows.
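
Continuing the Haskell sketch above (again with GHC's RankNTypes extension and illustrative names), a rank-3 type nests a quantifier to the left of two arrows:

{-# LANGUAGE RankNTypes #-}

withPolyId :: ((forall a. a -> a) -> r) -> r
withPolyId g = g id                  -- the inner forall sits two arrows deep

-- withPolyId (\f -> (f 'x', f False))   evaluates to ('x', False)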

Predicativity and impredicativity

Predicative polymorphism

In a predicative parametric polymorphic system, a type T containing a type variable a may not be used in such a way that a is instantiated to a polymorphic type. Predicative type theories include Martin-Löf type theory and NuPRL.

Impredicative polymorphism

Impredicative polymorphism (also called first-class polymorphism) is the most powerful form of parametric polymorphism. [6] A definition is said to be impredicative if it is self-referential; in type theory this allows the instantiation of a variable in a type T with any type, including polymorphic types, such as T itself. An example of this is System F with the type variable X in the type T = forall X. X -> X, where X could even refer to T itself.
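
A hedged Haskell sketch of this idea (it relies on GHC's ImpredicativeTypes extension, fully supported only by the Quick Look algorithm in GHC 9.2 and later; names are illustrative): the list type's element variable is instantiated with a polymorphic type, which a predicative system would forbid.

{-# LANGUAGE ImpredicativeTypes #-}

polyIds :: [forall a. a -> a]        -- the element type is itself polymorphic
polyIds = [id, \x -> x]

-- Each element keeps its full polymorphism when extracted, e.g.
--   head polyIds True   evaluates to True
--   head polyIds 'x'    evaluates to 'x'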

In type theory, the most frequently studied impredicative typed λ-calculi are based on those of the lambda cube, especially System F. [1]

Bounded parametric polymorphism

In 1985, Luca Cardelli and Peter Wegner recognized the advantages of allowing bounds on the type parameters. [7] Many operations require some knowledge of the data types, but can otherwise work parametrically. For example, to check whether an item is included in a list, we need to compare the items for equality. In Standard ML, type parameters of the form ''a are restricted so that the equality operation is available, thus the function would have the type ''a × ''a list → bool and ''a can only be a type with defined equality. In Haskell, bounding is achieved by requiring types to belong to a type class; thus the same function has the type Eq a => a -> [a] -> Bool. In most object-oriented programming languages that support parametric polymorphism, parameters can be constrained to be subtypes of a given type (see Subtype polymorphism and the article on Generic programming).
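
For instance, a membership test with the bounded type described above might be sketched in Haskell as follows (the name contains is illustrative; Haskell's standard elem provides the same functionality):

contains :: Eq a => a -> [a] -> Bool     -- a is bounded by the Eq class
contains _ []     = False
contains x (y:ys) = x == y || contains x ys

-- contains 3 [1, 2, 3]           evaluates to True
-- contains "d" ["a", "b", "c"]   evaluates to False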

Notes

  1. Pierce 2002.
  2. Strachey 1967.
  3. Milner, R.; Morris, L.; Newey, M. (1975). "A Logic for Computable Functions with Reflexive and Polymorphic Types". Proc. Conference on Proving and Improving Programs, Arc-et-Senans.
  4. Pierce, Benjamin C. (2002). Types and Programming Languages. MIT Press. ISBN 978-0-262-16209-8.
  5. Pierce 2002, p. 359.
  6. Pierce 2002, p. 340.
  7. Cardelli & Wegner 1985.

References

Cardelli, Luca; Wegner, Peter (1985). "On Understanding Types, Data Abstraction, and Polymorphism". ACM Computing Surveys. 17 (4).
Pierce, Benjamin C. (2002). Types and Programming Languages. MIT Press. ISBN 978-0-262-16209-8.
Strachey, Christopher (1967). "Fundamental Concepts in Programming Languages". Lecture notes, International Summer School in Computer Programming, Copenhagen. Published in Higher-Order and Symbolic Computation, 13 (1–2), 2000.