Polymorphism (computer science)


In programming language theory and type theory, polymorphism is the use of a single symbol to represent multiple different types. [1]


In object-oriented programming, polymorphism is the provision of a single interface to entities of different types. [2] The concept is borrowed from a principle in biology where an organism or species can have many different forms or stages. [3]

The most commonly recognized major forms of polymorphism are:

- Ad hoc polymorphism: a single interface is defined for an arbitrary set of individually specified types, typically via function or operator overloading.
- Parametric polymorphism: code is written without mention of any specific type and can therefore be used transparently with any number of types.
- Subtyping (also called subtype polymorphism or inclusion polymorphism): a single name denotes instances of many different classes related by some common superclass.

History

Interest in polymorphic type systems developed significantly in the 1960s, with practical implementations beginning to appear by the end of the decade. Ad hoc polymorphism and parametric polymorphism were originally described in Christopher Strachey's Fundamental Concepts in Programming Languages, [5] where they are listed as "the two main classes" of polymorphism. Ad hoc polymorphism was a feature of ALGOL 68, while parametric polymorphism was the core feature of ML's type system.

In a 1985 paper, Peter Wegner and Luca Cardelli introduced the term inclusion polymorphism to model subtypes and inheritance, [1] citing Simula as the first programming language to implement it.

Forms

Ad hoc polymorphism

Christopher Strachey chose the term ad hoc polymorphism to refer to polymorphic functions that can be applied to arguments of different types, but that behave differently depending on the type of the argument to which they are applied (also known as function overloading or operator overloading). [5] The term "ad hoc" in this context is not intended to be pejorative; it refers simply to the fact that this form of polymorphism is not a fundamental feature of the type system. In the Java example below, the add functions seem to work generically over two types (integer and string) when looking at the invocations, but are, for all intents and purposes, two entirely distinct functions to the compiler:

class AdHocPolymorphic {
    public String add(int x, int y) {
        return "Sum: " + (x + y);
    }

    public String add(String name) {
        return "Added " + name;
    }
}

public class adhoc {
    public static void main(String[] args) {
        AdHocPolymorphic poly = new AdHocPolymorphic();

        System.out.println(poly.add(1, 2));    // prints "Sum: 3"
        System.out.println(poly.add("Jay"));   // prints "Added Jay"
    }
}

In dynamically typed languages the situation can be more complex as the correct function that needs to be invoked might only be determinable at run time.

Implicit type conversion has also been defined as a form of polymorphism, referred to as "coercion polymorphism". [1] [6]
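As a rough illustration of coercion polymorphism (a sketch, not an example from the cited sources; the class and method names are invented), the following Java code relies on an implicit widening conversion at the call site:

// Illustrative sketch: coercion polymorphism via implicit widening conversion.
class CoercionDemo {
    static double half(double x) {       // declared only for double arguments
        return x / 2.0;
    }

    public static void main(String[] args) {
        int n = 7;
        // The int argument is implicitly coerced to double, so half()
        // appears to accept more than one argument type.
        System.out.println(half(n));     // prints 3.5
        System.out.println(half(7.0));   // prints 3.5
    }
}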

Parametric polymorphism

Parametric polymorphism allows a function or a data type to be written generically, so that it can handle values uniformly without depending on their type. [7] Parametric polymorphism is a way to make a language more expressive while still maintaining full static type-safety.

The concept of parametric polymorphism applies to both data types and functions. A function that can evaluate to or be applied to values of different types is known as a polymorphic function. A data type that can appear to be of a generalized type (e.g. a list with elements of arbitrary type) is designated a polymorphic data type, like the generalized type from which such specializations are made.

Parametric polymorphism is ubiquitous in functional programming, where it is often simply referred to as "polymorphism". The following example in Haskell shows a parameterized list data type and two parametrically polymorphic functions on them:

data List a = Nil | Cons a (List a)

length :: List a -> Integer
length Nil         = 0
length (Cons x xs) = 1 + length xs

map :: (a -> b) -> List a -> List b
map f Nil         = Nil
map f (Cons x xs) = Cons (f x) (map f xs)

Parametric polymorphism is also available in several object-oriented languages. For instance, templates in C++ and D, or under the name generics in C#, Delphi, Java and Go:

class List<T> {
    class Node<T> {
        T elem;
        Node<T> next;
    }
    Node<T> head;
    int length() { ... }
}

List<B> map(Func<A, B> f, List<A> xs) {
    ...
}
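As a concrete counterpart in Java (a minimal sketch only; the class name Generics and the use of java.util.List are choices made for this example, not part of the code above), a single definition of map works for every pair of element types:

// Illustrative sketch: a parametrically polymorphic map using Java generics.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class Generics {
    // One definition that works uniformly for all element types A and B.
    static <A, B> List<B> map(Function<A, B> f, List<A> xs) {
        List<B> result = new ArrayList<>();
        for (A x : xs) {
            result.add(f.apply(x));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(map(String::length, List.of("a", "bb", "ccc"))); // [1, 2, 3]
        System.out.println(map(n -> n * n, List.of(1, 2, 3)));              // [1, 4, 9]
    }
}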

John C. Reynolds (and later Jean-Yves Girard) formally developed this notion of polymorphism as an extension to lambda calculus (called the polymorphic lambda calculus or System F). Any parametrically polymorphic function is necessarily restricted in what it can do, working on the shape of the data instead of its value, leading to the concept of parametricity.
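An informal Java illustration of this restriction (a sketch; the method name first is invented): because the body knows nothing about T, essentially all it can do is select or rearrange elements, never examine or compute with their values.

import java.util.List;

class ParametricityDemo {
    // Illustrative sketch: the only way to produce a T here is to return one
    // of the inputs; the implementation cannot depend on what T actually is.
    static <T> T first(List<T> xs) {
        return xs.get(0);
    }

    public static void main(String[] args) {
        System.out.println(first(List.of(1, 2, 3)));        // prints 1
        System.out.println(first(List.of("a", "b", "c")));  // prints a
    }
}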

Subtyping

Some languages employ the idea of subtyping (also called subtype polymorphism or inclusion polymorphism) to restrict the range of types that can be used in a particular case of polymorphism. In these languages, subtyping allows a function to be written to take an object of a certain type T, but also work correctly if passed an object that belongs to a type S that is a subtype of T (according to the Liskov substitution principle). This type relation is sometimes written S <: T. Conversely, T is said to be a supertype of S, written T :> S. Subtype polymorphism is usually resolved dynamically (see below).

In the following Java example we make cats and dogs subtypes of pets. The procedure letsHear() accepts a pet, but will also work correctly if a subtype is passed to it:

abstract class Pet {
    abstract String speak();
}

class Cat extends Pet {
    String speak() {
        return "Meow!";
    }
}

class Dog extends Pet {
    String speak() {
        return "Woof!";
    }
}

public class PetDemo {
    // Accepts any Pet; the overriding speak() of the actual subtype is called.
    static void letsHear(final Pet pet) {
        System.out.println(pet.speak());
    }

    public static void main(String[] args) {
        letsHear(new Cat());  // prints "Meow!"
        letsHear(new Dog());  // prints "Woof!"
    }
}


In another example, if Number, Rational, and Integer are types such that Number :> Rational and Number :> Integer (Rational and Integer being subtypes of a type Number that is a supertype of them), a function written to take a Number will work equally well when passed an Integer or Rational as when passed a Number. The actual type of the object can be hidden from clients in a black box and accessed via object identity. In fact, if the Number type is abstract, it may not even be possible to obtain an object whose most-derived type is Number (see abstract data type, abstract class). This particular kind of type hierarchy is known, especially in the context of the Scheme programming language, as a numerical tower, and it usually contains many more types.
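A hedged Java sketch of this situation follows; the Num hierarchy and its factory method are invented for illustration and are unrelated to Java's own java.lang.Number:

// Illustrative sketch: clients see only the abstract supertype Num; the
// concrete subtype (IntNum or RatNum) stays hidden behind the factory method.
abstract class Num {
    abstract double value();

    static Num of(long numerator, long denominator) {
        return denominator == 1 ? new IntNum(numerator)
                                : new RatNum(numerator, denominator);
    }
}

class IntNum extends Num {
    private final long n;
    IntNum(long n) { this.n = n; }
    double value() { return n; }
}

class RatNum extends Num {
    private final long p, q;
    RatNum(long p, long q) { this.p = p; this.q = q; }
    double value() { return (double) p / q; }
}

class NumDemo {
    // Written against Num, but works for any subtype passed in.
    static void print(Num x) { System.out.println(x.value()); }

    public static void main(String[] args) {
        print(Num.of(3, 1));  // prints 3.0 (an IntNum underneath)
        print(Num.of(1, 2));  // prints 0.5 (a RatNum underneath)
    }
}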

Object-oriented programming languages offer subtype polymorphism using subclassing (also known as inheritance). In typical implementations, each class contains what is called a virtual table (vtable for short), a table of the functions that implement the polymorphic part of the class interface, and each object contains a pointer to the vtable of its class, which is consulted whenever a polymorphic method is called. This mechanism is an example of:

- late binding, because the virtual function call is not bound until the time of invocation; and
- single dispatch (i.e., single-argument polymorphism), because the call is resolved by looking only at the vtable supplied by the first argument (the receiver), so the runtime types of the other arguments are irrelevant.

The same goes for most other popular object systems. Some, however, such as Common Lisp Object System, provide multiple dispatch, under which method calls are polymorphic in all arguments.
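The following self-contained Java sketch (with invented Animal and HouseCat classes, separate from the Pet example above) illustrates the single-dispatch behaviour just described: the overriding method is chosen from the runtime type of the receiver, while an overloaded method is chosen from the static type of its argument.

// Illustrative sketch: single (receiver) dispatch versus static overload selection.
class Animal {
    String speak() { return "..."; }
}

class HouseCat extends Animal {
    @Override
    String speak() { return "Meow!"; }
}

class DispatchDemo {
    // Overloads are chosen at compile time from the argument's static type.
    static String describe(Animal a)   { return "some animal"; }
    static String describe(HouseCat c) { return "a cat"; }

    public static void main(String[] args) {
        Animal pet = new HouseCat();        // static type Animal, runtime type HouseCat

        // Receiver (single) dispatch: the override is chosen from the runtime type.
        System.out.println(pet.speak());    // prints "Meow!"

        // Other arguments do not take part in dispatch: the Animal overload is
        // chosen from pet's static type, even though pet is really a HouseCat.
        System.out.println(describe(pet));  // prints "some animal"
    }
}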

The interaction between parametric polymorphism and subtyping leads to the concepts of variance and bounded quantification.
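As a brief Java illustration of the two ideas (a sketch with invented method names): the bound T extends Comparable<T> restricts which types may instantiate T, while the wildcard ? extends Number expresses use-site covariance.

// Illustrative sketch: bounded quantification and wildcard variance in Java.
import java.util.List;

class BoundsAndVariance {
    // Bounded quantification: T may only be instantiated with types
    // that are subtypes of Comparable<T>.
    static <T extends Comparable<T>> T max(T a, T b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    // Variance via a wildcard: accepts List<Integer>, List<Double>, and so on,
    // i.e. lists of any subtype of Number (read-only covariance).
    static double sum(List<? extends Number> xs) {
        double total = 0;
        for (Number x : xs) total += x.doubleValue();
        return total;
    }

    public static void main(String[] args) {
        System.out.println(max("apple", "pear"));    // prints pear
        System.out.println(sum(List.of(1, 2, 3)));   // prints 6.0
        System.out.println(sum(List.of(1.5, 2.5)));  // prints 4.0
    }
}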

Row polymorphism

Row polymorphism [8] is a concept similar to, but distinct from, subtyping. It deals with structural types: it allows the use of all values whose types have certain properties, without losing the remaining type information.

Polytypism

A related concept is polytypism (or data type genericity). A polytypic function is more general than polymorphic, and in such a function, "though one can provide fixed ad hoc cases for specific data types, an ad hoc combinator is absent". [9]

Rank polymorphism

Rank polymorphism is one of the defining features of the array programming languages, like APL. The essence of the rank-polymorphic programming model is implicitly treating all operations as aggregate operations, usable on arrays with arbitrarily many dimensions, [10] which is to say that rank polymorphism allows functions to be defined to operate on arrays of any shape and size.
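Java has no rank polymorphism, but the following hedged sketch (all names invented) imitates the idea by recursing over array dimensions, so that a single scalar operation can be applied to a value of any rank:

// Illustrative sketch: applying one scalar operation to data of arbitrary rank.
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

class RankPolyDemo {
    // Applies a scalar int operation to an Integer, an Object[] of Integers,
    // an Object[][] of Integers, and so on, by recursing over each dimension.
    static Object applyRanked(IntUnaryOperator f, Object value) {
        if (!value.getClass().isArray()) {
            return f.applyAsInt((Integer) value);   // rank 0: a plain scalar
        }
        int n = Array.getLength(value);
        Object[] result = new Object[n];
        for (int i = 0; i < n; i++) {
            result[i] = applyRanked(f, Array.get(value, i));
        }
        return result;
    }

    public static void main(String[] args) {
        IntUnaryOperator inc = x -> x + 1;
        Object scalar = 41;
        Object matrix = new Object[][] { { 1, 2 }, { 3, 4 } };

        System.out.println(applyRanked(inc, scalar));          // prints 42
        System.out.println(Arrays.deepToString(
                (Object[]) applyRanked(inc, matrix)));          // prints [[2, 3], [4, 5]]
    }
}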

Implementation aspects

Static and dynamic polymorphism

Polymorphism can be distinguished by when the implementation is selected: statically (at compile time) or dynamically (at run time, typically via a virtual function). This is known respectively as static dispatch and dynamic dispatch, and the corresponding forms of polymorphism are accordingly called static polymorphism and dynamic polymorphism.

Static polymorphism executes faster, because there is no dynamic dispatch overhead, but requires additional compiler support. Further, static polymorphism allows greater static analysis by compilers (notably for optimization), source code analysis tools, and human readers (programmers). Dynamic polymorphism is more flexible but slower—for example, dynamic polymorphism allows duck typing, and a dynamically linked library may operate on objects without knowing their full type.

Static polymorphism typically occurs in ad hoc polymorphism and parametric polymorphism, whereas dynamic polymorphism is usual for subtype polymorphism. However, it is possible to achieve static polymorphism with subtyping through more sophisticated use of template metaprogramming, namely the curiously recurring template pattern.

When polymorphism is exposed via a library, static polymorphism becomes impossible for dynamic libraries as there is no way of knowing what types the parameters are when the shared object is built. While languages like C++ and Rust use monomorphized templates, the Swift programming language makes extensive use of dynamic dispatch to build the application binary interface for these libraries by default. As a result, more code can be shared for a reduced system size at the cost of runtime overhead. [11]

See also

Related Research Articles

ML is a general-purpose, high-level, functional programming language. It is known for its use of the polymorphic Hindley–Milner type system, which automatically assigns the data types of most expressions without requiring explicit type annotations, and ensures type safety; there is a formal proof that a well-typed ML program does not cause runtime type errors. ML provides pattern matching for function arguments, garbage collection, imperative programming, call-by-value and currying. While a general-purpose programming language, ML is used heavily in programming language research and is one of the few languages to be completely specified and verified using formal semantics. Its types and pattern matching make it well-suited and commonly used to operate on other formal languages, such as in compiler writing, automated theorem proving, and formal verification.

In programming languages, name binding is the association of entities with identifiers. An identifier bound to an object is said to reference that object. Machine languages have no built-in notion of identifiers, but name-object bindings are implemented by programming languages as a service and notation for the programmer. Binding is intimately connected with scoping, as scope determines which names bind to which objects – at which locations in the program code (lexically) and in which one of the possible execution paths (temporally).

In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every term. Usually the terms are various language constructs of a computer program, such as variables, expressions, functions, or modules. A type system dictates the operations that can be performed on a term. For variables, the type system determines the allowed values of that term.

This is a list of terms found in object-oriented programming.

In computer science, a type signature or type annotation defines the inputs and outputs of a function, subroutine or method. A type signature includes the number, types, and order of the function's arguments. One important use of a type signature is for function overload resolution, where one particular definition of a function to be called is selected among many overloaded forms.

In programming languages, ad hoc polymorphism is a kind of polymorphism in which polymorphic functions can be applied to arguments of different types, because a polymorphic function can denote a number of distinct and potentially heterogeneous implementations depending on the type of argument(s) to which it is applied. When applied to object-oriented or procedural concepts, it is also known as function overloading or operator overloading. The term ad hoc in this context is not intended to be pejorative; it refers simply to the fact that this type of polymorphism is not a fundamental feature of the type system. This is in contrast to parametric polymorphism, in which polymorphic functions are written without mention of any specific type, and can thus apply a single abstract implementation to any number of types in a transparent way. This classification was introduced by Christopher Strachey in 1967.

In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union, disjoint union, sum type, or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time, and a tag field explicitly indicates which type is in use. It can be thought of as a type that has several "cases", each of which should be handled correctly when that type is manipulated. This is critical in defining recursive datatypes, in which some component of a value may have the same type as that value, for example in defining a type for representing trees, where it is necessary to distinguish multi-node subtrees and leaves. Like ordinary unions, tagged unions can save storage by overlapping storage areas for each type, since only one is in use at a time.

In computer science, type conversion, type casting, type coercion, and type juggling are different ways of changing an expression from one data type to another. An example would be the conversion of an integer value into a floating point value or its textual representation as a string, and vice versa. Type conversions can take advantage of certain features of type hierarchies or data representations. Two important aspects of a type conversion are whether it happens implicitly (automatically) or explicitly, and whether the underlying data representation is converted from one representation into another, or a given representation is merely reinterpreted as the representation of another data type. In general, both primitive and compound data types can be converted.

In computer science, type safety and type soundness are the extent to which a programming language discourages or prevents type errors. Type safety is sometimes alternatively considered to be a property of facilities of a computer language; that is, some facilities are type-safe and their usage will not result in type errors, while other facilities in the same language may be type-unsafe and a program using them may encounter type errors. The behaviors classified as type errors by a given programming language are usually those that result from attempts to perform operations on values that are not of the appropriate data type, e.g., adding a string to an integer when there's no definition on how to handle this case. This classification is partly based on opinion.

In computer science, dynamic dispatch is the process of selecting which implementation of a polymorphic operation to call at run time. It is commonly employed in, and considered a prime characteristic of, object-oriented programming (OOP) languages and systems.


Method overriding, in object-oriented programming, is a language feature that allows a subclass or child class to provide a specific implementation of a method that is already provided by one of its superclasses or parent classes. It allows for a specific type of polymorphism (subtyping). The implementation in the subclass overrides (replaces) the implementation in the superclass by providing a method that has the same name, same parameters or signature, and same return type as the method in the parent class. The version of a method that is executed is determined by the object that is used to invoke it. If an object of a parent class is used to invoke the method, then the version in the parent class will be executed, but if an object of the subclass is used to invoke the method, then the version in the child class will be executed. Some languages allow a programmer to prevent a method from being overridden.

In software engineering, double dispatch is a special form of multiple dispatch, and a mechanism that dispatches a function call to different concrete functions depending on the runtime types of two objects involved in the call. In most object-oriented systems, the concrete function that is called from a function call in the code depends on the dynamic type of a single object and therefore they are known as single dispatch calls, or simply virtual function calls.

Many programming language type systems support subtyping. For instance, if the type Cat is a subtype of Animal, then an expression of type Cat should be substitutable wherever an expression of type Animal is used.

In object-oriented programming, inheritance is the mechanism of basing an object or class upon another object or class, retaining similar implementation. It is also defined as deriving new classes from existing ones (such as a superclass or base class) and forming them into a hierarchy of classes. In most class-based object-oriented languages like C++, an object created through inheritance, a "child object", acquires all the properties and behaviors of the "parent object", with the exception of constructors, destructors, overloaded operators and friend functions of the base class. Inheritance allows programmers to create classes that are built upon existing classes, to specify a new implementation while maintaining the same behaviors, to reuse code and to independently extend original software via public classes and interfaces. The relationships of objects or classes through inheritance give rise to a directed acyclic graph.

In programming languages and type theory, parametric polymorphism allows a single piece of code to be given a "generic" type, using variables in place of actual types, and then instantiated with particular types as needed. Parametrically polymorphic functions and data types are sometimes called generic functions and generic datatypes, respectively, and they form the basis of generic programming.

In computer science, a type class is a type system construct that supports ad hoc polymorphism. This is achieved by adding constraints to type variables in parametrically polymorphic types. Such a constraint typically involves a type class T and a type variable a, and means that a can only be instantiated to a type whose members support the overloaded operations associated with T.

In mathematical logic and computer science, some type theories and type systems include a top type that is commonly denoted with top or the symbol ⊤. The top type is sometimes also called the universal type, or universal supertype, as all other types in the type system of interest are subtypes of it, and in most cases it contains every possible object of the type system. It contrasts with the bottom type, or universal subtype, of which every other type is a supertype and which often contains no members at all.

The curiously recurring template pattern (CRTP) is an idiom, originally in C++, in which a class X derives from a class template instantiation using X itself as a template argument. More generally it is known as F-bound polymorphism, and it is a form of F-bounded quantification.

Generics are a facility of generic programming that were added to the Java programming language in 2004 within version J2SE 5.0. They were designed to extend Java's type system to allow "a type or method to operate on objects of various types while providing compile-time type safety". To provide this compile-time type safety without changing the Java virtual machine, parametrically polymorphic functions are not directly implemented in the Java virtual machine; instead, generics are implemented by type erasure.

In type theory, bounded quantification refers to universal or existential quantifiers which are restricted ("bounded") to range only over the subtypes of a particular type. Bounded quantification is an interaction of parametric polymorphism with subtyping. Bounded quantification has traditionally been studied in the functional setting of System F<:, but is available in modern object-oriented languages supporting parametric polymorphism (generics) such as Java, C# and Scala.

References

  1. Cardelli, Luca; Wegner, Peter (December 1985). "On understanding types, data abstraction, and polymorphism" (PDF). ACM Computing Surveys. 17 (4): 471–523. CiteSeerX 10.1.1.117.695. doi:10.1145/6041.6042. S2CID 2921816. "Polymorphic types are types whose operations are applicable to values of more than one type."
  2. Bjarne Stroustrup (February 19, 2007). "Bjarne Stroustrup's C++ Glossary". "polymorphism – providing a single interface to entities of different types."
  3. "Polymorphism". The Java™ Tutorials: Learning the Java Language: Interfaces and Inheritance. Oracle. Retrieved 2021-09-08.
  4. Conallen, J.; Engle, M.; Houston, K.; Maksimchuk, R.; Young, B.; Booch, G. (2007). Object-Oriented Analysis and Design with Applications (3rd ed.). Pearson Education. ISBN 9780132797443.
  5. Strachey, Christopher (2000). "Fundamental Concepts in Programming Languages". Higher-Order and Symbolic Computation. 13 (1/2): 11–49. CiteSeerX 10.1.1.332.3161. doi:10.1023/A:1010000313106. ISSN 1573-0557. S2CID 14124601.
  6. Tucker, Allen B. (2004). Computer Science Handbook (2nd ed.). Taylor & Francis. pp. 91–. ISBN 978-1-58488-360-9.
  7. Pierce, B. C. (2002). "23.2 Varieties of Polymorphism". Types and Programming Languages. MIT Press. pp. 340–341. ISBN 9780262162098.
  8. Wand, Mitchell (June 1989). "Type inference for record concatenation and multiple inheritance". Proceedings, Fourth Annual Symposium on Logic in Computer Science. pp. 92–97. doi:10.1109/LICS.1989.39162.
  9. Lämmel, Ralf; Visser, Joost (2002). "Typed Combinators for Generic Traversal". Practical Aspects of Declarative Languages: 4th International Symposium. Springer. pp. 137–154, see p. 153. CiteSeerX 10.1.1.18.5727. ISBN 354043092X.
  10. Slepak, Justin; Shivers, Olin; Manolios, Panagiotis (2019). "The semantics of rank polymorphism". arXiv:1907.00509 [cs.PL].
  11. Beingessner, Alexis. "How Swift Achieved Dynamic Linking Where Rust Couldn't".