Structural type system

Last updated

A structural type system (or property-based type system) is a major class of type systems in which type compatibility and equivalence are determined by the type's actual structure or definition and not by other characteristics such as its name or place of declaration. Structural systems are used to determine if types are equivalent and whether a type is a subtype of another. It contrasts with nominative systems, where comparisons are based on the names of the types or explicit declarations, and duck typing , in which only the part of the structure accessed at runtime is checked for compatibility.

Contents

Description

In structural typing, an element is considered to be compatible with another if, for each feature within the second element's type, a corresponding and identical feature exists in the first element's type. Some languages may differ on the details, such as whether the features must match in name. This definition is not symmetric, and includes subtype compatibility. Two types are considered to be identical if each is compatible with the other.

For example, OCaml uses structural typing on methods for compatibility of object types. Go uses structural typing on methods to determine compatibility of a type with an interface. C++ template functions exhibit structural typing on type arguments. Haxe uses structural typing, but classes are not structurally subtyped.

In languages which support subtype polymorphism, a similar dichotomy can be formed based on how the subtype relationship is defined. One type is a subtype of another if and only if it contains all the features of the base type, or subtypes thereof. The subtype may contain added features, such as members not present in the base type, or stronger invariants.

A distinction exists between structural substitution for inferred and non-inferred polymorphism. Some languages, such as Haskell, do not substitute structurally in the case where an expected type is declared (i.e., not inferred), e.g., only substitute for functions that are signature-based polymorphic via type inference. [1] Then it is not possible to accidentally subtype a non-inferred type, although it may still be possible to provide an explicit conversion to a non-inferred type, which is invoked implicitly.

Structural subtyping is arguably more flexible than nominative subtyping, as it permits the creation of ad hoc types and protocols; in particular, it permits creation of a type which is a supertype of an existing type, without modifying the definition of the latter. However, this may not be desirable where the programmer wishes to create closed abstractions.

A pitfall of structural typing versus nominative typing is that two separately defined types intended for different purposes, but accidentally holding the same properties (e.g. both composed of a pair of integers), could be considered the same type by the type system, simply because they happen to have identical structure. One way this can be avoided is by creating one algebraic data type for each use.

In 1990, Cook, et al., proved that inheritance is not subtyping in structurally-typed OO languages. [2]

Checking that two types are compatible, based on structural typing, is a non-trivial operation, e.g., requires maintaining a stack of previous checked types. [3]

Example

Objects in OCaml are structurally typed by the names and types of their methods.

Objects can be created directly (immediate objects) without going through a nominative class. Classes only serve as functions for creating objects.

#letx=objectvalmutablex=5methodget_x=xmethodset_xy=x<-yend;;valx:<get_x:int;set_x:int->unit>=<obj>

Here the OCaml interactive runtime prints out the inferred type of the object for convenience. Its type (< get_x : int; set_x : int -> unit >) is defined only by its methods. In other words, the type of x is defined by the method types "get_x : int" and "set_x : int -> unit" rather than by any name. [4]

To define another object, which has the same methods and types of methods:

#lety=objectmethodget_x=2methodset_xy=Printf.printf"%d\n"yend;;valy:<get_x:int;set_x:int->unit>=<obj>

OCaml considers them the same type. For example, the equality operator is typed to only take two values of the same type:

#x=y;;-:bool=false

So they must be the same type, or else this wouldn't even type-check. This shows that equivalence of types is structural.

One can define a function that invokes a method:

#letset_to_10a=a#set_x10;;valset_to_10:<set_x:int->'a;..>->'a=<fun>

The inferred type for the first argument (< set_x : int -> 'a; .. >) is interesting. The .. means that the first argument can be any object which has a "set_x" method, which takes an int as argument.

So it can be used on object x:

#set_to_10x;;-:unit=()

Another object can be made that happens to have that method and method type; the other methods are irrelevant:

#letz=objectmethodblahblah=2.5methodset_xy=Printf.printf"%d\n"yend;;valz:<blahblah:float;set_x:int->unit>=<obj>

The "set_to_10" function also works on it:

#set_to_10z;;10-:unit=()

This shows that compatibility for things like method invocation is determined by structure.

Let us define a type synonym for objects with only a "get_x" method and no other methods:

#typesimpler_obj=<get_x:int>;;typesimpler_obj=<get_x:int>

The object x is not of this type; but structurally, x is of a subtype of this type, since x contains a superset of its methods. So x can be coerced to this type:

#(x:>simpler_obj);;-:simpler_obj=<obj>#(x:>simpler_obj)#get_x;;-:int=10

But not object z, because it is not a structural subtype:

# (z :> simpler_obj);; This expression cannot be coerced to type simpler_obj = < get_x : int >; it has type < blahblah : float; set_x : int -> unit > but is here used with type   < get_x : int; .. > The first object type has no method get_x

This shows that compatibility for widening coercions are structural.

Related Research Articles

OCaml is a general-purpose, high-level, multi-paradigm programming language which extends the Caml dialect of ML with object-oriented features. OCaml was created in 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy, Ascánder Suárez, and others.

In object-oriented (OO) and functional programming, an immutable object is an object whose state cannot be modified after it is created. This is in contrast to a mutable object, which can be modified after it is created. In some cases, an object is considered immutable even if some internally used attributes change, but the object's state appears unchanging from an external point of view. For example, an object that uses memoization to cache the results of expensive computations could still be considered an immutable object.

In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every term. Usually the terms are various language constructs of a computer program, such as variables, expressions, functions, or modules. A type system dictates the operations that can be performed on a term. For variables, the type system determines the allowed values of that term. Type systems formalize and enforce the otherwise implicit categories the programmer uses for algebraic data types, data structures, or other components.

In programming language theory, subtyping is a form of type polymorphism. A subtype is a datatype that is related to another datatype by some notion of substitutability, meaning that program elements, written to operate on elements of the supertype, can also operate on elements of the subtype.

<span class="mw-page-title-main">F Sharp (programming language)</span> Microsoft programming language

F# is a general-purpose, strongly typed, multi-paradigm programming language that encompasses functional, imperative, and object-oriented programming methods. It is most often used as a cross-platform Common Language Infrastructure (CLI) language on .NET, but can also generate JavaScript and graphics processing unit (GPU) code.

In programming language theory and type theory, polymorphism is the use of a single symbol to represent multiple different types.

In computer science, a type signature or type annotation defines the inputs and outputs for a function, subroutine or method. A type signature includes the number, types, and order of the arguments required by a function. A type signature is typically used during overload resolution for choosing the correct definition of a function to be called among many overloaded forms.

In computer programming, duck typing is an application of the duck test—"If it walks like a duck and it quacks like a duck, then it must be a duck"—to determine whether an object can be used for a particular purpose. With nominative typing, an object is of a given type if it is declared as such. With duck typing, an object is of a given type if it has all methods and properties required by that type. Duck typing may be viewed as a usage-based structural equivalence between a given object and the requirements of a type.

<span class="mw-page-title-main">Foreach loop</span> Control flow statement for traversing items in a collection

In computer programming, foreach loop is a control flow statement for traversing items in a collection. foreach is usually used in place of a standard for loop statement. Unlike other for loop constructs, however, foreach loops usually maintain no explicit counter: they essentially say "do this to everything in this set", rather than "do this x times". This avoids potential off-by-one errors and makes code simpler to read. In object-oriented languages, an iterator, even if implicit, is often used as the means of traversal.

Many programming language type systems support subtyping. For instance, if the type Cat is a subtype of Animal, then an expression of type Cat should be substitutable wherever an expression of type Animal is used.

In computer science, a relational operator is a programming language construct or operator that tests or defines some kind of relation between two entities. These include numerical equality and inequalities.

In mathematical logic and computer science, some type theories and type systems include a top type that is commonly denoted with top or the symbol ⊤. The top type is sometimes called also universal type, or universal supertype as all other types in the type system of interest are subtypes of it, and in most cases, it contains every possible object of the type system. It is in contrast with the bottom type, or the universal subtype, which every other type is supertype of and it is often that the type contains no members at all.

In programming languages, an abstract type is a type in a nominative type system that cannot be instantiated directly; by contrast, a concrete typecan be instantiated directly. Instantiation of an abstract type can occur only indirectly, via a concrete subtype.

In computer science, a type system is nominal if compatibility and equivalence of data types is determined by explicit declarations and/or the name of the types. Nominal systems are used to determine if types are equivalent, as well as if a type is a subtype of another. Nominal type systems contrast with structural systems, where comparisons are based on the structure of the types in question and do not require explicit declarations.

In computer programming, an anonymous function is a function definition that is not bound to an identifier. Anonymous functions are often arguments being passed to higher-order functions or used for constructing the result of a higher-order function that needs to return a function. If the function is only used once, or a limited number of times, an anonymous function may be syntactically lighter than using a named function. Anonymous functions are ubiquitous in functional programming languages and other languages with first-class functions, where they fulfil the same role for the function type as literals do for other data types.

The circle–ellipse problem in software development illustrates several pitfalls which can arise when using subtype polymorphism in object modelling. The issues are most commonly encountered when using object-oriented programming (OOP). By definition, this problem is a violation of the Liskov substitution principle, one of the SOLID principles.

In programming languages and type theory, an option type or maybe type is a polymorphic type that represents encapsulation of an optional value; e.g., it is used as the return type of functions which may or may not return a meaningful value when they are applied. It consists of a constructor which either is empty, or which encapsulates the original data type A.

Nemerle is a general-purpose, high-level, statically typed programming language designed for platforms using the Common Language Infrastructure (.NET/Mono). It offers functional, object-oriented, aspect-oriented, reflective and imperative features. It has a simple C#-like syntax and a powerful metaprogramming system.

A Hindley–Milner (HM) type system is a classical type system for the lambda calculus with parametric polymorphism. It is also known as Damas–Milner or Damas–Hindley–Milner. It was first described by J. Roger Hindley and later rediscovered by Robin Milner. Luis Damas contributed a close formal analysis and proof of the method in his PhD thesis.

Objective-C is a high-level general-purpose, object-oriented programming language that adds Smalltalk-style messaging to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was selected by NeXT for its NeXTSTEP operating system. Due to Apple macOS’s direct lineage from NeXTSTEP, Objective-C was the standard programming language used, supported, and promoted by Apple for developing macOS and iOS applications until the introduction of the Swift programming language in 2014. Thereafter, its usage has been consistently declining among developers and it has often been described as a "dying" language.

References

  1. "Signature-based polymorphism".
  2. Cook, W.R.; Hill, W.L.; Canning, P.S. (January 1990). "Inheritance is not subtyping". Proceedings of the 17th ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '90. San Francisco, California. pp. 125–135. doi: 10.1145/96709.96721 . ISBN   978-0897913430. S2CID   8225906.{{cite book}}: CS1 maint: location missing publisher (link)
  3. "Type compatibility: name vs structural equivalence".
  4. "Object types".