Behavioral subtyping

Last updated

In object-oriented programming, behavioral subtyping is the principle that subclasses should satisfy the expectations of clients accessing subclass objects through references of superclass type, not just as regards syntactic safety (such as the absence of "method-not-found" errors) but also as regards behavioral correctness. Specifically, properties that clients can prove using the specification of an object's presumed type should hold even though the object is actually a member of a subtype of that type. [1]

Contents

For example, consider a type Stack and a type Queue, which both have a put method to add an element and a get method to remove one. Suppose the documentation associated with these types specifies that type Stack's methods shall behave as expected for stacks (i.e. they shall exhibit LIFO behavior), and that type Queue's methods shall behave as expected for queues (i.e. they shall exhibit FIFO behavior). Suppose, now, that type Stack was declared as a subclass of type Queue. Most programming language compilers ignore documentation and perform only the checks that are necessary to preserve type safety. Since, for each method of type Queue, type Stack provides a method with a matching name and signature, this check would succeed. However, clients accessing a Stack object through a reference of type Queue would, based on Queue's documentation, expect FIFO behavior but observe LIFO behavior, invalidating these clients' correctness proofs and potentially leading to incorrect behavior of the program as a whole.

This example violates behavioral subtyping because type Stack is not a behavioral subtype of type Queue: it is not the case that the behavior described by the documentation of type Stack (i.e. LIFO behavior) complies with the documentation of type Queue (which requires FIFO behavior).

In contrast, a program where both Stack and Queue are subclasses of a type Bag, whose specification for get is merely that it removes some element, does satisfy behavioral subtyping and allows clients to safely reason about correctness based on the presumed types of the objects they interact with. Indeed, any object that satisfies the Stack or Queue specification also satisfies the Bag specification.

It is important to stress that whether a type S is a behavioral subtype of a type T depends only on the specification (i.e. the documentation) of type T; the implementation of type T, if it has any, is completely irrelevant to this question. Indeed, type T need not even have an implementation; it might be a purely abstract class. As another case in point, type Stack above is a behavioral subtype of type Bag even if type Bag's implementation exhibits FIFO behavior: what matters is that type Bag's specification does not specify which element is removed by method get. This also means that behavioral subtyping can be discussed only with respect to a particular (behavioral) specification for each type involved and that if the types involved have no well-defined behavioral specification, behavioral subtyping cannot be discussed meaningfully.

Verifying behavioral subtyping

A type S is a behavioral subtype of a type T if each behavior allowed by the specification of S is also allowed by the specification of T. This requires, in particular, that for each method M of T, the specification of M in S is stronger than the one in T.

A method specification given by a precondition Ps and a postcondition Qs is stronger than one given by a precondition Pt and a postcondition Qt (formally: (Ps, Qs) ⇒ (Pt, Qt)) if Ps is weaker than Pt (i.e. Pt implies Ps) and Qs is stronger than Qt (i.e. Qs implies Qt). That is, strengthening a method specification can be done by strengthening the postcondition and by weakening the precondition. Indeed, a method specification is stronger if it imposes more specific constraints on the outputs for inputs that were already supported, or if it requires more inputs to be supported.

For example, consider the (very weak) specification for a method that computes the absolute value of an argument x, that specifies a precondition 0 ≤ x and a postcondition 0 ≤ result. This specification says the method need not support negative values for x, and it need only ensure the result is nonnegative as well. Two possible ways to strengthen this specification are by strengthening the postcondition to state result = |x|, i.e. the result is equal to the absolute value of x, or by weakening the precondition to "true", i.e. all values for x should be supported. Of course, we can also combine both, into a specification that states that the result should equal the absolute value of x, for any value of x.

Note, however, that it is possible to strengthen a specification ((Ps, Qs) ⇒ (Pt, Qt)) without strengthening the postcondition (Qs ⇏ Qt). [2] [3] Consider a specification for the absolute value method that specifies a precondition 0 ≤ x and a postcondition result = x. The specification that specifies a precondition "true" and a postcondition result = |x| strengthens this specification, even though the postcondition result = |x| does not strengthen (or weaken) the postcondition result = x. The necessary condition for a specification with precondition Ps and postcondition Qs to be stronger than a specification with precondition Pt and postcondition Qt is that Ps is weaker than Pt and "Qs or not Ps" is stronger than "Qt or not Pt". Indeed, "result = |x| or false" does strengthen "result = x or x < 0".

"Substitutability"

In an influential keynote address [4] on data abstraction and class hierarchies at the OOPSLA 1987 programming language research conference, Barbara Liskov said the following: "What is wanted here is something like the following substitution property: If for each object of type S there is an object of type T such that for all programs P defined in terms of T, the behavior of P is unchanged when is substituted for , then S is a subtype of T." This characterization has since been widely known as the Liskov Substitution Principle (LSP). Unfortunately, though, it has several issues. Firstly, in its original formulation, it is too strong: we rarely want the behavior of a subclass to be identical to that of its superclass; substituting a subclass object for a superclass object is often done with the intent to change the program's behavior, albeit, if behavioral subtyping is respected, in a way that maintains the program's desirable properties. Secondly, it makes no mention of specifications, so it invites an incorrect reading where the implementation of type S is compared to the implementation of type T. This is problematic for several reasons, one being that it does not support the common case where T is abstract and has no implementation. Thirdly, and most subtly, in the context of object-oriented imperative programming it is difficult to define precisely what it means to universally or existentially quantify over objects of a given type, or to substitute one object for another. [3] In the example above, we are not substituting a Stack object for a Bag object, we are simply using a Stack object as a Bag object.

In an interview in 2016, Liskov herself explains that what she presented in her keynote address was an "informal rule", that Jeannette Wing later proposed that they "try to figure out precisely what this means", which led to their joint publication [1] on behavioral subtyping, and indeed that "technically, it's called behavioral subtyping". [5] During the interview, she does not use substitution terminology to discuss the concepts.

Notes

  1. 1 2 Liskov, Barbara; Wing, Jeannette (1994-11-01). "A behavioral notion of subtyping". ACM Transactions on Programming Languages and Systems. 16 (6): 1811–1841. doi: 10.1145/197320.197383 .
  2. Parkinson, Matthew J. (2005). Local reasoning for Java (PDF) (PhD). University of Cambridge.
  3. 1 2 Leavens, Gary T.; Naumann, David A. (August 2015). "Behavioral subtyping, specification inheritance, and modular reasoning". ACM Transactions on Programming Languages and Systems. 37 (4). doi: 10.1145/2766446 .
  4. Liskov, B. (May 1988). "Keynote address - data abstraction and hierarchy". ACM SIGPLAN Notices. 23 (5): 17–34. doi: 10.1145/62139.62141 .
  5. van Vleck, Tom (April 20, 2016). Interview with Barbara Liskov. ACM. Archived from the original on 2021-12-21.

Related Research Articles

In computer science, an abstract data type (ADT) is a mathematical model for data types, defined by its behavior (semantics) from the point of view of a user of the data, specifically in terms of possible values, possible operations on data of this type, and the behavior of these operations. This mathematical model contrasts with data structures, which are concrete representations of data, and are the point of view of an implementer, not a user. For example, a stack has push/pop operations that follow a Last-In-First-Out rule, and can be concretely implemented using either a list or an array. Another example is a set which stores values, without any particular order, and no repeated values. Values themselves are not retrieved from sets, rather one tests a value for membership to obtain a Boolean "in" or "not in".

<span class="mw-page-title-main">Design by contract</span> Approach for designing software

Design by contract (DbC), also known as contract programming, programming by contract and design-by-contract programming, is an approach for designing software.

In computer programming, a postcondition is a condition or predicate that must always be true just after the execution of some section of code or after an operation in a formal specification. Postconditions are sometimes tested using assertions within the code itself. Often, postconditions are simply included in the documentation of the affected section of code.

The Vienna Development Method (VDM) is one of the longest-established formal methods for the development of computer-based systems. Originating in work done at the IBM Laboratory Vienna in the 1970s, it has grown to include a group of techniques and tools based on a formal specification language—the VDM Specification Language (VDM-SL). It has an extended form, VDM++, which supports the modeling of object-oriented and concurrent systems. Support for VDM includes commercial and academic tools for analyzing models, including support for testing and proving properties of models and generating program code from validated VDM models. There is a history of industrial usage of VDM and its tools and a growing body of research in the formalism has led to notable contributions to the engineering of critical systems, compilers, concurrent systems and in logic for computer science.

In programming language theory, subtyping is a form of type polymorphism. A subtype is a datatype that is related to another datatype by some notion of substitutability, meaning that program elements, written to operate on elements of the supertype, can also operate on elements of the subtype.

This is a list of terms found in object-oriented programming.

In programming language theory and type theory, polymorphism is the use of a single symbol to represent multiple different types.

In knowledge representation and ontology components, including for object-oriented programming and design, is-a is a subsumptive relationship between abstractions, wherein one class A is a subclass of another class B . In other words, type A is a subtype of type B when A's specification implies B's specification. That is, any object that satisfies A's specification also satisfies B's specification, because B's specification is weaker.

<span class="mw-page-title-main">Liskov substitution principle</span> Object-oriented programming principle

The Liskov substitution principle (LSP) is a particular definition of a subtyping relation, called strong behavioral subtyping, that was initially introduced by Barbara Liskov in a 1987 conference keynote address titled Data abstraction and hierarchy. It is based on the concept of "substitutability" – a principle in object-oriented programming stating that an object may be replaced by a sub-object without breaking the program. It is a semantic rather than merely syntactic relation, because it intends to guarantee semantic interoperability of types in a hierarchy, object types in particular. Barbara Liskov and Jeannette Wing described the principle succinctly in a 1994 paper as follows:

Subtype Requirement: Let be a property provable about objects of type T. Then should be true for objects of type S where S is a subtype of T.

Class-based programming, or more commonly class-orientation, is a style of object-oriented programming (OOP) in which inheritance occurs via defining classes of objects, instead of inheritance occurring via the objects alone.

Many programming language type systems support subtyping. For instance, if the type Cat is a subtype of Animal, then an expression of type Cat should be substitutable wherever an expression of type Animal is used.

In concurrent programming, a monitor is a synchronization construct that prevents threads from concurrently accessing a shared object's state and allows them to wait for the state to change. They provide a mechanism for threads to temporarily give up exclusive access in order to wait for some condition to be met, before regaining exclusive access and resuming their task. A monitor consists of a mutex (lock) and at least one condition variable. A condition variable is explicitly 'signalled' when the object's state is modified, temporarily passing the mutex to another thread 'waiting' on the conditional variable.

<span class="mw-page-title-main">Barbara Liskov</span> American computer scientist

Barbara Liskov is an American computer scientist who has made pioneering contributions to programming languages and distributed computing. Her notable work includes the introduction of abstract data types and the accompanying principle of data abstraction, along with the Liskov substitution principle, which applies these ideas to object-oriented programming, subtyping, and inheritance. Her work was recognized with the 2008 Turing Award, the highest distinction in computer science.

Refinement is a generic term of computer science that encompasses various approaches for producing correct computer programs and simplifying existing programs to enable their formal verification.

In object-oriented programming, inheritance is the mechanism of basing an object or class upon another object or class, retaining similar implementation. Also defined as deriving new classes from existing ones such as super class or base class and then forming them into a hierarchy of classes. In most class-based object-oriented languages like C++, an object created through inheritance, a "child object", acquires all the properties and behaviors of the "parent object", with the exception of: constructors, destructors, overloaded operators and friend functions of the base class. Inheritance allows programmers to create classes that are built upon existing classes, to specify a new implementation while maintaining the same behaviors, to reuse code and to independently extend original software via public classes and interfaces. The relationships of objects or classes through inheritance give rise to a directed acyclic graph.

A structural type system is a major class of type systems in which type compatibility and equivalence are determined by the type's actual structure or definition and not by other characteristics such as its name or place of declaration. Structural systems are used to determine if types are equivalent and whether a type is a subtype of another. It contrasts with nominative systems, where comparisons are based on the names of the types or explicit declarations, and duck typing, in which only the part of the structure accessed at runtime is checked for compatibility.

In computer science, a container is a class or a data structure whose instances are collections of other objects. In other words, they store objects in an organized way that follows specific access rules.

The circle–ellipse problem in software development illustrates several pitfalls which can arise when using subtype polymorphism in object modelling. The issues are most commonly encountered when using object-oriented programming (OOP). By definition, this problem is a violation of the Liskov substitution principle, one of the SOLID principles.

In type theory, a refinement type is a type endowed with a predicate which is assumed to hold for any element of the refined type. Refinement types can express preconditions when used as function arguments or postconditions when used as return types: for instance, the type of a function which accepts natural numbers and returns natural numbers greater than 5 may be written as . Refinement types are thus related to behavioral subtyping.

Whiley is an experimental programming language that combines features from the functional and imperative paradigms, and supports formal specification through function preconditions, postconditions and loop invariants. The language uses flow-sensitive typing also known as "flow typing."

References