Virtual method table

Last updated

In computer programming, a virtual method table (VMT), virtual function table, virtual call table, dispatch table, vtable, or vftable is a mechanism used in a programming language to support dynamic dispatch (or run-time method binding).

Contents

Whenever a class defines a virtual function (or method), most compilers add a hidden member variable to the class that points to an array of pointers to (virtual) functions called the virtual method table. These pointers are used at runtime to invoke the appropriate function implementations, because at compile time it may not yet be known if the base function is to be called or a derived one implemented by a class that inherits from the base class.

There are many different ways to implement such dynamic dispatch, but use of virtual method tables is especially common among C++ and related languages (such as D and C#). Languages that separate the programmatic interface of objects from the implementation, like Visual Basic and Delphi, also tend to use this approach, because it allows objects to use a different implementation simply by using a different set of method pointers. The method allows creation of external libraries, where other techniques perhaps may not. [1]

Suppose a program contains three classes in an inheritance hierarchy: a superclass, Cat, and two subclasses, HouseCat and Lion. Class Cat defines a virtual function named speak, so its subclasses may provide an appropriate implementation (e.g. either meow or roar). When the program calls the speak function on a Cat reference (which can refer to an instance of Cat, or an instance of HouseCat or Lion), the code must be able to determine which implementation of the function the call should be dispatched to. This depends on the actual class of the object, not the class of the reference to it (Cat). The class cannot generally be determined statically (that is, at compile time), so neither can the compiler decide which function to call at that time. The call must be dispatched to the right function dynamically (that is, at run time) instead.

Implementation

An object's virtual method table will contain the addresses of the object's dynamically bound methods. Method calls are performed by fetching the method's address from the object's virtual method table. The virtual method table is the same for all objects belonging to the same class, and is therefore typically shared between them. Objects belonging to type-compatible classes (for example siblings in an inheritance hierarchy) will have virtual method tables with the same layout: the address of a given method will appear at the same offset for all type-compatible classes. Thus, fetching the method's address from a given offset into a virtual method table will get the method corresponding to the object's actual class. [2]

The C++ standards do not mandate exactly how dynamic dispatch must be implemented, but compilers generally use minor variations on the same basic model.

Typically, the compiler creates a separate virtual method table for each class. When an object is created, a pointer to this table, called the virtual table pointer, vpointer or VPTR, is added as a hidden member of this object. As such, the compiler must also generate "hidden" code in the constructors of each class to initialize a new object's virtual table pointer to the address of its class's virtual method table.

Many compilers place the virtual table pointer as the last member of the object; other compilers place it as the first; portable source code works either way. [3] For example, g++ previously placed the pointer at the end of the object. [4]

Example

Consider the following class declarations in C++ syntax:

classB1{public:virtual~B1(){}voidfnonvirtual(){}virtualvoidf1(){}intint_in_b1;};classB2{public:virtual~B2(){}virtualvoidf2(){}intint_in_b2;};

used to derive the following class:

classD:publicB1,publicB2{public:voidd(){}voidf2()override{}intint_in_d;};

and the following piece of C++ code:

B2*b2=newB2();D*d=newD();

g++ 3.4.6 from GCC produces the following 32-bit memory layout for the object b2: [nb 1]

b2:   +0: pointer to virtual method table of B2   +4: value of int_in_b2  virtual method table of B2:   +0: B2::f2()    

and the following memory layout for the object d:

d:   +0: pointer to virtual method table of D (for B1)   +4: value of int_in_b1   +8: pointer to virtual method table of D (for B2)  +12: value of int_in_b2  +16: value of int_in_d  Total size: 20 Bytes.  virtual method table of D (for B1):   +0: B1::f1()  // B1::f1() is not overridden  virtual method table of D (for B2):   +0: D::f2()   // B2::f2() is overridden by D::f2()                 // The location of B2::f2 is not in the virtual method table for D 

Note that those functions not carrying the keyword virtual in their declaration (such as fnonvirtual() and d()) do not generally appear in the virtual method table. There are exceptions for special cases as posed by the default constructor.

Also note the virtual destructors in the base classes, B1 and B2. They are necessary to ensure delete d can free up memory not just for D, but also for B1 and B2, if d is a pointer or reference to the types B1 or B2. They were excluded from the memory layouts to keep the example simple. [nb 2]

Overriding of the method f2() in class D is implemented by duplicating the virtual method table of B2 and replacing the pointer to B2::f2() with a pointer to D::f2().

Multiple inheritance and thunks

The g++ compiler implements the multiple inheritance of the classes B1 and B2 in class D using two virtual method tables, one for each base class. (There are other ways to implement multiple inheritance, but this is the most common.) This leads to the necessity for "pointer fixups", also called thunks, when casting.

Consider the following C++ code:

D*d=newD();B1*b1=d;B2*b2=d;

While d and b1 will point to the same memory location after execution of this code, b2 will point to the location d+8 (eight bytes beyond the memory location of d). Thus, b2 points to the region within d that "looks like" an instance of B2, i.e., has the same memory layout as an instance of B2.[ clarification needed ]

Invocation

A call to d->f1() is handled by dereferencing d's D::B1 vpointer, looking up the f1 entry in the virtual method table, and then dereferencing that pointer to call the code.

Single inheritance

In the case of single inheritance (or in a language with only single inheritance), if the vpointer is always the first element in d (as it is with many compilers), this reduces to the following pseudo-C++:

(*((*d)[0]))(d)

Where *d refers to the virtual method table of D and [0] refers to the first method in the virtual method table. The parameter d becomes the "this" pointer to the object.

Multiple inheritance

In the more general case, calling B1::f1() or D::f2() is more complicated:

(*(*(d[0]/*pointer to virtual method table of D (for B1)*/)[0]))(d)/* Call d->f1() */(*(*(d[8]/*pointer to virtual method table of D (for B2)*/)[0]))(d+8)/* Call d->f2() */

The call to d->f1() passes a B1 pointer as a parameter. The call to d->f2() passes a B2 pointer as a parameter. This second call requires a fixup to produce the correct pointer. The location of B2::f2 is not in the virtual method table for D.

By comparison, a call to d->fnonvirtual() is much simpler:

(*B1::fnonvirtual)(d)

Efficiency

A virtual call requires at least an extra indexed dereference and sometimes a "fixup" addition, compared to a non-virtual call, which is simply a jump to a compiled-in pointer. Therefore, calling virtual functions is inherently slower than calling non-virtual functions. An experiment done in 1996 indicates that approximately 6–13% of execution time is spent simply dispatching to the correct function, though the overhead can be as high as 50%. [5] The cost of virtual functions may not be so high on modern CPU architectures due to much larger caches and better branch prediction.

Furthermore, in environments where JIT compilation is not in use, virtual function calls usually cannot be inlined. In certain cases it may be possible for the compiler to perform a process known as devirtualization in which, for instance, the lookup and indirect call are replaced with a conditional execution of each inlined body, but such optimizations are not common.

To avoid this overhead, compilers usually avoid using virtual method tables whenever the call can be resolved at compile time.

Thus, the call to f1 above may not require a table lookup because the compiler may be able to tell that d can only hold a D at this point, and D does not override f1. Or the compiler (or optimizer) may be able to detect that there are no subclasses of B1 anywhere in the program that override f1. The call to B1::f1 or B2::f2 will probably not require a table lookup because the implementation is specified explicitly (although it does still require the 'this'-pointer fixup).

Comparison with alternatives

The virtual method table is generally a good performance trade-off to achieve dynamic dispatch, but there are alternatives, such as binary tree dispatch, with higher performance in some typical cases, but different trade-offs. [1] [6]

However, virtual method tables only allow for single dispatch on the special "this" parameter, in contrast to multiple dispatch (as in CLOS, Dylan, or Julia), where the types of all parameters can be taken into account in dispatching.

Virtual method tables also only work if dispatching is constrained to a known set of methods, so they can be placed in a simple array built at compile time, in contrast to duck typing languages (such as Smalltalk, Python or JavaScript).

Languages that provide either or both of these features often dispatch by looking up a string in a hash table, or some other equivalent method. There are a variety of techniques to make this faster (e.g., interning/tokenizing method names, caching lookups, just-in-time compilation).

See also

Notes

  1. G++'s -fdump-class-hierarchy (starting with version 8: -fdump-lang-class) argument can be used to dump virtual method tables for manual inspection. For AIX VisualAge XlC compiler, use -qdump_class_hierarchy to dump class hierarchy and virtual function table layout.
  2. "C++ - why there are two virtual destructor in the virtual table and where is address of the non-virtual function (gcc4.6.3)".

Related Research Articles

Multiple dispatch or multimethods is a feature of some programming languages in which a function or method can be dynamically dispatched based on the run-time (dynamic) type or, in the more general case, some other attribute of more than one of its arguments. This is a generalization of single-dispatch polymorphism where a function or method call is dynamically dispatched based on the derived type of the object on which the method has been called. Multiple dispatch routes the dynamic dispatch to the implementing function or method using the combined characteristics of one or more arguments.

This is a list of terms found in object-oriented programming.

A method in object-oriented programming (OOP) is a procedure associated with an object, and generally also a message. An object consists of state data and behavior; these compose an interface, which specifies how the object may be used. A method is a behavior of an object parametrized by a user.

<span class="mw-page-title-main">D (programming language)</span> Multi-paradigm system programming language

D, also known as dlang, is a multi-paradigm system programming language created by Walter Bright at Digital Mars and released in 2001. Andrei Alexandrescu joined the design and development effort in 2007. Though it originated as a re-engineering of C++, D is a profoundly different language — features of D can be considered streamlined and expanded-upon ideas from C++, however D also draws inspiration from other high-level programming languages, notably Java, Python, Ruby, C#, and Eiffel.

In programming language theory and type theory, polymorphism is the provision of a single interface to entities of different types or the use of a single symbol to represent multiple different types. The concept is borrowed from a principle in biology where an organism or species can have many different forms or stages.

In computer science, a type signature or type annotation defines the inputs and outputs for a function, subroutine or method. A type signature includes the number, types, and order of the arguments contained by a function. A type signature is typically used during overload resolution for choosing the correct definition of a function to be called among many overloaded forms.

The fragile base class problem is a fundamental architectural problem of object-oriented programming systems where base classes (superclasses) are considered "fragile" because seemingly safe modifications to a base class, when inherited by the derived classes, may cause the derived classes to malfunction. The programmer cannot determine whether a base class change is safe simply by examining in isolation the methods of the base class.

In object-oriented programming, in languages such as C++, and Object Pascal, a virtual function or virtual method is an inheritable and overridable function or method for which dynamic dispatch is facilitated. This concept is an important part of the (runtime) polymorphism portion of object-oriented programming (OOP). In short, a virtual function defines a target function to be executed, but the target might not be known at compile time.

A function pointer, also called a subroutine pointer or procedure pointer, is a pointer referencing executable code, rather than data. Dereferencing the function pointer yields the referenced function, which can be invoked and passed arguments just as in a normal function call. Such an invocation is also known as an "indirect" call, because the function is being invoked indirectly through a variable instead of directly through a fixed identifier or address.

In computer programming, run-time type information or run-time type identification (RTTI) is a feature of some programming languages that exposes information about an object's data type at runtime. Run-time type information may be available for all types or only to types that explicitly have it. Run-time type information is a specialization of a more general concept called type introspection.

In computer programming, a thunk is a subroutine used to inject a calculation into another subroutine. Thunks are primarily used to delay a calculation until its result is needed, or to insert operations at the beginning or end of the other subroutine. They have many other applications in compiler code generation and modular programming.

In computer science, dynamic dispatch is the process of selecting which implementation of a polymorphic operation to call at run time. It is commonly employed in, and considered a prime characteristic of, object-oriented programming (OOP) languages and systems.

<span class="mw-page-title-main">Method overriding</span> Language feature in object-oriented programming

Method overriding, in object-oriented programming, is a language feature that allows a subclass or child class to provide a specific implementation of a method that is already provided by one of its superclasses or parent classes. In addition to providing data-driven algorithm-determined parameters across virtual network interfaces, it also allows for a specific type of polymorphism (subtyping). The implementation in the subclass overrides (replaces) the implementation in the superclass by providing a method that has same name, same parameters or signature, and same return type as the method in the parent class. The version of a method that is executed will be determined by the object that is used to invoke it. If an object of a parent class is used to invoke the method, then the version in the parent class will be executed, but if an object of the subclass is used to invoke the method, then the version in the child class will be executed. This helps in preventing problems associated with differential relay analytics which would otherwise rely on a framework in which method overriding might be obviated. Some languages allow a programmer to prevent a method from being overridden.

<span class="mw-page-title-main">Virtual inheritance</span> Technique in the C++ language

Virtual inheritance is a C++ technique that ensures only one copy of a base class's member variables are inherited by grandchild derived classes. Without virtual inheritance, if two classes B and C inherit from a class A, and a class D inherits from both B and C, then D will contain two copies of A's member variables: one via B, and one via C. These will be accessible independently, using scope resolution.

In object-oriented programming, inheritance is the mechanism of basing an object or class upon another object or class, retaining similar implementation. Also defined as deriving new classes from existing ones such as super class or base class and then forming them into a hierarchy of classes. In most class-based object-oriented languages like C++, an object created through inheritance, a "child object", acquires all the properties and behaviors of the "parent object", with the exception of: constructors, destructors, overloaded operators and friend functions of the base class. Inheritance allows programmers to create classes that are built upon existing classes, to specify a new implementation while maintaining the same behaviors, to reuse code and to independently extend original software via public classes and interfaces. The relationships of objects or classes through inheritance give rise to a directed acyclic graph.

In computer programming, the term hooking covers a range of techniques used to alter or augment the behaviour of an operating system, of applications, or of other software components by intercepting function calls or messages or events passed between software components. Code that handles such intercepted function calls, events or messages is called a hook.

A class in C++ is a user-defined type or data structure declared with keyword class that has data and functions as its members whose access is governed by the three access specifiers private, protected or public. By default access to members of a C++ class is private. The private members are not accessible outside the class; they can be accessed only through methods of the class. The public members form an interface to the class and are accessible outside the class.

The curiously recurring template pattern (CRTP) is an idiom, originally in C++, in which a class X derives from a class template instantiation using X itself as a template argument. More generally it is known as F-bound polymorphism, and it is a form of F-bounded quantification.

Dynamic loading is a mechanism by which a computer program can, at run time, load a library into memory, retrieve the addresses of functions and variables contained in the library, execute those functions or access those variables, and unload the library from memory. It is one of the 3 mechanisms by which a computer program can use some other software; the other two are static linking and dynamic linking. Unlike static linking and dynamic linking, dynamic loading allows a computer program to start up in the absence of these libraries, to discover available libraries, and to potentially gain additional functionality.

Objective-C is a high-level general-purpose, object-oriented programming language that adds Smalltalk-style messaging to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was selected by NeXT for its NeXTSTEP operating system. Due to Apple macOS’s direct lineage from NeXTSTEP, Objective-C was the standard programming language used, supported, and promoted by Apple for developing macOS and iOS applications until the introduction of the Swift programming language in 2014.

References

  1. 1 2 Zendra, Olivier; Colnet, Dominique; Collin, Suzanne (1997). Efficient Dynamic Dispatch without Virtual Function Tables: The SmallEiffel Compiler -- 12th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA'97), ACM SIGPLAN, Oct 1997, Atlanta, United States. pp.125-141. inria-00565627. Centre de Recherche en Informatique de Nancy Campus Scientifique, Bâtiment LORIA. p. 16.
  2. Ellis & Stroustrup 1990, pp. 227–232
  3. Danny Kalev. "C++ Reference Guide: The Object Model II". 2003. Heading "Inheritance and Polymorphism" and "Multiple Inheritance".
  4. "C++ ABI Closed Issues". Archived from the original on 25 July 2011. Retrieved 17 June 2011.{{cite web}}: CS1 maint: bot: original URL status unknown (link)
  5. Driesen, Karel and Hölzle, Urs, "The Direct Cost of Virtual Function Calls in C++", OOPSLA 1996
  6. Zendra, Olivier and Driesen, Karel, "Stress-testing Control Structures for Dynamic Dispatch in Java", pp. 105–118, Proceedings of the USENIX 2nd Java Virtual Machine Research and Technology Symposium, 2002 (JVM '02)