Expression templates

Last updated

Expression templates are a C++ template metaprogramming technique that builds structures representing a computation at compile time, where expressions are evaluated only as needed to produce efficient code for the entire computation. [1] Expression templates thus allow programmers to bypass the normal order of evaluation of the C++ language and achieve optimizations such as loop fusion.

Contents

Expression templates were invented independently by Todd Veldhuizen and David Vandevoorde; [2] [3] it was Veldhuizen who gave them their name. [3] They are a popular technique for the implementation of linear algebra software. [1]

Motivation and example

Consider a library representing vectors and operations on them. One common mathematical operation is to add two vectors u and v, element-wise, to produce a new vector. The obvious C++ implementation of this operation would be an overloaded operator+ that returns a new vector object:

/// @brief class representing a mathematical 3D vectorclassVec:publicstd::array<double,3>{public:usingstd::array<double,3>::array;// inherit constructor (C++11)// see https://en.cppreference.com/w/cpp/language/using_declaration};/// @brief sum 'u' and 'v' into a new instance of VecVecoperator+(Vecconst&u,Vecconst&v){Vecsum;for(size_ti=0;i<u.size();i++){sum[i]=u[i]+v[i];}returnsum;}

Users of this class can now write Vec x = a + b; where a and b are both instances of Vec.

A problem with this approach is that more complicated expressions such as Vec x = a + b + c are implemented inefficiently. The implementation first produces a temporary Vec to hold a + b, then produces another Vec with the elements of c added in. Even with return value optimization this will allocate memory at least twice and require two loops.

Delayed evaluation solves this problem, and can be implemented in C++ by letting operator+ return an object of an auxiliary type, say VecSum, that represents the unevaluated sum of two Vecs, or a vector with a VecSum, etc. Larger expressions then effectively build expression trees that are evaluated only when assigned to an actual Vec variable. But this requires traversing such trees to do the evaluation, which is in itself costly. [4]

Expression templates implement delayed evaluation using expression trees that only exist at compile time. Each assignment to a Vec, such as Vec x = a + b + c, generates a new Vec constructor if needed by template instantiation. This constructor operates on three Vec; it allocates the necessary memory and then performs the computation. Thus only one memory allocation is performed.

Example implementation of expression templates :

An example implementation of expression templates looks like the following. A base class VecExpression represents any vector-valued expression. It is templated on the actual expression type E to be implemented, per the curiously recurring template pattern. The existence of a base class like VecExpression is not strictly necessary for expression templates to work. It will merely serve as a function argument type to distinguish the expressions from other types (note the definition of a Vec constructor and operator+ below).

template<typenameE>classVecExpression{public:staticconstexprboolis_leaf=false;doubleoperator[](size_ti)const{// Delegation to the actual expression type. This avoids dynamic polymorphism (a.k.a. virtual functions in C++)returnstatic_cast<Econst&>(*this)[i];}size_tsize()const{returnstatic_cast<Econst&>(*this).size();}};

The Boolean is_leaf is there to tag VecExpressions that are "leafs", i.e. that actually contain data. The Vec class is a leaf that stores the coordinates of a fully evaluated vector expression, and becomes a subclass of VecExpression.

classVec:publicVecExpression<Vec>{std::array<double,3>elems;public:staticconstexprboolis_leaf=true;decltype(auto)operator[](size_ti)const{returnelems[i];}decltype(auto)&operator[](size_ti){returnelems[i];}size_tsize()const{returnelems.size();}// construct Vec using initializer list Vec(std::initializer_list<double>init){std::copy(init.begin(),init.end(),elems.begin());}// A Vec can be constructed from any VecExpression, forcing its evaluation.template<typenameE>Vec(VecExpression<E>const&expr){for(size_ti=0;i!=expr.size();++i){elems[i]=expr[i];}}};

The sum of two Vecs is represented by a new type, VecSum, that is templated on the types of the left- and right-hand sides of the sum so that it can be applied to arbitrary pairs of Vec expressions. An overloaded operator+ serves as syntactic sugar for the VecSum constructor. A subtlety intervenes in this case: in order to reference the original data when summing two VecExpressions, VecSum needs to store a const reference to each VecExpression if it is a leaf, otherwise it is a temporary object that needs to be copied to be properly saved.

template<typenameE1,typenameE2>classVecSum:publicVecExpression<VecSum<E1,E2>>{// cref if leaf, copy otherwisetypenamestd::conditional<E1::is_leaf,constE1&,constE1>::type_u;typenamestd::conditional<E2::is_leaf,constE2&,constE2>::type_v;public:staticconstexprboolis_leaf=false;VecSum(E1const&u,E2const&v):_u(u),_v(v){assert(u.size()==v.size());}decltype(auto)operator[](size_ti)const{return_u[i]+_v[i];}size_tsize()const{return_v.size();}};template<typenameE1,typenameE2>VecSum<E1,E2>operator+(VecExpression<E1>const&u,VecExpression<E2>const&v){returnVecSum<E1,E2>(*static_cast<constE1*>(&u),*static_cast<constE2*>(&v));}

With the above definitions, the expression a + b + c is of type

VecSum<VecSum<Vec,Vec>,Vec>

so Vec x = a + b + c invokes the templated Vec constructor Vec(VecExpression<E> const& expr) with its template argument E being this type (meaning VecSum<VecSum<Vec, Vec>, Vec>). Inside this constructor, the loop body

elems[i]=expr[i];

is effectively expanded (following the recursive definitions of operator+ and operator[]on this type) to

elems[i]=a.elems[i]+b.elems[i]+c.elems[i];

with no temporary Vec objects needed and only one pass through each memory block.

Basic Usage :

intmain(){Vecv0={23.4,12.5,144.56};Vecv1={67.12,34.8,90.34};Vecv2={34.90,111.9,45.12};// Following assignment will call the ctor of Vec which accept type of // `VecExpression<E> const&`. Then expand the loop body to // a.elems[i] + b.elems[i] + c.elems[i]Vecsum_of_vec_type=v0+v1+v2;for(size_ti=0;i<sum_of_vec_type.size();++i)std::cout<<sum_of_vec_type[i]<<std::endl;// To avoid creating any extra storage, other than v0, v1, v2// one can do the following (Tested with C++11 on GCC 5.3.0)autosum=v0+v1+v2;for(size_ti=0;i<sum.size();++i)std::cout<<sum[i]<<std::endl;// Observe that in this case typeid(sum) will be VecSum<VecSum<Vec, Vec>, Vec>// and this chaining of operations can go on.}

Applications

Expression templates have been found especially useful by the authors of libraries for linear algebra, that is, for dealing with vectors and matrices of numbers. Among libraries employing expression template are Dlib, Armadillo, Blaze, [5] Blitz++, [6] Boost uBLAS, [7] Eigen, [8] POOMA, [9] Stan Math Library, [10] and xtensor. [11] Expression templates can also accelerate C++ automatic differentiation implementations, [12] as demonstrated in the Adept library.

Outside of vector math, the Spirit parser framework uses expression templates to represent formal grammars and compile these into parsers.

See also

Related Research Articles

Templates are a feature of the C++ programming language that allows functions and classes to operate with generic types. This allows a function or class declaration to reference via a generic variable another different class without creating full declaration for each of these different classes.

Generic programming is a style of computer programming in which algorithms are written in terms of data types to-be-specified-later that are then instantiated when needed for specific types provided as parameters. This approach, pioneered in the programming language ML in 1973, permits writing common functions or data types that differ only in the set of types on which they operate when used, thus reducing duplicate code.

In object-oriented programming, the iterator pattern is a design pattern in which an iterator is used to traverse a container and access the container's elements. The iterator pattern decouples algorithms from containers; in some cases, algorithms are necessarily container-specific and thus cannot be decoupled.

Template metaprogramming (TMP) is a metaprogramming technique in which templates are used by a compiler to generate temporary source code, which is merged by the compiler with the rest of the source code and then compiled. The output of these templates can include compile-time constants, data structures, and complete functions. The use of templates can be thought of as compile-time polymorphism. The technique is used by a number of languages, the best-known being C++, but also Curl, D, Nim, and XL.

In mathematics and computer science, a higher-order function (HOF) is a function that does at least one of the following:

In computer programming, rank with no further specifications is usually a synonym for "number of dimensions"; thus, a two-dimensional array has rank two, a three-dimensional array has rank three and so on. Strictly, no formal definition can be provided which applies to every programming language, since each of them has its own concepts, semantics and terminology; the term may not even be applicable or, to the contrary, applied with a very specific meaning in the context of a given language.

In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union, disjoint union, sum type, or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time, and a tag field explicitly indicates which type is in use. It can be thought of as a type that has several "cases", each of which should be handled correctly when that type is manipulated. This is critical in defining recursive datatypes, in which some component of a value may have the same type as that value, for example in defining a type for representing trees, where it is necessary to distinguish multi-node subtrees and leaves. Like ordinary unions, tagged unions can save storage by overlapping storage areas for each type, since only one is in use at a time.

In computer programming, the ternary conditional operator is a ternary operator that is part of the syntax for basic conditional expressions in several programming languages. It is commonly referred to as the conditional operator, conditional expression, ternary if, or inline if. An expression if a then b else c or a ? b : c evaluates to b if the value of a is true, and otherwise to c. One can read it aloud as "if a then b otherwise c". The form a ? b : c is the most common, but alternative syntax do exist; for example, Raku uses the syntax a ?? b !! c to avoid confusion with the infix operators ? and !, whereas in Visual Basic .NET, it instead takes the form If(a, b, c).

typedef is a reserved keyword in the programming languages C, C++, and Objective-C. It is used to create an additional name (alias) for another data type, but does not create a new type, except in the obscure case of a qualified typedef of an array type where the typedef qualifiers are transferred to the array element type. As such, it is often used to simplify the syntax of declaring complex data structures consisting of struct and union types, although it is also commonly used to provide specific descriptive type names for integer data types of varying sizes.

A class in C++ is a user-defined type or data structure declared with any of the keywords class, struct or union that has data and functions as its members whose access is governed by the three access specifiers private, protected or public. By default access to members of a C++ class declared with the keyword class is private. The private members are not accessible outside the class; they can be accessed only through member functions of the class. The public members form an interface to the class and are accessible outside the class.

sizeof is a unary operator in the programming languages C and C++. It generates the storage size of an expression or a data type, measured in the number of char-sized units. Consequently, the construct sizeof (char) is guaranteed to be 1. The actual number of bits of type char is specified by the preprocessor macro CHAR_BIT, defined in the standard include file limits.h. On most modern computing platforms this is eight bits. The result of sizeof has an unsigned integer type that is usually denoted by size_t.

The curiously recurring template pattern (CRTP) is an idiom, originally in C++, in which a class X derives from a class template instantiation using X itself as a template argument. More generally it is known as F-bound polymorphism, and it is a form of F-bounded quantification.

C++11 is a version of a joint technical standard, ISO/IEC 14882, by the International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC), for the C++ programming language. C++11 replaced the prior version of the C++ standard, named C++03, and was later replaced by C++14. The name follows the tradition of naming language versions by the publication year of the specification, though it was formerly named C++0x because it was expected to be published before 2010.

In computing, compile-time function execution is the ability of a compiler, that would normally compile a function to machine code and execute it at run time, to execute the function at compile time. This is possible if the arguments to the function are known at compile time, and the function does not make any reference to or attempt to modify any global state.

In the C++ programming language, placement syntax allows programmers to explicitly specify the memory management of individual objects — i.e. their "placement" in memory. Normally, when an object is created dynamically, an allocation function is invoked in such a way that it will both allocate memory for the object, and initialize the object within the newly allocated memory. The placement syntax allows the programmer to supply additional arguments to the allocation function. A common use is to supply a pointer to a suitable region of storage where the object can be initialized, thus separating memory allocation from object construction.

In computer programming, variadic templates are templates that take a variable number of arguments.

In C++ computer programming, allocators are a component of the C++ Standard Library. The standard library provides several data structures, such as list and set, commonly referred to as containers. A common trait among these containers is their ability to change size during the execution of the program. To achieve this, some form of dynamic memory allocation is usually required. Allocators handle all the requests for allocation and deallocation of memory for a given container. The C++ Standard Library provides general-purpose allocators that are used by default, however, custom allocators may also be supplied by the programmer.

Concepts are an extension to the templates feature provided by the C++ programming language. Concepts are named Boolean predicates on template parameters, evaluated at compile time. A concept may be associated with a template, in which case it serves as a constraint: it limits the set of arguments that are accepted as template parameters.

C++14 is a version of the ISO/IEC 14882 standard for the C++ programming language. It is intended to be a small extension over C++11, featuring mainly bug fixes and small improvements, and was replaced by C++17. Its approval was announced on August 18, 2014. C++14 was published as ISO/IEC 14882:2014 in December 2014.

The Möller–Trumbore ray-triangle intersection algorithm, named after its inventors Tomas Möller and Ben Trumbore, is a fast method for calculating the intersection of a ray and a triangle in three dimensions without needing precomputation of the plane equation of the plane containing the triangle. Among other uses, it can be used in computer graphics to implement ray tracing computations involving triangle meshes.

References

  1. 1 2 Matsuzaki, Kiminori; Emoto, Kento (2009). Implementing fusion-equipped parallel skeletons by expression templates. Proc. Int'l Symp. on Implementation and Application of Functional Languages. pp. 72–89.
  2. Vandevoorde, David; Josuttis, Nicolai (2002). C++ Templates: The Complete Guide. Addison Wesley. ISBN   0-201-73484-2.
  3. 1 2 Veldhuizen, Todd (1995). "Expression Templates". C++ Report. 7 (5): 26–31. Archived from the original on 10 February 2005.
  4. Abrahams, David; Gurtovoy, Aleksey (2004). C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond. Pearson Education. ISBN   9780321623911.
  5. Bitbucket
  6. "Blitz++ User's Guide" (PDF). Retrieved December 12, 2015.
  7. "Boost Basic Linear Algebra Library" . Retrieved October 25, 2015.
  8. Guennebaud, Gaël (2013). Eigen: A C++ linear algebra library (PDF). Eurographics/CGLibs.
  9. Veldhuizen, Todd (2000). Just when you thought your little language was safe: "Expression Templates" in Java. Int'l Symp. Generative and Component-Based Software Engineering. CiteSeerX   10.1.1.22.6984 .
  10. "Stan documentation" . Retrieved April 27, 2016.
  11. "Multi-dimensional arrays with broadcasting and lazy computing" . Retrieved September 18, 2017.
  12. Hogan, Robin J. (2014). "Fast reverse-mode automatic differentiation using expression templates in C++" (PDF). ACM Trans. Math. Softw. 40 (4): 26:1–26:16. doi:10.1145/2560359. S2CID   9047237.