Single compilation unit

Single compilation unit (SCU) is a computer programming technique for the C and C++ languages, which reduces compilation time for programs spanning multiple files. Specifically, it allows the compiler to keep data from shared header files, definitions and templates, so that it need not recreate them for each file. It is an instance of program optimization. The technique can be applied to an entire program or to some subset of source files; when applied to an entire program, it is also known as a unity build. [1]

Purpose

In the C/C++ compilation model (formally "translation environment"), individual .c/.cpp source files are preprocessed into translation units, which are then compiled separately by the compiler into multiple object (.o or .obj) files. These object files can then be linked together to create a single executable file or library. However, this leads to multiple passes being performed on common header files, and with C++, multiple template instantiations of the same templates in different translation units.

The single compilation unit technique uses preprocessor directives to "glue" different translation units together at compile time rather than at link time. This reduces the overall build time by eliminating the duplicated work, but increases the incremental build time (the time required after changing any single source file included in the SCU), because the entire unit must be rebuilt whenever any one input file changes. [2] The technique is therefore appropriate for a set of infrequently modified source files with significant overlap (many or expensive common headers or templates), or for source files that usually need recompiling together, for example because they all include a common header or template that changes frequently. [3]
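The trade-off described above can be illustrated with a deliberately simple cost model. This is only a sketch under stated assumptions, not a measurement: the names `BuildCosts`, `fullSeparate`, `fullUnity`, `incrementalSeparate` and `incrementalUnity` are hypothetical, every source file is assumed to pay the same cost for the shared headers and for its own code, and link time and parallel compilation are ignored.

```cpp
#include <cstddef>

// Hypothetical cost model (an assumption for illustration, not from the
// cited sources): each file pays headerCost for the shared headers and
// bodyCost for its own code. Linking and parallelism are ignored.
struct BuildCosts {
    double headerCost; // time to process the shared headers once
    double bodyCost;   // time to compile one file's own code
};

// Full build, separate compilation: headers are reprocessed for every file.
double fullSeparate(const BuildCosts& c, std::size_t files) {
    return static_cast<double>(files) * (c.headerCost + c.bodyCost);
}

// Full build, single compilation unit: headers once, then every body.
double fullUnity(const BuildCosts& c, std::size_t files) {
    return c.headerCost + static_cast<double>(files) * c.bodyCost;
}

// Rebuild after editing one file, separate compilation: only that file.
double incrementalSeparate(const BuildCosts& c) {
    return c.headerCost + c.bodyCost;
}

// Rebuild after editing one file in the unit: the whole unit is recompiled.
double incrementalUnity(const BuildCosts& c, std::size_t files) {
    return fullUnity(c, files);
}
```

With a header cost of 2 and a body cost of 1 for ten files, this model gives a full build of 30 for separate compilation against 12 for the single unit, but an incremental rebuild of 3 against 12.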

Another disadvantage of SCU is that it is serial, compiling all included source files in sequence in one process, and thus cannot be parallelized, as can be done in separate compilation (via distcc or similar programs). Thus SCU requires explicit partitioning (manual partitioning or "sharding" into multiple units) to parallelize compilation.
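As a minimal sketch of such partitioning, the following helper functions distribute a list of source files over a fixed number of units and generate the text of each unity file. The function names `shardSources` and `unitySourceText` and the round-robin policy are assumptions made for illustration; real build systems may group files differently, for example by directory or by size.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Distribute source files round-robin over shardCount units, so that each
// unit can be handed to a separate compiler process.
std::vector<std::vector<std::string>> shardSources(
    const std::vector<std::string>& sources, std::size_t shardCount) {
    if (shardCount == 0) {
        shardCount = 1; // treat 0 as "one unit for everything"
    }
    std::vector<std::vector<std::string>> shards(shardCount);
    for (std::size_t i = 0; i < sources.size(); ++i) {
        shards[i % shardCount].push_back(sources[i]);
    }
    return shards;
}

// Produce the text of one unity source file: one #include line per member.
std::string unitySourceText(const std::vector<std::string>& shard) {
    std::string text;
    for (const std::string& file : shard) {
        text += "#include \"" + file + "\"\n";
    }
    return text;
}
```

Each generated text would be written to its own source file and compiled as an ordinary translation unit; the number of shards then bounds how many compiler processes can run in parallel.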

SCU also allows an optimizing compiler to perform interprocedural optimization without requiring link-time optimization, thereby enabling optimizations such as inlining, and helps avoid implicit code bloat due to exceptions, side effects, and register allocation. Many compilers cannot perform these optimizations under independent compilation: optimization happens separately within each translation unit, and the "dumb linker" simply links the object files without performing any optimization itself, so interprocedural optimization across translation units is not possible.

Example

For example, the source files foo.cpp and bar.cpp can be placed in a single compilation unit as follows:

#include "foo.cpp"
#include "bar.cpp"

Suppose foo.cpp and bar.cpp are:

// foo.cpp
#include <iostream> // A large, standard header
#include "bar.h"    // Declaration of function 'bar'

int main()          // Definition of function 'main'
{
    bar();
}

// bar.cpp
#include <iostream> // The same large, standard header

void bar()          // Definition of function 'bar'
{
    ...
}

Now the standard header file (iostream) is compiled only once, and function bar may be inlined into function main, despite being from another module.

References

  1. Developer, Unicorn (2017-12-25). "Speeding up the Build of C and C++ Projects". Medium. Retrieved 2022-03-16.
  2. Krajewski, Marek (2019-01-31). Hands-On High Performance Programming with Qt 5: Build cross-platform applications using concurrency, parallel programming, and memory management. Packt Publishing Ltd. ISBN 978-1-78953-330-9.
  3. Schach (1992-05-19). Practical Software Engineering. CRC Press. p. 183. ISBN 978-0-256-11454-6.