Unity build

Last updated

In software engineering, a unity build (also known as unified build or jumbo build) is a method used in C and C++ software development to speed up the compilation of projects by combining multiple translation units into a single one, usually achieved by using include directives to bundle multiple source files into one larger file.

Contents

Implementation

If two different translation units file_a.cc

#include"header.h"// content of source file A ...

and file_b.cc

#include"header.h"// content of source file B ...

in the same project both include the header header.h, that header will be processed twice by the compiler chain, once for each build task. If the two translation units are merged into a single source file jumbo_file.cc

#include"file_a.cc"#include"file_b.cc"

then header.h will be processed only once (thanks to include guards) when compiling jumbo_file.cc. [1]

Effects

The main benefit of unity builds is a reduction of duplicated effort in parsing and compiling the content of headers that are included in more than one source file. The content of headers usually accounts for the majority of code in a source file after preprocessing. Unity builds also mitigate the overhead due to having a large number of small source files by reducing the number of object files created and processed by the compilation chain, and allows interprocedural analysis and optimisation across the files that form the unity build task (similar to the effects of link-time optimisation). They make it also easier to detect violations of the One Definition Rule, because if a symbol is defined twice in different source files in the same unity build, the compiler will be able to identify the redefinition and emit a warning or error.

One of the drawbacks of unity builds is a larger memory footprint due to larger translation units. Larger translation units can also negatively affect parallel builds, since a small number of large compile jobs is generally harder or impossible to schedule to saturate all available parallel computing resources effectively. Unity builds can also deny part of the benefits of incremental builds, that rely on rebuilding as little code as possible, i.e. only the translation units affected by changes since the last build.

Unity builds have also potentially dangerous effects on the semantics of programs. Some valid C++ constructs that rely on internal linkage may fail under a unity build, for instance clashes of static symbols and symbols defined in anonymous namespaces with the same identifier in different files. If different C++ files define different functions with the same name, the compiler may unexpectedly resolve the overloading by selecting the wrong function, in a way that was not possible when designing the software with the files as different translation units. Another adverse effect is the possible leakage of macro definitions across different source files. [2]

Build system support

Some build systems provide built-in support for automated unity builds, including Visual Studio, [3] Meson [4] and CMake. [5]

Related Research Articles

GNU Bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification in Bison syntax, warns about any parsing ambiguities, and generates a parser that reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar.

GNU Autoconf is a tool for producing configure scripts for building, installing, and packaging software on computer systems where a Bourne shell is available.

<span class="mw-page-title-main">GNU Autotools</span> Suite of programming tools

The GNU Autotools, also known as the GNU Build System, is a suite of programming tools designed to assist in making source code packages portable to many Unix-like systems.

In software development, Make is a build automation tool that builds executable programs and libraries from source code by reading files called makefiles which specify how to derive the target program. Though integrated development environments and language-specific compiler features can also be used to manage a build process, Make remains widely used, especially in Unix and Unix-like operating systems.

In the C and C++ programming languages, an inline function is one qualified with the keyword inline; this serves two purposes:

  1. It serves as a compiler directive that suggests that the compiler substitute the body of the function inline by performing inline expansion, i.e. by inserting the function code at the address of each function call, thereby saving the overhead of a function call. In this respect it is analogous to the register storage class specifier, which similarly provides an optimization hint.
  2. The second purpose of inline is to change linkage behavior; the details of this are complicated. This is necessary due to the C/C++ separate compilation + linkage model, specifically because the definition (body) of the function must be duplicated in all translation units where it is used, to allow inlining during compiling, which, if the function has external linkage, causes a collision during linking. C and C++ resolve this in different ways.
<span class="mw-page-title-main">SCons</span>

SCons is a computer software build tool that automatically analyzes source code file dependencies and operating system adaptation requirements from a software project description and generates final binary executables for installation on the target operating system platform. Its function is analogous to the traditional GNU build system based on the make utility and the autoconf tools.

pkg-config is a computer program that defines and supports a unified interface for querying installed libraries for the purpose of compiling software that depends on them. It allows programmers and installation scripts to work without explicit knowledge of detailed library path information. pkg-config was originally designed for Linux, but it is now also available for BSD, Microsoft Windows, macOS, and Solaris.

In the C and C++ programming languages, an #include guard, sometimes called a macro guard, header guard or file guard, is a particular construct used to avoid the problem of double inclusion when dealing with the include directive.

In computer programming, a precompiled header (PCH) is a header file that is compiled into an intermediate form that is faster to process for the compiler. Usage of precompiled headers may significantly reduce compilation time, especially when applied to large header files, header files that include many other header files, or header files that are included in many translation units.

In the C and C++ programming languages, #pragma once is a non-standard but widely supported preprocessor directive designed to cause the current header file to be included only once in a single compilation. Thus, #pragma once serves the same purpose as include guards, but with several advantages, including less code, avoidance of name clashes, and sometimes improvement in compilation speed. On the other hand, #pragma once is not necessarily available in all compilers and its implementation is tricky and might not always be reliable.

<span class="mw-page-title-main">CMake</span> Cross-platform, compiler-independent build system generator

In software development, CMake is cross-platform free and open-source software for build automation, testing, packaging and installation of software by using a compiler-independent method. CMake is not a build system itself; it generates another system's build files. It supports directory hierarchies and applications that depend on multiple libraries. It can invoke native build environments such as Make, Qt Creator, Ninja, Android Studio, Apple's Xcode, and Microsoft Visual Studio. It has minimal dependencies, requiring only a C++ compiler on its own build system.

A weak symbol denotes a specially annotated symbol during linking of Executable and Linkable Format (ELF) object files. By default, without any annotation, a symbol in an object file is strong. During linking, a strong symbol can override a weak symbol of the same name. In contrast, in the presence of two strong symbols by the same name, the linker resolves the symbol in favor of the first one found. This behavior allows an executable to override standard library functions, such as malloc(3). When linking a binary executable, a weakly declared symbol does not need a definition. In comparison, a declared strong symbol without a definition triggers an undefined symbol link error.

Single compilation unit (SCU) is a computer programming technique for the C and C++ languages, which reduces compilation time for programs spanning multiple files. Specifically, it allows the compiler to keep data from shared header files, definitions and templates, so that it need not recreate them for each file. It is an instance of program optimization. The technique can be applied to an entire program or to some subset of source files; when applied to an entire program, it is also known as a unity build.

Premake is an open-source software development utility for automatically building configuration from source code.

In C and C++ programming language terminology, a translation unit is the ultimate input to a C or C++ compiler from which an object file is generated. A translation unit roughly consists of a source file after it has been processed by the C preprocessor, meaning that header files listed in #include directives are literally included, sections of code within #ifndef may be included, and macros have been expanded.

Many programming languages and other computer files have a directive, often called include, import, or copy, that causes the contents of the specified file to be inserted into the original file. These included files are called header files or copybooks. They are often used to define the physical layout of program data, pieces of procedural code, and/or forward declarations while promoting encapsulation and the reuse of code or data.

<span class="mw-page-title-main">Meson (software)</span> Build automation system

Meson is a software tool for automating the building (compiling) of software. The overall goal for Meson is to promote programmer productivity. Meson is free and open-source software written in Python, under the Apache License 2.0.

<span class="mw-page-title-main">Bazel (software)</span> Software tool that automates software builds and tests

Bazel is a free and open-source software tool used for the automation of building and testing software. Google uses the build tool Blaze internally and released an open-sourced port of the Blaze tool as Bazel, named as an anagram of Blaze. Bazel was first released in March 2015 and was in beta status by September 2015. Version 1.0 was released in October 2019.

<span class="mw-page-title-main">Ninja (build system)</span> Free build automation software

Ninja is a small build system developed by Evan Martin, a Google employee. Ninja has a focus on speed and it differs from other build systems in two major respects: it is designed to have its input files generated by a higher-level build system, and it is designed to run builds as fast as possible.

References

  1. Kubota et al. (2019)
  2. Kirilov, Viktor (7 July 2018). "A guide to unity builds". Archived from the original on 2020-11-12.
  3. Olga Arkhipova (2 July 2018). "Support for Unity (Jumbo) Files in Visual Studio 2017 15.8 (Experimental)". Microsoft.
  4. "Unity builds".
  5. "UNITY_BUILD - CMake 3.17.0 Documentation".