Precompiled header

Last updated

In computer programming, a precompiled header (PCH) is a (C or C++) header file that is compiled into an intermediate form that is faster to process for the compiler. Usage of precompiled headers may significantly reduce compilation time, especially when applied to large header files, header files that include many other header files, or header files that are included in many translation units.

Contents

Rationale

In the C and C++ programming languages, a header file is a file whose text may be automatically included in another source file by the C preprocessor by the use of a preprocessor directive in the source file.

Header files can sometimes contain very large amounts of source code (for instance, the header files windows.h and Cocoa/Cocoa.h on Microsoft Windows and OS X, respectively). This is especially true with the advent of large "header" libraries that make extensive use of templates, like the Eigen math library and Boost C++ libraries. They are written almost entirely as header files that the user #includes, rather than being linked at runtime. Thus, each time the user compiles their program, the user is essentially recompiling numerous header libraries as well. (These would be precompiled into shared objects or dynamic link libraries in non "header" libraries.)

To reduce compilation times, some compilers allow header files to be compiled into a form that is faster for the compiler to process. This intermediate form is known as a precompiled header, and is commonly held in a file named with the extension .pch or similar, such as .gch under the GNU Compiler Collection.

Usage

For example, given a C++ file source.cpp that includes header.hpp:

//header.hpp...
//source.cpp#include"header.hpp"...

When compiling source.cpp for the first time with the precompiled header feature turned on, the compiler will generate a precompiled header, header.pch. The next time, if the timestamp of this header did not change, the compiler can skip the compilation phase relating to header.hpp and instead use header.pch directly.

Common implementations

Microsoft Visual C and C++

Microsoft Visual C++ (version 6.0 and newer[ citation needed ]) can precompile any code, not just headers. [1] It can do this in two ways: either precompiling all code up to a file whose name matches the /Ycfilename option or (when /Yc is specified without any filename) precompiling all code up to the first occurrence of #pragma hdrstop in the code [2] [3] The precompiled output is saved in a file named after the filename given to the /Yc option, with a .pch extension, or in a file named according to the name supplied by the /Fpfilename option. [3] The /Yu option, subordinate to the /Yc option if used together, causes the compiler to make use of already precompiled code from such a file. [3]

pch.h (named stdafx.h before Visual Studio 2017 [4] ) is a file generated by the Microsoft Visual Studio IDE wizard, that describes both standard system and project specific include files that are used frequently but hardly ever change.

The afx in stdafx.h stands for application framework extensions. AFX was the original abbreviation for the Microsoft Foundation Classes (MFC). While the name stdafx.h was used by default in MSVC projects prior to version 2017, any alternative name may be manually specified.

Compatible compilers will precompile this file to reduce overall compile times. Visual C++ will not compile anything before the #include "pch.h" in the source file, unless the compile option /Yu'pch.h' is unchecked (by default); it assumes all code in the source up to and including that line is already compiled.

GCC

Precompiled headers are supported in GCC (3.4 and newer). GCC's approach is similar to these of VC and compatible compilers. GCC saves precompiled versions of header files using a ".gch" suffix. When compiling a source file, the compiler checks whether this file is present in the same directory and uses it if possible.

GCC can only use the precompiled version if the same compiler switches are set as when the header was compiled and it may use at most one. Further, only preprocessor instructions may be placed before the precompiled header (because it must be directly or indirectly included through another normal header, before any compilable code).

GCC automatically identifies most header files by their extension. However, if this fails (e.g. because of non-standard header extensions), the -x switch can be used to ensure that GCC treats the file as a header.

clang

The clang compiler added support for PCH in Clang 2.5 / LLVM 2.5 of 2009. [5] The compiler both tokenizes the input source code and performs syntactic and semantic analyses of headers, writing out the compiler's internal generated abstract syntax tree (AST) and symbol table to a precompiled header file. [6]

clang's precompiled header scheme, with some improvements such as the ability for one precompiled header to reference another, internally used, precompiled header, also forms the basis for its modules mechanism. [6] It uses the same bitcode file format that is employed by LLVM, encapsulated in clang-specific sections within Common Object File Format or Extensible Linking Format files. [6]

C++Builder

In the default project configuration, the C++Builder compiler implicitly generates precompiled headers for all headers included by a source module until the line #pragma hdrstop is found. [7] :76 Precompiled headers are shared for all modules of the project if possible. For example, when working with the Visual Component Library, it is common to include the vcl.h header first which contains most of the commonly used VCL header files. Thus, the precompiled header can be shared across all project modules, which dramatically reduces the build times.

In addition, C++Builder can be instrumented to use a specific header file as precompiled header, similar to the mechanism provided by Visual C++.

C++Builder 2009 introduces a "Precompiled Header Wizard" which parses all source modules of the project for included header files, classifies them (i.e. excludes header files if they are part of the project or do not have an Include guard) and generates and tests a precompiled header for the specified files automatically.

Pretokenized header

A pretokenized header (PTH) is a header file stored in a form that has been run through lexical analysis, but no semantic operations have been done on it. PTH is present in Clang before it supported PCH, and has also been tried in a branch of GCC. [8]

Compared to a full PCH mechanism, PTH has the advantages of language (and dialect) independence, as lexical analysis is similar for the C-family languages, and architecture independence, as the same stream of tokens can be used when compiling for different target architectures. [9] It however has the disadvantage of not going any further than simple lexical analysis, requiring that syntactic and semantic analysis of the token stream be performed with every compilation. In addition, the time to compile scaling linearly with the size, in lexical tokens, of the pretokenized file, which is not necessarily the case for a fully-fledged precompilation mechanism (PCH in clang allows random access). [9]

Clang's pretokenization mechanism includes several minor mechanisms for assisting the pre-processor: caching of file existence and datestamp information, and recording inclusion guards so that guarded code can be quickly skipped over. [9]

See also

Related Research Articles

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.

The C preprocessor is the macro preprocessor for several computer programming languages, such as C, Objective-C, C++, and a variety of Fortran languages. The preprocessor provides inclusion of header files, macro expansions, conditional compilation, and line control.

In the C and C++ programming languages, an inline function is one qualified with the keyword inline; this serves two purposes:

  1. It serves as a compiler directive that suggests that the compiler substitute the body of the function inline by performing inline expansion, i.e. by inserting the function code at the address of each function call, thereby saving the overhead of a function call. In this respect it is analogous to the register storage class specifier, which similarly provides an optimization hint.
  2. The second purpose of inline is to change linkage behavior; the details of this are complicated. This is necessary due to the C/C++ separate compilation + linkage model, specifically because the definition (body) of the function must be duplicated in all translation units where it is used, to allow inlining during compiling, which, if the function has external linkage, causes a collision during linking. C and C++ resolve this in different ways.
<span class="mw-page-title-main">LLVM</span> Compiler backend for multiple programming languages

LLVM is a set of compiler and toolchain technologies that can be used to develop a frontend for any programming language and a backend for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes. The name LLVM originally stood for Low Level Virtual Machine, though the project has expanded and the name is no longer officially an initialism.

An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" IR must be accurate – capable of representing the source code without loss of information – and independent of any particular source or target language. An IR may take one of several forms: an in-memory data structure, or a special tuple- or stack-based code readable by the program. In the latter case it is also called an intermediate language.

<span class="mw-page-title-main">Code::Blocks</span> Free and open source, cross-platform IDE

Code::Blocks is a free, open-source cross-platform IDE that supports multiple compilers including GCC, Clang and Visual C++. It is developed in C++ using wxWidgets as the GUI toolkit. Using a plugin architecture, its capabilities and features are defined by the provided plugins. Currently, Code::Blocks is oriented towards C, C++, and Fortran. It has a custom build system and optional Make support.

<span class="mw-page-title-main">Syntax (programming languages)</span> Set of rules defining correctly structured programs

In computer science, the syntax of a computer language is the rules that define the combinations of symbols that are considered to be correctly structured statements or expressions in that language. This applies both to programming languages, where the document represents source code, and to markup languages, where the document represents data.

In the C and C++ programming languages, #pragma once is a non-standard but widely supported preprocessor directive designed to cause the current source file to be included only once in a single compilation. Thus, #pragma once serves the same purpose as include guards, but with several advantages, including: less code, avoidance of name clashes, and sometimes improvement in compilation speed. On the other hand, #pragma once is not necessarily available in all compilers and its implementation is tricky and might not always be reliable.

Interprocedural optimization (IPO) is a collection of compiler techniques used in computer programming to improve performance in programs containing many frequently used functions of small or medium length. IPO differs from other compiler optimizations by analyzing the entire program as opposed to a single function or block of code.

<span class="mw-page-title-main">CMake</span> Cross-platform, compiler-independent build system generator

In software development, CMake is cross-platform free and open-source software for build automation, testing, packaging and installation of software by using a compiler-independent method. CMake is not a build system itself; it generates another system's build files. It supports directory hierarchies and applications that depend on multiple libraries. It is used in conjunction with native build environments such as Make, Qt Creator, Ninja, Android Studio, Apple's Xcode, and Microsoft Visual Studio. It has minimal dependencies, requiring only a C++ compiler on its own build system.

Program database (PDB) is a file format for storing debugging information about a program. PDB files commonly have a .pdb extension. A PDB file is typically created from source files during compilation. It stores a list of all symbols in a module with their addresses and possibly the name of the file and the line on which the symbol was declared. This symbol information is not stored in the module itself, because it takes up a lot of space.

Clang is a compiler front end for the C, C++, Objective-C, and Objective-C++ programming languages, as well as the OpenMP, OpenCL, RenderScript, CUDA, SYCL, and HIP frameworks. It acts as a drop-in replacement for the GNU Compiler Collection (GCC), supporting most of its compilation flags and unofficial language extensions. It includes a static analyzer, and several code analysis tools.

Blocks are a non-standard extension added by Apple Inc. to Clang's implementations of the C, C++, and Objective-C programming languages that uses a lambda expression-like syntax to create closures within these languages. Blocks are supported for programs developed for Mac OS X 10.6+ and iOS 4.0+, although third-party runtimes allow use on Mac OS X 10.5 and iOS 2.2+ and non-Apple systems.

In C and C++ programming language terminology, a translation unit is the ultimate input to a C or C++ compiler from which an object file is generated. A translation unit roughly consists of a source file after it has been processed by the C preprocessor, meaning that header files listed in #include directives are literally included, sections of code within #ifndef may be included, and macros have been expanded.

Many programming languages and other computer files have a directive, often called include, import, or copy, that causes the contents of the specified file to be inserted into the original file. These included files are called header files or copybooks. They are often used to define the physical layout of program data, pieces of procedural code, and/or forward declarations while promoting encapsulation and the reuse of code or data.

<span class="mw-page-title-main">Roslyn (compiler)</span>

.NET Compiler Platform, also known by its codename Roslyn, is a set of open-source compilers and code analysis APIs for C# and Visual Basic (VB.NET) languages from Microsoft.

C++ Accelerated Massive Parallelism is a native programming model that contains elements that span the C++ programming language and its runtime library. It provides an easy way to write programs that compile and execute on data-parallel hardware, such as graphics cards (GPUs).

Mingw-w64 is a free and open source software development environment to create (cross-compile) Microsoft Windows PE applications. It was forked in 2005–2010 from MinGW.

Objective-C is a high-level general-purpose, object-oriented programming language that adds Smalltalk-style messaging to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was selected by NeXT for its NeXTSTEP operating system. Due to Apple macOS’s direct lineage from NeXTSTEP, Objective-C was the standard programming language used, supported, and promoted by Apple for developing macOS and iOS applications until the introduction of the Swift programming language in 2014.

References

  1. "Creating Precompiled Header Files". MSDN . Microsoft. 2015. Archived from the original on 2018-03-28. Retrieved 2018-03-28.
  2. "Two Choices for Precompiling Code". MSDN. Microsoft. 2015. Retrieved 2018-03-28.
  3. 1 2 3 "/Yc (Create Precompiled Header File)". MSDN. Microsoft. 2015. Retrieved 2018-03-28.
  4. "Can I use #include "pch.h" instead of #include "stdafx.h" as my precompile header in Visual Studio C++?". Stack Overflow.
  5. "LLVM 2.5 Release Notes". releases.llvm.org.
  6. 1 2 3 The Clang Team (2018). "Precompiled Header and Modules Internals". Clang 7 documentation. Retrieved 2018-03-28.
  7. Swart, Bob (2003). Borland C++ Builder 6 Developer's Guide. Sams Publishing. ISBN   9780672324802.
  8. "pph - GCC Wiki". gcc.gnu.org.
  9. 1 2 3 The Clang Team (2018). "Pretokenized Headers (PTH)". Clang 7 documentation. Archived from the original on 2018-03-22. Retrieved 2018-03-28.