The C preprocessor (CPP) is a text file processor that is used with C, C++ and other programming tools. The preprocessor provides for file inclusion (often header files), macro expansion, conditional compilation, and line control. Although named in association with C and used with C, the preprocessor capabilities are not inherently tied to the C language. It can be, and is, used to process other kinds of files. [1]
C, C++, and Objective-C compilers provide a preprocessor capability, as it is required by the definition of each language. Some compilers provide extensions and deviations from the target language standard. Some provide options to control standards compliance. For instance, the GNU C preprocessor can be made more standards compliant by supplying certain command-line flags. [2]
Features of the preprocessor are encoded in source code as directives that start with #.
Although C++ source files are often named with a .cpp extension, that is an abbreviation for "C plus plus", not "C preprocessor".
The preprocessor was introduced to C around 1973 at the urging of Alan Snyder and also in recognition of the usefulness of the file inclusion mechanisms available in BCPL and PL/I. The first version offered file inclusion via #include and parameterless string replacement macros via #define. It was extended shortly after, first by Mike Lesk and then by John Reiser, to add arguments to macros and to support conditional compilation. [3]
The C preprocessor was part of a long macro-language tradition at Bell Labs, which was started by Douglas Eastwood and Douglas McIlroy in 1959. [4]
Preprocessing is defined by the first four (of eight) phases of translation specified in the C Standard. In outline, these are trigraph replacement (dropped in C23), line splicing, tokenization into preprocessing tokens and whitespace, and finally macro expansion together with execution of directives and _Pragma operators.
To include the content of one file into another, the preprocessor replaces a line that starts with #include with the content of the file specified after the directive. The inclusion may be logical in the sense that the resulting content need not be stored on disk and is never written back to the source file.
In the following example code, the preprocessor replaces the line #include <stdio.h> with the content of the standard library header file named 'stdio.h' in which the function printf() and other symbols are declared.
#include <stdio.h>

int main(void) {
    printf("Hello, World!\n");
    return 0;
}
In this case, the file name is enclosed in angle brackets to denote that it is a system file. For a file in the codebase being built, double-quotes are used instead. The preprocessor may use a different search algorithm to find the file based on this distinction.
For C, a header file is usually named with a .h extension. In C++, the convention for the file extension varies, with .h and .hpp being common. But the preprocessor includes a file regardless of the extension. In fact, code sometimes includes .c or .cpp files.
To prevent the same file from being included multiple times, which often leads to a compiler error, a header file typically contains an include guard or, if supported by the preprocessor, #pragma once.
Conditional compilation is supported via the if-else core directives #if, #else, #elif, and #endif and with contraction directives #ifdef and #ifndef, which stand for #if defined(...) and #if !defined(...), respectively. In the following example code, the printf() call is only included for compilation if VERBOSE is defined.
#ifdef VERBOSE
    printf("trace message");
#endif
The following demonstrates more complex logic:
#if !(defined __LP64__ || defined __LLP64__) || defined _WIN32 && !defined _WIN64
    // code for a 32-bit system
#else
    // code for a 64-bit system
#endif
A macro specifies how to replace text in the source code with other text. An object-like macro defines a token that the preprocessor replaces with other text. It does not include parameter syntax and therefore cannot support parameterization. The following macro definition associates the text "1 / 12" with the token "VALUE":
#define VALUE 1 / 12
A function-like macro supports parameters, although the parameter list can be empty. The following macro definition associates the expression "(A + B)" with the token "ADD" that has parameters "A" and "B".
#define ADD(A, B) (A + B)
A function-like macro declaration cannot have whitespace between the token and the first, opening parenthesis. If whitespace is present, the macro is interpreted as object-like with everything starting at the first parenthesis included in the replacement text.
The preprocessor replaces each token of the code that matches a macro token with the associated replacement text in what is known as macro expansion. Note that the text of string literals and comments is not parsed as tokens and is therefore ignored for macro expansion. For a function-like macro, the macro parameters are also replaced with the values specified in the macro reference. For example, ADD(VALUE, 2) expands to (1 / 12 + 2); the parentheses come from the replacement text of ADD.
A variadic macro (introduced with C99) accepts a varying number of arguments, which is particularly useful when wrapping functions that accept a variable number of parameters, such as printf.
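Function-like macro expansion occurs in the following stages:
1. The arguments of the invocation are identified. Each parameter that is not an operand of # or ## is replaced by its argument after that argument has been fully macro-expanded.
2. Each parameter that is an operand of # is replaced by a string literal spelling the unexpanded argument, and each parameter that is an operand of ## is replaced by the unexpanded argument.
3. Each ## concatenates its two neighboring tokens into one.
4. The result is rescanned, together with the tokens that follow it, for further macros to expand.
This may produce surprising results: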
#define HE HI
#define LLO _THERE
#define HELLO "HI THERE"
#define CAT(a,b) a##b
#define XCAT(a,b) CAT(a,b)
#define CALL(fn) fn(HE,LLO)
CAT(HE,LLO)  // "HI THERE", because concatenation occurs before normal expansion
XCAT(HE,LLO) // HI_THERE, because the tokens originating from parameters ("HE" and "LLO") are expanded first
CALL(CAT)    // "HI THERE", because this evaluates to CAT(HE,LLO)
A macro definition can be removed from the preprocessor context via #undef such that subsequent references to the macro token will not expand. For example:
#undef VALUE
The preprocessor provides some macro definitions automatically. The C standard specifies that __FILE__ expands to the name of the file being processed and __LINE__ expands to the number of the line on which it appears. The following macro, DEBUGPRINT, formats and prints a message with the file name and line number.
#define DEBUGPRINT(_fmt, ...) printf("[%s:%d]: " _fmt, __FILE__, __LINE__, __VA_ARGS__)
For the example code below that is on line 30 of file "util.c" and for count 123, the output is: "[util.c:30]: count=123".
DEBUGPRINT("count=%d\n",count);
The first C Standard specified that __STDC__ expand to "1" if the implementation conforms to the ISO standard and "0" otherwise and that __STDC_VERSION__ expand to a numeric literal specifying the version of the standard supported by the implementation. Standard C++ compilers support the __cplusplus macro. Compilers running in non-standard mode must not set these macros or must define others to signal the differences.
Other standard macros include __DATE__, the current date, and __TIME__, the current time.
The second edition of the C Standard, C99, added support for __func__, which holds the name of the function in which it appears. Because the preprocessor is agnostic to the grammar of C, __func__ is not a macro; the compiler itself provides it as a variable local to the function.
One little-known usage pattern of the C preprocessor is known as X-Macros. [5] [6] [7] An X-Macro is a header file, commonly named with the extension .def instead of the traditional .h. This file contains a list of similar macro calls, which can be referred to as "component macros." The include file is then referenced repeatedly, with the component macro defined differently before each reference.
Many compilers define additional, non-standard macros. A common reference for these macros is the Pre-defined C/C++ Compiler Macros project, which lists "various pre-defined compiler macros that can be used to identify standards, compilers, operating systems, hardware architectures, and even basic run-time libraries at compile-time."
Most compilers targeting Microsoft Windows implicitly define _WIN32. [8] This allows code, including preprocessor commands, to compile only when targeting Windows systems. A few compilers define WIN32 instead. For such compilers that do not implicitly define the _WIN32 macro, it can be specified on the compiler's command line, using -D_WIN32.
#ifdef __unix__ /* __unix__ is usually defined by compilers targeting Unix systems */
#include <unistd.h>
#elif defined _WIN32 /* _WIN32 is usually defined by compilers targeting 32 or 64 bit Windows systems */
#include <windows.h>
#endif
The example code tests if a macro __unix__ is defined. If it is, the file <unistd.h> is then included. Otherwise, it tests if a macro _WIN32 is defined instead. If it is, the file <windows.h> is then included.
The values of the predefined macros __FILE__ and __LINE__ can be set for a subsequent line via the #line directive. In the code below, __LINE__ expands to 314 and __FILE__ to "pi.c".
#line 314 "pi.c"
printf("line=%d file=%s\n", __LINE__, __FILE__);
The stringification operator (a.k.a. stringizing operator), denoted by #, converts a token into a string literal, escaping any quotes or backslashes as needed. For definition:
#define str(s) #s
str(\n) expands to "\n" and str(p = "foo\n";) expands to "p = \"foo\\n\";".
If stringification of the expansion of a macro argument is desired, two levels of macros must be used. For definition:
#define xstr(s) str(s)
#define str(s) #s
#define foo 4
str(foo) expands to "foo" and xstr(foo) expands to "4".
A macro argument cannot be combined with additional text and then stringified. However, a series of adjacent string literals and stringified arguments, also string literals, are concatenated by the C compiler.
The token pasting operator, denoted by ##, concatenates two tokens into one. For definition:
#define DECLARE_STRUCT_TYPE(name) typedef struct name##_s name##_t
DECLARE_STRUCT_TYPE(g_object) expands to typedef struct g_object_s g_object_t.
Processing can be aborted via the #error directive. For example:
#if RUBY_VERSION == 190
#error Ruby version 1.9.0 is not supported
#endif
C23 introduces the #embed directive for binary resource inclusion, which allows the content of a binary file to be included in a source file even though it is not valid C code. [9] This allows binary resources (like images) to be included in a program without requiring processing by external tools like xxd -i and without the use of string literals, which have a length limit on MSVC. Similarly to xxd -i, the directive is replaced by a comma-separated list of integers corresponding to the data of the specified resource. More precisely, if an array of type unsigned char is initialized using an #embed directive, the result is the same as if the resource had been written to the array using fread (unless a parameter changes the embed element width to something other than CHAR_BIT). Apart from the convenience, #embed is also easier for compilers to handle, since the as-if rule allows them to skip expanding the directive to its full form.
The file to embed is specified the same way as for #include: either with angle brackets or double quotes. The directive also allows certain parameters to be passed to it to customize its behavior. The C standard defines some parameters, and implementations may define additional ones. The limit parameter is used to limit the width of the included data. It is mostly intended to be used with "infinite" files like urandom. The prefix and suffix parameters allow for specifying a prefix and suffix to the embedded data. Finally, the if_empty parameter replaces the entire directive if the resource is empty. All standard parameters can be surrounded by double underscores, just like standard attributes in C23; for example, __prefix__ is interchangeable with prefix. Implementation-defined parameters use a form similar to attribute syntax (e.g., vendor::attr) but without the square brackets. While all standard parameters require an argument to be passed to them (e.g., limit requires a width), an argument is optional in general, and even the set of parentheses can be omitted if no argument is required, which might be the case for some implementation-defined parameters.
The #pragma directive is defined by the language standards, but with little or no requirements for syntax after its name, so compilers are free to define subsequent syntax and associated behavior. For instance, a pragma is often used to suppress error messages or to manage heap and stack debugging.
C99 introduced a few standard pragmas, taking the form #pragma STDC ..., which are used to control the floating-point implementation. The alternative, macro-like form _Pragma(...) was also added.
Many implementations do not support trigraphs or do not replace them by default.
Many implementations (such as the C compilers by GNU, Intel, Microsoft and IBM) provide a non-standard directive to print a message without aborting, typically to warn about the use of deprecated functionality. For example:
// GNU, Intel and IBM
#warning "Do not use ABC, which is deprecated. Use XYZ instead."
// Microsoft
#pragma message("Do not use ABC, which is deprecated. Use XYZ instead.")
C23 [10] and C++23 [11] standardize #warning.
Some Unix preprocessors provided an assertion feature – which has little similarity to standard library assertions. [12]
GCC provides #include_next for chaining headers of the same name. [13]
Unlike C and C++, Objective-C includes an #import directive that is like #include but results in a file being included only once, eliminating the need for include guards and #pragma once.
Traditionally, the C preprocessor was a separate development tool from the compiler with which it is usually used. In that case, it can be used separately from the compiler. Notable examples include use with the (deprecated) imake system and for preprocessing Fortran. However, use as a general purpose preprocessor is limited since the source code language must be relatively C-like for the preprocessor to parse it. [2]
The GNU Fortran compiler runs "traditional mode" CPP before compiling Fortran code if certain file extensions are used. [14] Intel offers a Fortran preprocessor, fpp, for use with the ifort compiler, which has similar capabilities. [15]
CPP also works acceptably with most assembly languages and Algol-like languages. This requires that the language syntax not conflict with CPP syntax, which means no lines starting with # and that double quotes, which CPP interprets as string literals and thus ignores, have no other syntactical meaning. The "traditional mode" (acting like a pre-ISO C preprocessor) is generally more permissive and better suited for such use. [16]
Some modern compilers such as the GNU C Compiler provide preprocessing as a feature of the compiler, not as a separate tool.
Text substitution has a relatively high risk of causing a software bug as compared to other programming constructs. [17] [18]
Consider the common definition of a max macro:
#define max(a,b) (((a) > (b)) ? (a) : (b))
The expressions represented by a and b each appear twice in the expansion, so whichever one is selected is evaluated a second time after the comparison, but this is not obvious in the code where the macro is referenced. If the actual expressions have constant value, then multiple evaluation is not problematic from a logic standpoint even though it can affect runtime performance. But if an expression evaluates to a different value on subsequent evaluation, then the result may be unexpected. For example, given int i = 1, j = 2;, the result of max(i,j) is 2. If a and b were only evaluated once, the result of max(i++,j++) would be the same, but with double evaluation the result is 3.
Failure to bracket arguments can lead to unexpected results. For example, a macro to double a value might be written as:
#define double(x) 2 * x
But double(1 + 2) expands to 2 * 1 + 2, which, due to order of operations, evaluates to 4 when the expected result is 6. To mitigate this problem, a macro should bracket all expressions and substitution variables:
#define double(x) (2 * (x))
The C preprocessor is not Turing-complete, but comes close. Recursive computations can be specified, but with a fixed upper bound on the amount of recursion performed. [19] However, the C preprocessor is not designed to be, nor does it perform well as, a general-purpose programming language. As the C preprocessor does not have features of some other preprocessors, such as recursive macros, selective expansion according to quoting, and string evaluation in conditionals, it is very limited in comparison to a more general macro processor such as m4.
Due to its limitations, C and C++ language features have been added over the years to minimize the value and need for the preprocessor.
For a long time, a preprocessor macro provided the preferred way to define a constant value. An alternative has always been to define a const variable, but that can result in consuming runtime memory. A newer language construct (since C++11 and C23), constexpr, allows for declaring a compile-time constant value that need not consume runtime memory. [20]
For a long time, a function-like macro was the only way to define function-like behavior that did not incur runtime function call overhead. Via the inline keyword and optimizing compilers that inline automatically, some functions can be invoked without call overhead.
The include directive limits code structure since it only allows including the content of one file into another. More modern languages support a module concept that has public symbols that other modules import instead of including file content. Many contend that the resulting code is easier to maintain since there is only one file for a module, not both a header and a body. C++20 adds a module concept and an import statement that is not handled via preprocessing. [21] [22]
"Having said that, you can often get away with using cpp on things which are not C. Other Algol-ish programming languages are often safe (Ada, etc.) So is assembly, with caution. -traditional-cpp mode preserves more white space, and is otherwise more permissive. Many of the problems can be avoided by writing C or C++ style comments instead of native language comments, and keeping macros simple."