printf is a C standard library function that formats text and writes it to standard output.
The name printf is short for "print formatted", where "print" refers to output to a printer, although the function is not limited to printer output.
The standard library provides many other similar functions that form a family of printf-like functions. These functions accept a format string parameter and a variable number of value parameters that the function serializes per the format string and writes to an output stream or a string buffer.
The format string is encoded as a template language consisting of verbatim text and format specifiers that each specify how to serialize a value. As the format string is processed left-to-right, a subsequent value is used for each format specifier found. A format specifier starts with a % character and has one or more following characters that specify how to serialize a value.
The format string syntax and semantics are the same for all of the functions in the printf-like family.
A mismatch between the format specifiers and the count and types of the values can cause a crash or a security vulnerability.
The printf format string is complementary to the scanf format string, which provides formatted input (lexing a.k.a. parsing). Both format strings provide relatively simple functionality compared to other template engines, lexers and parsers.
The formatting design has been copied in other programming languages.
Early programming languages like Fortran used special statements with different syntax from other calculations to build formatting descriptions. [1] In this example, the format is specified on line 601, and the PRINT [a] command refers to it by line number:
      PRINT 601, IA, IB, AREA
 601  FORMAT (4H A= ,I5,5H  B= ,I5,8H  AREA= ,F10.2, 13H SQUARE UNITS)
Here:
4H indicates a string of 4 characters " A= " (H means Hollerith field);
I5 indicates an integer field of width 5;
F10.2 indicates a floating-point field of width 10 with 2 digits after the decimal point.
An output with input arguments 100, 200, and 1500.25 might look like this:
A= 100 B= 200 AREA= 1500.25 SQUARE UNITS
In 1967, BCPL appeared. [2] Its library included the writef routine. [3] An example application looks like this:
WRITEF("%I2-QUEENS PROBLEM HAS %I5 SOLUTIONS*N", NUMQUEENS, COUNT)
Here:
%I2 indicates an integer of width 2 (the order of the format specification's field width and type is reversed compared to C's printf);
%I5 indicates an integer of width 5;
*N is a BCPL language escape sequence representing a newline character (for which C uses the escape sequence \n).
In 1968, ALGOL 68 had a more function-like API, but still used special syntax (the $ delimiters surround special formatting syntax):
printf(($"Color "g", number1 "6d,", number2 "4zd,", hex "16r2d,", float "-d.2d,", unsigned value"-3d"."l$,"red",123456,89,BIN255,3.14,250));
In contrast to Fortran, using normal function calls and data types simplifies the language and compiler, and allows the implementation of the input/output to be written in the same language.
These advantages were thought to outweigh the disadvantages (such as a complete lack of type safety in many instances) up until the 2000s, and in most newer languages of that era I/O is not part of the syntax.
People have since learned the hard way [4] that this belief is false, resulting in a plethora of undesired consequences, ranging from security exploits to hardware failures (e.g., a phone's networking capabilities being permanently disabled after trying to connect to an access point named "%p%s%s%s%s%n"). [5]
Modern languages, such as C++20 and later, are therefore taking steps to reverse this mistake, and do include format specifications as a part of the language syntax, [6] which restore type safety in formatting to an extent, and allow the compiler to detect some invalid combinations of format specifiers and data types at compile time.
In 1973, printf is included as a C routine as part of Version 4 Unix. [7]
In 1990, a printf shell command is attested as part of 4.3BSD-Reno. It is modeled after the standard library function. [8]
In 1991, a printf command is bundled with GNU shellutils (now part of GNU Core Utilities).
The need to do something about the range of problems resulting from lack of type safety has prompted attempts to make the C++ compiler printf-aware.
The -Wformat option of GCC allows compile-time checks of printf calls, enabling the compiler to detect a subset of invalid calls (and issue either a warning or an error, stopping the compilation altogether, depending on other flags). [9]
Since the compiler is inspecting printf format specifiers, enabling this option effectively extends the C++ syntax by making formatting a part of it.
As noted above, numerous issues [10] with the lack of type safety in printf() resulted in a revision [11] of the approach to formatting, and C++20 onwards include format specifications in the language [12] to enable type-safe formatting.
The approach (and syntax) of C++20 std::format resulted from effectively incorporating Victor Zverovich's libfmt [13] API into the language specification [14] (Zverovich wrote [15] the first draft of the new format proposal); consequently, libfmt is an implementation of the C++20 format specification.
The formatting function has been combined with output in C++23, which provides [16] the std::print function as a replacement for printf().
As the format specification has become a part of the language syntax, a C++ compiler is able to prevent invalid combinations of types and format specifiers in many cases. Unlike the -Wformat option, this is not an optional feature.
The format specification of libfmt and std::format is, in itself, an extensible "mini-language" (referred to as such in the specification), [17] an example of a domain-specific language.
Incorporation of a separate, domain-specific mini-language specifically for formatting into the C++ language syntax for std::print, therefore, completes the historical cycle, bringing the state-of-the-art (as of 2024) back to what it was in the case of FORTRAN's first PRINT implementation in the 1950s discussed in the beginning of this section.
Formatting a value is specified as markup in the format string. For example, the following outputs "Your age is " and then the value of variable age in decimal format.
printf("Your age is %d",age);
The syntax for a format specifier is:
%[parameter][flags][width][.precision][length]type
The parameter field is optional. If included, then matching specifiers to values is not sequential. The numeric value, n, selects the nth value parameter.
Character | Description |
---|---|
n$ | n is the index of the value parameter to serialize using this format specifier |
This is a POSIX extension; not C99.
This field allows for using the same value multiple times in a format string instead of having to pass the value multiple times. If a specifier includes this field, then subsequent specifiers must also.
For example, printf("%2$d %2$#x; %1$d %1$#x",16,17) outputs: 17 0x11; 16 0x10.
This field is particularly useful for localizing messages to different natural languages that often use different word order.
In Microsoft Windows, this feature is supported via a different function, _printf_p.
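The localization use of the parameter field can be sketched as follows. This is a minimal illustration that assumes a C library implementing the POSIX n$ extension; the helper names format_given_first and format_family_first are hypothetical and not part of any library.

```c
#include <stdio.h>

/* Hypothetical helpers: the same two arguments, emitted in two word orders.
   The n$ syntax is a POSIX extension, so this is not portable ISO C. */
int format_given_first(char *buf, size_t size,
                       const char *given, const char *family)
{
    /* %1$s selects the first value parameter, %2$s the second. */
    return snprintf(buf, size, "%1$s %2$s", given, family);
}

int format_family_first(char *buf, size_t size,
                        const char *given, const char *family)
{
    /* Same arguments in the same order; only the format string changes. */
    return snprintf(buf, size, "%2$s, %1$s", given, family);
}
```

Called with "Ada" and "Lovelace", the first helper produces "Ada Lovelace" and the second "Lovelace, Ada", without the caller reordering its arguments.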
The flags field can be zero or more of (in any order):
Character | Description |
---|---|
- (minus) | Left-align the output of this placeholder. (The default is to right-align the output.) |
+ (plus) | Prepends a plus for positive signed-numeric types. positive = +, negative = -. (The default does not prepend anything in front of positive numbers.) |
(space) | Prepends a space for positive signed-numeric types. positive = ' ', negative = -. This flag is ignored if the + flag exists. (The default does not prepend anything in front of positive numbers.) |
0 (zero) | When the 'width' option is specified, prepends zeros for numeric types. (The default prepends spaces.) For example, printf("%4X",3) produces "   3", while printf("%04X",3) produces "0003". |
' (apostrophe) | The integer or exponent of a decimal has the thousands grouping separator applied. |
# (hash) | Alternate form: For g and G types, trailing zeros are not removed. For f, F, e, E, g, G types, the output always contains a decimal point. For o, x, X types, the text 0, 0x, 0X, respectively, is prepended to non-zero numbers. |
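The effect of the most common flags can be compared side by side. The following is a small sketch, not a complete survey; the function name show_flags is hypothetical, and the apostrophe flag is omitted because its output depends on the locale.

```c
#include <stdio.h>

/* Hypothetical helper: writes the same value with several flags,
   one bracketed result per line, and returns the snprintf result. */
int show_flags(char *out, size_t size, int value)
{
    return snprintf(out, size,
                    "[%5d]\n"    /* no flag: right-aligned in width 5 */
                    "[%-5d]\n"   /* '-': left-aligned in width 5      */
                    "[%+d]\n"    /* '+': sign shown for positives     */
                    "[% d]\n"    /* ' ': space where a plus would go  */
                    "[%05d]\n"   /* '0': padded with zeros            */
                    "[%#x]\n",   /* '#': alternate form, 0x prefix    */
                    value, value, value, value, value,
                    (unsigned int)value);
}
```

For the value 42 the six lines are [   42], [42   ], [+42], [ 42], [00042] and [0x2a].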
The width field specifies the minimum number of characters to output. If the value can be represented in fewer characters, then the value is left-padded with spaces so that output is the number of characters specified. If the value requires more characters, then the output is longer than the specified width. A value is never truncated.
For example, printf("%3d",12) specifies a width of 3 and outputs " 12", with a space on the left to make 3 characters. The call printf("%3d",1234) outputs 1234, which is 4 characters long since that is the minimum width for that value even though the width specified is 3.
If the width field is omitted, the output is the minimum number of characters for the value.
If the field is specified as *, then the width value is read from the list of values in the call. [18] For example, printf("%*d",3,10) outputs " 10", where the second parameter, 3, is the width (matches with *) and 10 is the value to serialize (matches with d).
Though not part of the width field, a leading zero is interpreted as the zero-padding flag mentioned above, and a negative value is treated as the positive value in conjunction with the left-alignment - flag also mentioned above.
The width field can be used to format values as a table (tabulated output). But, columns do not align if any value is larger than fits in the width specified. For example, notice that the last line value (1234) does not fit in the first column of width 3 and therefore the column is not aligned.
  1   1
 12  12
123 123
1234 123
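Tabulated output with a width chosen at run time can be sketched with the * form of the width field. The function name format_row and the column layout are assumptions made for this illustration only.

```c
#include <stdio.h>

/* Hypothetical helper: formats one table row with a right-aligned
   integer column and a left-aligned text column, each 'width' wide. */
int format_row(char *buf, size_t size, int width, int number,
               const char *label)
{
    /* Each * consumes one int argument as the width of the next value. */
    return snprintf(buf, size, "%*d|%-*s|", width, number, width, label);
}
```

With a width of 4, the number 12 and the label "ab" give "  12|ab  |". With a width of 3 and the number 12345, the first column widens to 5 characters rather than truncating, so the row no longer lines up with its neighbours.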
The precision field usually specifies a maximum limit of the output, depending on the particular formatting type. For floating-point numeric types, it specifies the number of digits to the right of the decimal point to which the output should be rounded. For the string type, it limits the number of characters that should be output, after which the string is truncated.
The precision field may be omitted, be a numeric integer value, or be a dynamic value passed as another argument when indicated by an asterisk (*). For example, printf("%.*s",3,"abcdef") outputs abc.
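Both uses of the precision field can be shown with a dynamic precision. This is a minimal sketch; the helper names format_rounded and format_prefix are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: rounds a double to 'digits' places after
   the decimal point. */
int format_rounded(char *buf, size_t size, double value, int digits)
{
    return snprintf(buf, size, "%.*f", digits, value);
}

/* Hypothetical helper: emits at most 'max_chars' characters of 'text'. */
int format_prefix(char *buf, size_t size, const char *text, int max_chars)
{
    return snprintf(buf, size, "%.*s", max_chars, text);
}
```

Here format_rounded with 3.14159 and 2 digits yields "3.14", and format_prefix with "abcdef" and 3 yields "abc".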
The length field can be omitted or be any of:
Character | Description |
---|---|
hh | For integer types, causes printf to expect an int-sized integer argument which was promoted from a char. |
h | For integer types, causes printf to expect an int-sized integer argument which was promoted from a short. |
l | For integer types, causes printf to expect a long-sized integer argument. For floating-point types, this is ignored. float arguments are always promoted to double when used in a varargs call. [19] |
ll | For integer types, causes printf to expect a long long-sized integer argument. |
L | For floating-point types, causes printf to expect a long double argument. |
z | For integer types, causes printf to expect a size_t-sized integer argument. |
j | For integer types, causes printf to expect an intmax_t-sized integer argument. |
t | For integer types, causes printf to expect a ptrdiff_t-sized integer argument. |
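The standard length modifiers in the table above can be combined with their matching argument types as in the following sketch. The function name format_mixed is hypothetical, and a C99 or later library is assumed.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: each length modifier tells snprintf which
   argument type to read for the conversion that follows it. */
int format_mixed(char *buf, size_t size, signed char small,
                 long long big, size_t count, ptrdiff_t diff)
{
    return snprintf(buf, size, "%hhd %lld %zu %td",
                    small,   /* hh: promoted to int, printed as a char value */
                    big,     /* ll: long long                                */
                    count,   /* z:  size_t                                   */
                    diff);   /* t:  ptrdiff_t                                */
}
```

Called with -5, 9000000000LL, 42 and -3, the result is "-5 9000000000 42 -3".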
Platform-specific length options came to exist prior to widespread use of the ISO C99 extensions, including:
Characters | Description | Commonly found platforms |
---|---|---|
I | For signed integer types, causes printf to expect ptrdiff_t-sized integer argument; for unsigned integer types, causes printf to expect size_t-sized integer argument. | Win32/Win64 |
I32 | For integer types, causes printf to expect a 32-bit (double word) integer argument. | Win32/Win64 |
I64 | For integer types, causes printf to expect a 64-bit (quad word) integer argument. | Win32/Win64 |
q | For integer types, causes printf to expect a 64-bit (quad word) integer argument. | BSD |
ISO C99 includes the inttypes.h header file that includes a number of macros for platform-independent printf coding. For example: printf("%"PRId64,t); specifies decimal format for a 64-bit signed integer. Since the macros evaluate to a string literal, and the compiler concatenates adjacent string literals, the expression "%"PRId64 compiles to a single string.
Macros include:
Macro | Description |
---|---|
PRId32 | Typically equivalent to I32d (Win32/Win64) or d |
PRId64 | Typically equivalent to I64d (Win32/Win64), lld (32-bit platforms) or ld (64-bit platforms) |
PRIi32 | Typically equivalent to I32i (Win32/Win64) or i |
PRIi64 | Typically equivalent to I64i (Win32/Win64), lli (32-bit platforms) or li (64-bit platforms) |
PRIu32 | Typically equivalent to I32u (Win32/Win64) or u |
PRIu64 | Typically equivalent to I64u (Win32/Win64), llu (32-bit platforms) or lu (64-bit platforms) |
PRIx32 | Typically equivalent to I32x (Win32/Win64) or x |
PRIx64 | Typically equivalent to I64x (Win32/Win64), llx (32-bit platforms) or lx (64-bit platforms) |
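A short sketch of the inttypes.h macros in use follows. The helper names format_i64 and format_u32_hex are hypothetical; the macros themselves are those listed in the table.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: prints a 64-bit signed integer portably. */
int format_i64(char *buf, size_t size, int64_t value)
{
    /* "%" PRId64 is concatenated into one format string by the compiler. */
    return snprintf(buf, size, "%" PRId64, value);
}

/* Hypothetical helper: prints a 32-bit value as 8 zero-padded hex digits. */
int format_u32_hex(char *buf, size_t size, uint32_t value)
{
    /* Flags and width go between the % and the macro. */
    return snprintf(buf, size, "0x%08" PRIx32, value);
}
```

For example, format_u32_hex with 0xBEEF produces "0x0000beef" on any platform, whichever underlying type uint32_t maps to.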
The type field can be any of:
Character | Description |
---|---|
% | Prints a literal % character (this type does not accept any flags, width, precision, length fields). |
d, i | int as a signed integer. %d and %i are synonymous for output, but are different when used with scanf for input (where using %i will interpret a number as hexadecimal if it's preceded by 0x, and octal if it's preceded by 0.) |
u | Print decimal unsigned int. |
f, F | double in normal (fixed-point) notation. f and F differ only in how the strings for an infinite number or NaN are printed (inf, infinity and nan for f; INF, INFINITY and NAN for F). |
e, E | double value in standard form (d.ddde±dd). An E conversion uses the letter E (rather than e) to introduce the exponent. The exponent always contains at least two digits; if the value is zero, the exponent is 00. In Windows, the exponent contains three digits by default, e.g. 1.5e002, but this can be altered by Microsoft-specific _set_output_format function. |
g, G | double in either normal or exponential notation, whichever is more appropriate for its magnitude. g uses lower-case letters, G uses upper-case letters. This type differs slightly from fixed-point notation in that insignificant zeroes to the right of the decimal point are not included. Also, the decimal point is not included on whole numbers. |
x, X | unsigned int as a hexadecimal number. x uses lower-case letters and X uses upper-case. |
o | unsigned int in octal. |
s | null-terminated string. |
c | char (character). |
p | void* (pointer to void) in an implementation-defined format. |
a, A | double in hexadecimal notation, starting with 0x or 0X. a uses lower-case letters, A uses upper-case letters. [20] [21] (C++11 iostreams have a hexfloat that works the same). |
n | Prints nothing, but writes the number of characters written so far into an integer pointer parameter. In Java this prints a newline. [22] |
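Several of the type characters can be seen together in one call. The following is an illustrative sketch with a hypothetical function name, format_sample; the two-digit exponent shown assumes a C library that follows the ISO C minimum, which, as the table notes, has not always been the default on Windows.

```c
#include <stdio.h>

/* Hypothetical helper: one call exercising d, u, x, o, c, s, e and %%. */
int format_sample(char *buf, size_t size, int n, unsigned int u, double d)
{
    return snprintf(buf, size, "%d %u %x %o %c %s %.2e %%",
                    n,      /* d: signed decimal          */
                    u,      /* u: unsigned decimal        */
                    u,      /* x: lower-case hexadecimal  */
                    u,      /* o: octal                   */
                    'A',    /* c: single character        */
                    "str",  /* s: null-terminated string  */
                    d);     /* e: exponential notation    */
}
```

With the arguments -7, 255 and 1500.25, a conforming library produces "-7 255 ff 377 A str 1.50e+03 %".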
A common way to handle formatting with a custom data type is to format the custom data type value into a string, then use the %s specifier to include the serialized value in a larger message.
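This two-step approach can be sketched as follows. The struct point type and both function names are hypothetical, introduced only for illustration.

```c
#include <stdio.h>

/* Hypothetical custom type. */
struct point {
    int x;
    int y;
};

/* Step 1: serialize the custom value into a caller-supplied buffer
   and return that buffer so the call can be used as a %s argument. */
const char *point_to_string(const struct point *p, char *buf, size_t size)
{
    snprintf(buf, size, "(%d, %d)", p->x, p->y);
    return buf;
}

/* Step 2: include the serialized value in a larger message via %s. */
int describe_point(char *out, size_t size, const struct point *p)
{
    char tmp[32];
    return snprintf(out, size, "position: %s",
                    point_to_string(p, tmp, sizeof tmp));
}
```

For a point with x = 3 and y = -4, describe_point writes "position: (3, -4)".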
Some printf-like functions allow extensions to the escape-character-based mini-language, thus allowing the programmer to use a specific formatting function for non-builtin types. One is glibc's (now deprecated) register_printf_function(). However, it is rarely used because it conflicts with static format string checking. Another is Vstr custom formatters, which allow adding multi-character format names.
Some applications (like the Apache HTTP Server) include their own printf-like function and embed extensions into it. However, these all tend to have the same problems that register_printf_function() has.
The Linux kernel printk function supports a number of ways to display kernel structures using the generic %p specification, by appending additional format characters. [23] For example, %pI4 prints an IPv4 address in dotted-decimal form. This allows static format string checking (of the %p portion) at the expense of full compatibility with normal printf.
Variants of printf provide the formatting features but with additional or slightly different behavior.
fprintf outputs to a system file object, which allows output to destinations other than standard output.
sprintf writes to a string buffer instead of standard output.
snprintf provides a level of safety over sprintf since the caller provides a length (n) parameter that specifies the maximum number of chars to write to the buffer.
For most printf-family functions, there is a variant that accepts va_list rather than a variable-length parameter list. For example, there are vfprintf, vsprintf, and vsnprintf.
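The va_list variants are what make it possible to write new printf-like functions. The following sketch wraps vsnprintf and uses its return value to detect truncation; the wrapper name format_checked is hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: formats into buf like snprintf.
   Returns 1 if the whole result fit, 0 if it was truncated
   or if vsnprintf reported an error. */
int format_checked(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    /* vsnprintf returns the length the full result would have had,
       not counting the terminating null character. */
    int needed = vsnprintf(buf, size, fmt, args);
    va_end(args);
    return needed >= 0 && (size_t)needed < size;
}
```

With an 8-byte buffer, formatting 1234 with "%d" succeeds, while a longer string is cut to 7 characters plus the terminator and the function returns 0.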
Extra value parameters are ignored, but if the format string has more format specifiers than value parameters passed the behavior is undefined. For some C compilers, an extra format specifier results in consuming a value even though there isn't one. This can allow the format string attack. Generally, for C, arguments are passed on the stack. If too few arguments are passed, then printf can read past the end of the stackframe, thus allowing an attacker to read the stack.
Some compilers, like the GNU Compiler Collection, will statically check the format strings of printf-like functions and warn about problems (when using the flags -Wall or -Wformat). GCC will also warn about user-defined printf-style functions if the non-standard "format" __attribute__ is applied to the function.
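Applying the attribute to a user-defined function might look like the sketch below. The macro PRINTF_LIKE and the function log_message are hypothetical names; the attribute itself is the GCC extension mentioned above, and the macro expands to nothing on compilers that do not define __GNUC__.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical convenience macro around the non-standard attribute. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

/* Argument 2 is the format string; checked values start at argument 3. */
int log_message(FILE *stream, const char *fmt, ...) PRINTF_LIKE(2, 3);

int log_message(FILE *stream, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int written = vfprintf(stream, fmt, args);
    va_end(args);
    return written;
}
```

With this declaration in scope, a call such as log_message(stderr, "%d", "text") is expected to draw the same -Wformat diagnostic that the equivalent fprintf call would.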
The format string is often a string literal, which allows static analysis of the function call. However, the format string can be the value of a variable, which allows for dynamic formatting but also a security vulnerability known as an uncontrolled format string exploit.
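The usual defence is to keep the format string a literal and pass untrusted text only as a value. A minimal sketch, with the hypothetical function name echo_safely:

```c
#include <stdio.h>

/* Hypothetical helper: copies untrusted text into buf.
   The text is a value for %s, so any % characters in it are
   copied verbatim instead of being interpreted as specifiers.
   The unsafe form would be snprintf(buf, size, user_text). */
int echo_safely(char *buf, size_t size, const char *user_text)
{
    return snprintf(buf, size, "%s", user_text);
}
```

Passing the text "%p%s%n" to this function simply reproduces those six characters in the buffer.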
Although an outputting function on the surface, printf allows writing to a memory location specified by an argument via %n. This functionality is occasionally used as a part of more elaborate format-string attacks. [24]
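The write performed by %n can be observed in a benign setting. The following sketch uses a hypothetical function name, count_prefix, and assumes a C library that honours %n in a literal format string; some libraries restrict or disable it for security reasons.

```c
#include <stdio.h>

/* Hypothetical helper: returns how many characters precede the
   trailing '!' in the formatted greeting. */
int count_prefix(char *buf, size_t size, const char *name)
{
    int count = -1;
    /* %n prints nothing; it stores the number of characters
       produced so far into the int that &count points to. */
    snprintf(buf, size, "Hello, %s%n!", name, &count);
    return count;
}
```

For the name "C" the buffer receives "Hello, C!" and the function returns 8, the length of "Hello, C".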
The %n functionality also makes printf accidentally Turing-complete even with a well-formed set of arguments. A game of tic-tac-toe written in the format string is a winner of the 27th IOCCC. [25]
Notable programming languages that include printf or printf-like functionality.
Excluded are languages that use format strings that deviate from the style in this article (such as AMPL and Elixir), languages that inherit their implementation from the JVM or other environment (such as Clojure and Scala), and languages that do not have a standard native printf implementation but have external libraries which emulate printf behavior (such as JavaScript).