Scanf

Last updated May 19, 2024

scanf, short for scan formatted, is a C standard library function that reads and parses text from standard input.

History

Mike Lesk's portable input/output library, including scanf, officially became part of Unix in Version 7.^[1]

Usage

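The scanf function reads text from standard input and converts it to numbers and other data types as directed by a format string. It returns the number of input items successfully converted and stored, or EOF if input ends before the first conversion.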

The following C code reads a variable number of unformatted decimal integers from standard input and prints each of them out on separate lines:

#include <stdio.h>

int main(void)
{
    int n;

    while (scanf("%d", &n) == 1)
        printf("%d\n", n);
    return 0;
}

For input:

456 123 789 456 12456 1      2378

The output is:

456
123
789
456
12456
1
2378

To print out a word:

#include <stdio.h>

int main(void)
{
    char word[20];

    if (scanf("%19s", word) == 1)
        puts(word);
    return 0;
}

Whatever data type the program reads, the arguments (such as &n above) must be pointers to the objects that receive the values. If a value is passed instead of its address, scanf treats that value as an address and writes to the wrong memory location rather than to the intended variable; the behavior is undefined.

In the last example the address-of operator (&) is not applied to the argument: word is the name of an array of char, and in most expressions (including a function argument) it is converted to a pointer to the first element of the array. The expression &word would evaluate to the same address, but it has a different type and meaning: it is the address of the whole array rather than of its first element. This should be kept in mind when reading strings with scanf.

Because scanf reads only from standard input, many programming languages that offer similar interfaces, such as PHP, provide derivatives such as sscanf and fscanf but not scanf itself.

Format string specifications

The formatting placeholders in scanf are more or less the same as those in printf, its reverse function. As in printf, the POSIX extension n$ is defined.^[2]

Format strings rarely contain constants (i.e., characters that are not formatting placeholders), mainly because a program is usually not designed to read known data, although scanf accepts them if explicitly specified: each such character must match the next input character exactly. The exception is whitespace: one or more whitespace characters in the format string discard all whitespace characters in the input up to the next non-whitespace character.^[2]

Some of the most commonly used placeholders follow:

%a : Scan a floating-point number in its hexadecimal notation.
%d : Scan an integer as a signed decimal number.
%i : Scan an integer as a signed number. Similar to %d, but interprets the number as hexadecimal when preceded by 0x and octal when preceded by 0. For example, the string 031 would be read as 31 using %d, and 25 using %i. The length modifier h in %hi indicates conversion to a short and hh conversion to a signed char.
%u : Scan an unsigned int written in decimal. The C99 standard allows an optional sign in the input, so a leading minus sign is not an error: the value is negated in the unsigned type, which typically yields a very large number. See strtoul(). Correspondingly, %hu scans for an unsigned short and %hhu for an unsigned char.
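%f : Scan a floating-point number and store it in a float. Any notation accepted by strtod is matched, including exponential notation.
%g, %G : Scan a floating-point number in either normal or exponential notation. Unlike in printf, the case of the specifier does not change which input is accepted.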
%x, %X : Scan an integer as an unsigned hexadecimal number.
%o : Scan an integer as an octal number.
%s : Scan a character string. The scan terminates at whitespace. A null character is stored at the end of the string, which means that the buffer supplied must be at least one character longer than the specified input length.
%c : Scan a character (char). No null character is added.
whitespace: Any whitespace characters trigger a scan for zero or more whitespace characters. The number and type of whitespace characters do not need to match in either direction.
%lf : Scan as a double floating-point number. The %f format with the l ("long") length modifier.
%Lf : Scan as a long double floating-point number. The %f format with the L ("long double") length modifier.
%n : Nothing is expected. The number of characters consumed thus far from the input is stored through the next pointer, which must be a pointer to int. This is not a conversion and does not increase the count returned by the function.

The above can be combined with a numeric field width and with length modifiers such as l ("long"), ll ("long long"), and L ("long double"), placed between the percent symbol and the conversion letter. The field width precedes any length modifier and specifies the maximum number of characters to be scanned. An optional asterisk (*) right after the percent symbol denotes that the datum read by this format specifier is not to be stored in a variable. No argument should be supplied after the format string for such a dropped datum.

The ff modifier in printf is not present in scanf, causing differences between modes of input and output. The ll and hh modifiers are not present in the C90 standard, but are present in the C99 standard.^[3]

An example of a format string is

"%7d%s %c%lf"

The above format string scans up to seven characters as a decimal integer, then reads the following characters as a string until a space, newline, or tab is found, then consumes whitespace until the first non-whitespace character is found, then consumes that character, and finally scans the remaining characters as a double.

Any of these conversions can fail if the input does not have the expected form. Therefore, a robust program must check whether the scanf call succeeded and take appropriate action. If the input was not in the correct format, the erroneous data will still be on the input stream and must be discarded before new input can be read. An alternative method, which avoids this, is to use fgets to read a line and then examine the string read in. That last step can be done with sscanf, for example.

In the case of the many float type characters a, e, f, g, many implementations choose to collapse most into the same parser. Microsoft MSVCRT does it with e, f, g,^[4] while glibc does so with all four.^[2]

ISO C99 provides the inttypes.h header file, which defines a number of macros for use in platform-independent scanf coding. These must be placed outside the double quotes, e.g. scanf("%" SCNd64 "\n", &t);

Example macros include:

Macro	Description
SCNd32	Typically equivalent to I32d (Win32/Win64) or d
SCNd64	Typically equivalent to I64d (Win32/Win64), lld (32-bit platforms) or ld (64-bit platforms)
SCNi32	Typically equivalent to I32i (Win32/Win64) or i
SCNi64	Typically equivalent to I64i (Win32/Win64), lli (32-bit platforms) or li (64-bit platforms)
SCNu32	Typically equivalent to I32u (Win32/Win64) or u
SCNu64	Typically equivalent to I64u (Win32/Win64), llu (32-bit platforms) or lu (64-bit platforms)
SCNx32	Typically equivalent to I32x (Win32/Win64) or x
SCNx64	Typically equivalent to I64x (Win32/Win64), llx (32-bit platforms) or lx (64-bit platforms)

Vulnerabilities

scanf is vulnerable to format string attacks. Great care should be taken to ensure that the formatting string includes limitations for string and array sizes. In most cases the input string size from a user is arbitrary and cannot be determined before the scanf function is executed. This means that %s placeholders without length specifiers are inherently insecure and exploitable for buffer overflows. Another potential problem is to allow dynamic formatting strings, for example formatting strings stored in configuration files or other user-controlled files. In this case the allowed input length of string sizes cannot be specified unless the formatting string is checked beforehand and limitations are enforced. Related to this are additional or mismatched formatting placeholders which do not match the actual vararg list. These placeholders might be partially extracted from the stack or contain undesirable or even insecure pointers, depending on the particular implementation of varargs.

References

[1] McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (PDF) (Technical report). CSTR. Bell Labs. 139.
[2] scanf(3) – Linux Programmer's Manual – Library Functions
[3] C99 standard, §7.19.6.2 "The fscanf function", paragraph 11.
[4] "scanf Type Field Characters". docs.microsoft.com. 26 October 2022.

External links

scanf – System Interfaces Reference, The Single UNIX Specification, Version 4 from The Open Group
C++ reference for std::scanf
