C standard library

Last updated

The C standard library or libc is the standard library for the C programming language, as specified in the ISO C standard. [1] Starting from the original ANSI C standard, it was developed at the same time as the C library POSIX specification, which is a superset of it. [2] [3] Since ANSI C was adopted by the International Organization for Standardization, [4] the C standard library is also called the ISO C library.

Contents

The C standard library provides macros, type definitions and functions for tasks such as string handling, mathematical computations, input/output processing, memory management, and several other operating system services.

Application programming interface (API)

Header files

The application programming interface (API) of the C standard library is declared in a number of header files. Each header file contains one or more function declarations, data type definitions, and macros.

After a long period of stability, three new header files (iso646.h, wchar.h, and wctype.h) were added with Normative Addendum 1 (NA1), an addition to the C Standard ratified in 1995. Six more header files (complex.h, fenv.h, inttypes.h, stdbool.h, stdint.h, and tgmath.h) were added with C99, a revision to the C Standard published in 1999, and five more files (stdalign.h, stdatomic.h, stdnoreturn.h, threads.h, and uchar.h) with C11 in 2011. In total, there are now 29 header files:

NameFromDescription
<assert.h> Declares the assert macro, used to assist with detecting logical errors and other types of bugs while debugging a program.
<complex.h> C99Defines a set of functions for manipulating complex numbers.
<ctype.h> Defines set of functions used to classify characters by their types or to convert between upper and lower case in a way that is independent of the used character set (typically ASCII or one of its extensions, although implementations utilizing EBCDIC are also known).
<errno.h> For testing error codes reported by library functions.
<fenv.h> C99Defines a set of functions for controlling floating-point environment.
<float.h> Defines macro constants specifying the implementation-specific properties of the floating-point library.
<inttypes.h> C99Defines exact-width integer types.
<iso646.h> NA1Defines several macros that implement alternative ways to express several standard tokens. For programming in ISO 646 variant character sets.
<limits.h> Defines macro constants specifying the implementation-specific properties of the integer types.
<locale.h>Defines localization functions.
<math.h> Defines common mathematical functions.
<setjmp.h> Declares the macros setjmp and longjmp, which are used for non-local exits.
<signal.h> Defines signal-handling functions.
<stdalign.h>C11For querying and specifying the alignment of objects.
<stdarg.h> For accessing a varying number of arguments passed to functions.
<stdatomic.h>C11For atomic operations on data shared between threads.
<stdbool.h> C99Defines a Boolean data type.
<stddef.h> Defines several useful types and macros.
<stdint.h> C99Defines exact-width integer types.
<stdio.h> Defines core input and output functions
<stdlib.h>Defines numeric conversion functions, pseudo-random numbers generation functions, memory allocation, process control functions
<stdnoreturn.h>C11For specifying non-returning functions
<string.h> Defines string-handling functions
<tgmath.h> C99Defines type-generic mathematical functions.
<threads.h>C11Defines functions for managing multiple threads, mutexes and condition variables
<time.h> Defines date- and time-handling functions
<uchar.h>C11Types and functions for manipulating Unicode characters
<wchar.h> NA1Defines wide-string-handling functions
<wctype.h> NA1Defines set of functions used to classify wide characters by their types or to convert between upper and lower case

Three of the header files (complex.h, stdatomic.h, and threads.h) are conditional features that implementations are not required to support.

The POSIX standard added several nonstandard C headers for Unix-specific functionality. Many have found their way to other architectures. Examples include fcntl.h and unistd.h . A number of other groups are using other nonstandard headers – the GNU C Library has alloca.h, and OpenVMS has the va_count() function.

Documentation

On Unix-like systems, the authoritative documentation of the API is provided in the form of man pages. On most systems, man pages on standard library functions are in section 3; section 7 may contain some more generic pages on underlying concepts (e.g. man 7 math_error in Linux).

Implementations

Unix-like systems typically have a C library in shared library form, but the header files (and compiler toolchain) may be absent from an installation so C development may not be possible. The C library is considered part of the operating system on Unix-like systems; in addition to functions specified by the C standard, it includes other functions that are part of the operating system API, such as functions specified in the POSIX standard. The C library functions, including the ISO C standard ones, are widely used by programs, and are regarded as if they were not only an implementation of something in the C language, but also de facto part of the operating system interface. Unix-like operating systems generally cannot function if the C library is erased. This is true for applications which are dynamically as opposed to statically linked. Further, the kernel itself (at least in the case of Linux) operates independently of any libraries.

On Microsoft Windows, the core system dynamic libraries (DLLs) provide an implementation of the C standard library for the Microsoft Visual C++ compiler v6.0; the C standard library for newer versions of the Microsoft Visual C++ compiler is provided by each compiler individually, as well as redistributable packages. Compiled applications written in C are either statically linked with a C library, or linked to a dynamic version of the library that is shipped with these applications, rather than relied upon to be present on the targeted systems. Functions in a compiler's C library are not regarded as interfaces to Microsoft Windows.

Many C library implementations exist, provided with both various operating systems and C compilers. Some of the popular implementations are the following:

Compiler built-in functions

Some compilers (for example, GCC [7] ) provide built-in versions of many of the functions in the C standard library; that is, the implementations of the functions are written into the compiled object file, and the program calls the built-in versions instead of the functions in the C library shared object file. This reduces function-call overhead, especially if function calls are replaced with inline variants, and allows other forms of optimization (as the compiler knows the control-flow characteristics of the built-in variants), but may cause confusion when debugging (for example, the built-in versions cannot be replaced with instrumented variants).

However, the built-in functions must behave like ordinary functions in accordance with ISO C. The main implication is that the program must be able to create a pointer to these functions by taking their address, and invoke the function by means of that pointer. If two pointers to the same function are derived in two different translation units in the program, these two pointers must compare equal; that is, the address comes by resolving the name of the function, which has external (program-wide) linkage.

Linking, libm

Under FreeBSD [8] and glibc, [9] some functions such as sin() are not linked in by default and are instead bundled in the mathematical library libm. If any of them are used, the linker must be given the directive -lm. POSIX requires that the c99 compiler supports -lm, and that the functions declared in the headers math.h, complex.h, and fenv.h are available for linking if -lm is specified, but does not specify if the functions are linked by default. [10] musl satisfies this requirement by putting everything into a single libc library and providing an empty libm. [11]

Detection

According to the C standard the macro __STDC_HOSTED__ shall be defined to 1 if the implementation is hosted. A hosted implementation has all the headers specified by the C standard. An implementation can also be freestanding which means that these headers will not be present. If an implementation is freestanding, it shall define __STDC_HOSTED__ to 0.

Problems and workarounds

Buffer overflow vulnerabilities

Some functions in the C standard library have been notorious for having buffer overflow vulnerabilities and generally encouraging buggy programming ever since their adoption. [lower-alpha 1] The most criticized items are:

Except the extreme case with gets(), all the security vulnerabilities can be avoided by introducing auxiliary code to perform memory management, bounds checking, input checking, etc. This is often done in the form of wrappers that make standard library functions safer and easier to use. This dates back to as early as The Practice of Programming book by B. Kernighan and R. Pike where the authors commonly use wrappers that print error messages and quit the program if an error occurs.

The ISO C committee published Technical reports TR 24731-1 [12] and is working on TR 24731-2 [13] to propose adoption of some functions with bounds checking and automatic buffer allocation, correspondingly. The former has met severe criticism with some praise, [14] [15] the latter received mixed responses. Despite this, TR 24731-1 has been implemented into Microsoft's C standard library and its compiler issues warnings when using old "insecure" functions.

Threading problems, vulnerability to race conditions

The strerror() routine is criticized for being thread unsafe and otherwise vulnerable to race conditions.

Error handling

The error handling of the functions in the C standard library is not consistent and sometimes confusing. According to the Linux manual page math_error, "The current (version 2.8) situation under glibc is messy. Most (but not all) functions raise exceptions on errors. Some also set errno. A few functions set errno, but do not raise an exception. A very few functions do neither." [16]

Standardization

The original C language provided no built-in functions such as I/O operations, unlike traditional languages such as COBOL and Fortran.[ citation needed ] Over time, user communities of C shared ideas and implementations of what is now called C standard libraries. Many of these ideas were incorporated eventually into the definition of the standardized C language.

Both Unix and C were created at AT&T's Bell Laboratories in the late 1960s and early 1970s. During the 1970s the C language became increasingly popular. Many universities and organizations began creating their own variants of the language for their own projects. By the beginning of the 1980s compatibility problems between the various C implementations became apparent. In 1983 the American National Standards Institute (ANSI) formed a committee to establish a standard specification of C known as "ANSI C". This work culminated in the creation of the so-called C89 standard in 1989. Part of the resulting standard was a set of software libraries called the ANSI C standard library.

POSIX standard library

POSIX, as well as SUS, specify a number of routines that should be available over and above those in the basic C standard library. The POSIX specification includes header files for, among other uses, multi-threading, networking, and regular expressions. These are often implemented alongside the C standard library functionality, with varying degrees of closeness. For example, glibc implements functions such as fork within libc.so, but before NPTL was merged into glibc it constituted a separate library with its own linker flag argument. Often, this POSIX-specified functionality will be regarded as part of the library; the basic C library may be identified as the ANSI or ISO C library.

BSD libc

BSD libc is a superset of the POSIX standard library supported by the C libraries included with BSD operating systems such as FreeBSD, NetBSD, OpenBSD and macOS. BSD libc has some extensions that are not defined in the original standard, many of which first appeared in 1994's 4.4BSD release (the first to be largely developed after the first standard was issued in 1989). Some of the extensions of BSD libc are:

The C standard library in other languages

Some languages include the functionality of the standard C library in their own libraries. The library may be adapted to better suit the language's structure, but the operational semantics are kept similar. The C++ language, for example, includes the functionality of the C standard library in the namespace std (e.g., std::printf, std::atoi, std::feof), in header files with similar names to the C ones (cstdio, cmath, cstdlib, etc.). Other languages that take similar approaches are D, Perl, Ruby and the main implementation of Python known as CPython. In Python 2, for example, the built-in file objects are defined as "implemented using C's stdio package", [38] so that the available operations (open, read, write, etc.) are expected to have the same behavior as the corresponding C functions. Rust has a crate called libc which allows several C functions, structs, and other type definitions to be used. [39]

Comparison to standard libraries of other languages

The C standard library is small compared to the standard libraries of some other languages. The C library provides a basic set of mathematical functions, string manipulation, type conversions, and file and console-based I/O. It does not include a standard set of "container types" like the C++ Standard Template Library, let alone the complete graphical user interface (GUI) toolkits, networking tools, and profusion of other functionality that Java and the .NET Framework provide as standard. The main advantage of the small standard library is that providing a working ISO C environment is much easier than it is with other languages, and consequently porting C to a new platform is comparatively easy.

See also

Notes

  1. Morris worm that takes advantage of the well-known vulnerability in gets() have been created as early as in 1988.
  2. in C standard library, string length calculation and looking for a string's end have linear time complexities and are inefficient when used on the same or related strings repeatedly

Related Research Articles

The Portable Operating System Interface is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines both the system and user-level application programming interfaces (APIs), along with command line shells and utility interfaces, for software compatibility (portability) with variants of Unix and other operating systems. POSIX is also a trademark of the IEEE. POSIX is intended to be used by both application and system developers.

In computing, tar is a computer software utility for collecting many files into one archive file, often referred to as a tarball, for distribution or backup purposes. The name is derived from "tape archive", as it was originally developed to write data to sequential I/O devices with no file system of their own, such as devices that use magnetic tape. The archive data sets created by tar contain various file system parameters, such as name, timestamps, ownership, file-access permissions, and directory organization. POSIX abandoned tar in favor of pax, yet tar sees continued widespread use.

In software development, Make is a build automation tool that builds executable programs and libraries from source code by reading files called makefiles which specify how to derive the target program. Though integrated development environments and language-specific compiler features can also be used to manage a build process, Make remains widely used, especially in Unix and Unix-like operating systems.

The GNU C Library, commonly known as glibc, is the GNU Project's implementation of the C standard library. It is a wrapper around the system calls of the Linux kernel for application use. Despite its name, it now also directly supports C++. It was started in the 1980s by the Free Software Foundation (FSF) for the GNU operating system.

C dynamic memory allocation refers to performing manual memory management for dynamic memory allocation in the C programming language via a group of functions in the C standard library, namely malloc, realloc, calloc, aligned_alloc and free.

errno.h is a header file in the standard library of the C programming language. It defines macros for reporting and retrieving error conditions using the symbol errno.

The archiver, also known simply as ar, is a Unix utility that maintains groups of files as a single archive file. Today, ar is generally used only to create and update static library files that the link editor or linker uses and for generating .deb packages for the Debian family; it can be used to create archives for any purpose, but has been largely replaced by tar for purposes other than static libraries. An implementation of ar is included as one of the GNU Binutils.

The printf family of functions in the C programming language are a set of functions that take a format string as input among a variable sized list of other values and produce as output a string that corresponds to the format specifier and given input values. The string is written in a simple template language: characters are usually copied literally into the function's output, but format specifiers, which start with a % character, indicate the location and method to translate a piece of data to characters. The design has been copied to expose similar functionality in other programming languages.

In computing, gettext is an internationalization and localization system commonly used for writing multilingual programs on Unix-like computer operating systems. One of the main benefits of gettext is that it separates programming from translating. The most commonly used implementation of gettext is GNU gettext, released by the GNU Project in 1995. The runtime library is libintl. gettext provides an option to use different strings for any number of plural forms of nouns, but this feature has no support for grammatical gender. The main filename extensions used by this system are .POT, .PO and .MO.

Newlib is a C standard library implementation intended for use on embedded systems. It is a conglomeration of several library parts, all under free software licenses that make them easily usable on embedded products.

<span class="mw-page-title-main">Linux kernel interfaces</span> An overview and comparison of the Linux kernal APIs and ABIs.

The Linux kernel provides multiple interfaces to user-space and kernel-mode code that are used for varying purposes and that have varying properties by design. There are two types of application programming interface (API) in the Linux kernel:

  1. the "kernel–user space" API; and
  2. the "kernel internal" API.

C mathematical operations are a group of functions in the standard library of the C programming language implementing basic mathematical functions. All functions use floating-point numbers in one manner or another. Different C standards provide different, albeit backwards-compatible, sets of functions. Most of these functions are also available in the C++ standard library, though in different headers.

conio.h is a C header file used mostly by MS-DOS compilers to provide console input/output. It is not part of the C standard library or ISO C, nor is it defined by POSIX.

In the C and C++ programming languages, unistd.h is the name of the header file that provides access to the POSIX operating system API. It is defined by the POSIX.1 standard, the base of the Single Unix Specification, and should therefore be available in any POSIX-compliant operating system and compiler. For instance, this includes Unix and Unix-like operating systems, such as GNU variants, distributions of Linux and BSD, and macOS, and compilers such as GCC and LLVM.

C11 is an informal name for ISO/IEC 9899:2011, a past standard for the C programming language. It replaced C99 and has been superseded by C17. C11 mainly standardizes features already supported by common contemporary compilers, and includes a detailed memory model to better support multiple threads of execution. Due to delayed availability of conforming C99 implementations, C11 makes certain features optional, to make it easier to comply with the core language standard.

Getopt is a C library function used to parse command-line options of the Unix/POSIX style. It is a part of the POSIX specification, and is universal to Unix-like systems. It is also the name of a Unix program for parsing command line arguments in shell scripts.

Bionic is an implementation of the standard C library, developed by Google for its Android operating system. It differs from the GNU C Library (glibc) in being designed for devices with less memory and processor power than a typical Linux system. It is a combination of new code and code from FreeBSD, NetBSD, and OpenBSD released under a BSD license, rather than glibc, which uses the GNU Lesser General Public License. This difference was important in the early days of Android, when static linking was common, and since bionic has its own ABI, it can't be replaced by a different libc without breaking all existing apps.

The C programming language has a set of functions implementing operations on strings in its standard library. Various operations, such as copying, concatenation, tokenization and searching are supported. For character strings, the standard library uses the convention that strings are null-terminated: a string of n characters is represented as an array of n + 1 elements, the last of which is a "NUL character" with numeric value 0.

musl Implementation of C standard library for Linux operating system

musl is a C standard library intended for operating systems based on the Linux kernel, released under the MIT License. It was developed by Rich Felker to write a clean, efficient, and standards-conformant libc implementation.

crypt is a POSIX C library function. It is typically used to compute the hash of user account passwords. The function outputs a text string which also encodes the salt, and identifies the hash algorithm used. This output string forms a password record, which is usually stored in a text file.

References

  1. ISO/IEC (2018). ISO/IEC 9899:2018(E): Programming Languages - C §7
  2. "The GNU C Library – Introduction". gnu.org. Retrieved 2013-12-05.
  3. "Difference between C standard library and C POSIX library". stackoverflow.com. 2012. Retrieved 2015-03-04.
  4. "C Standards". C: C Standards. Keil. Retrieved 24 November 2011.
  5. "Re: Does Newlib support mmu-less CPUs?". Cygwin.com. 23 March 2006. Archived from the original on 22 November 2008. Retrieved 28 October 2011.
  6. "musl libc". Etalabs.net. Retrieved 28 October 2011.
  7. Other built-in functions provided by GCC, GCC Manual
  8. "Compiling with cc" . Retrieved 2013-03-02.
  9. Weimer, Florian. "c - What functions is the libm intended for?". Stack Overflow. Retrieved 24 February 2021.
  10. "c99 - compile standard C programs". The Open Group Base Specifications Issue 7, 2018 edition. The Open Group. Retrieved 24 February 2021.
  11. "musl FAQ". www.musl-libc.org. Retrieved 24 February 2021.
  12. "ISO/IEC TR 24731-1: Extensions to the C Library, Part I: Bounds-checking interfaces" (PDF). open-std.org. 2007-03-28. Retrieved 2014-03-13.
  13. "ISO/IEC WDTR 24731-2: Extensions to the C Library, Part II: Dynamic Allocation Functions" (PDF). open-std.org. 2008-08-10. Retrieved 2014-03-13.
  14. Do you use the TR 24731 'safe' functions in your C code? - Stack overflow
  15. "Austin Group Review of ISO/IEC WDTR 24731" . Retrieved 28 October 2011.
  16. "math_error - detecting errors from mathematical functions". man7.org. 2008-08-11. Retrieved 2014-03-13.
  17. "tree". Man.freebsd.org. 2007-12-27. Retrieved 2013-08-25.
  18. "Super User's BSD Cross Reference: /OpenBSD/sys/sys/tree.h". bxr.su.
  19. "queue". Man.freebsd.org. 2011-05-13. Retrieved 2013-08-25.
  20. "Super User's BSD Cross Reference: /OpenBSD/sys/sys/queue.h". bxr.su.
  21. "fgetln". Man.freebsd.org. 1994-04-19. Retrieved 2013-08-25.
  22. "Super User's BSD Cross Reference: /OpenBSD/lib/libc/stdio/fgetln.c". bxr.su.
  23. "Super User's BSD Cross Reference: /OpenBSD/include/stdio.h". bxr.su.
  24. "fts". Man.freebsd.org. 2012-03-18. Retrieved 2013-08-25.
  25. "Super User's BSD Cross Reference: /OpenBSD/include/fts.h". bxr.su.
  26. "db". Man.freebsd.org. 2010-09-10. Retrieved 2013-08-25.
  27. "Super User's BSD Cross Reference: /OpenBSD/include/db.h". bxr.su.
  28. Miller, Todd C. and Theo de Raadt. strlcpy and strlcat - consistent, safe, string copy and concatenation. Proceedings of the 1999 USENIX Annual Technical Conference, June 6–11, 1999, pp. 175–178.
  29. "Super User's BSD Cross Reference: /OpenBSD/lib/libc/string/strlcat.c". bxr.su.
  30. "Super User's BSD Cross Reference: /OpenBSD/lib/libc/string/strlcpy.c". bxr.su.
  31. "Super User's BSD Cross Reference: /OpenBSD/lib/libc/string/strncat.c". bxr.su.
  32. "Super User's BSD Cross Reference: /OpenBSD/lib/libc/string/strncpy.c". bxr.su.
  33. "err". Man.freebsd.org. 2012-03-29. Retrieved 2013-08-25.
  34. "Super User's BSD Cross Reference: /OpenBSD/include/err.h". bxr.su.
  35. "vis(3)". Man.FreeBSD.org. Retrieved 14 September 2013.
  36. "Super User's BSD Cross Reference: /OpenBSD/lib/libc/gen/vis.c". bxr.su.
  37. "Super User's BSD Cross Reference: /OpenBSD/include/vis.h". bxr.su.
  38. "The Python Standard Library: 6.9. File Objects". Docs.python.org. Retrieved 28 October 2011.
  39. "libc". Rust Crates. Archived from the original on 18 August 2016. Retrieved 31 July 2016.

Further reading