C character classification

Last updated

C character classification is a group of operations in the C standard library that test a character for membership in a particular class of characters; such as alphabetic, control, etc. Both single-byte, and wide characters are supported. [1]

Contents

History

Early C programmers working on the Unix operating system developed programming idioms for classifying characters. For example, the following code evaluates as true for an ASCII letter character c:

('A'<=c&&c<='Z')||('a'<=c&&c<='z')

Eventually, the interface to common character classification functionality was codified in the C standard library file ctype.h.

Implementation

For performance, the standard character classification functions are usually implemented as macros instead of functions. But, due to limitations of macro evaluation, they are generally not implemented today as they were in early versions of Linux like:

#define isdigit(c) ((c) >= '0' && (c) <= '9')

This can lead to an error when the macro parameter x is expanded to an expression with a side effect; for example: isdigit(x++). If the implementation was a function, then x would be incremented only once. But for this macro definition it is incremented twice.

To eliminate this problem, a common implementation is for the macro to use table lookup. For example, the standard library provides an array of 256 integers one for each character value that each contain a bit-field for each supported classification. A macro references an integer by character value index and accesses the associated bit-field. For example, if the low bit indicates whether the character is a digit, then the isdigit macro could be written as:

#define isdigit(c) (TABLE[c] & 1)

The macro argument, c, is referenced only once, so is evaluated only once.

Overview of functions

The functions that operate on single-byte characters are defined in ctype.h header file (cctype in C++). The functions that operate on wide characters are defined in wctype.h header file (cwctype in C++).

The classification is evaluated according to the effective locale.

Byte
character
Wide
character
Description
isalnum iswalnum checks whether the operand is alphanumeric
isalpha iswalpha checks whether the operand is alphabetic
islower iswlower checks whether the operand is lowercase
isupper iswupper checks whether the operand is an uppercase
isdigit iswdigit checks whether the operand is a digit
isxdigit iswxdigit checks whether the operand is hexadecimal
iscntrl iswcntrl checks whether the operand is a control character
isgraph iswgraph checks whether the operand is a graphical character
isspace iswspace checks whether the operand is space
isblank iswblank checks whether the operand is a blank space character
isprint iswprint checks whether the operand is a printable character
ispunct iswpunct checks whether the operand is punctuation
tolower towlower converts the operand to lowercase
toupper towupper converts the operand to uppercase
iswctype checks whether the operand falls into specific class
towctrans converts the operand using a specific mapping
wctype returns a wide character class to be used with iswctype
wctrans returns a transformation mapping to be used with towctrans

Related Research Articles

C is a general-purpose programming language. It was created in the 1970s by Dennis Ritchie and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems code, device drivers, and protocol stacks, but its use in application software has been decreasing. C is commonly used on computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems.

<span class="mw-page-title-main">Common Lisp</span> Programming language standard

Common Lisp (CL) is a dialect of the Lisp programming language, published in American National Standards Institute (ANSI) standard document ANSI INCITS 226-1994 (S2018). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived from the ANSI Common Lisp standard.

<span class="mw-page-title-main">Scheme (programming language)</span> Dialect of Lisp

Scheme is a dialect of the Lisp family of programming languages. Scheme was created during the 1970s at the MIT Computer Science and Artificial Intelligence Laboratory and released by its developers, Guy L. Steele and Gerald Jay Sussman, via a series of memos now known as the Lambda Papers. It was the first dialect of Lisp to choose lexical scope and the first to require implementations to perform tail-call optimization, giving stronger support for functional programming and associated techniques such as recursive algorithms. It was also one of the first programming languages to support first-class continuations. It had a significant influence on the effort that led to the development of Common Lisp.

The C preprocessor is the macro preprocessor for several computer programming languages, such as C, Objective-C, C++, and a variety of Fortran languages. The preprocessor provides inclusion of header files, macro expansions, conditional compilation, and line control.

The C programming language provides many standard library functions for file input and output. These functions make up the bulk of the C standard library header <stdio.h>. The functionality descends from a "portable I/O package" written by Mike Lesk at Bell Labs in the early 1970s, and officially became part of the Unix operating system in Version 7.

The C standard library, sometimes referred to as libc, is the standard library for the C programming language, as specified in the ISO C standard. Starting from the original ANSI C standard, it was developed at the same time as the C library POSIX specification, which is a superset of it. Since ANSI C was adopted by the International Organization for Standardization, the C standard library is also called the ISO C library.

In computer programming, a magic number is any of the following:

In the C++ programming language, the C++ Standard Library is a collection of classes and functions, which are written in the core language and part of the C++ ISO Standard itself.

errno.h is a header file in the standard library of the C programming language. It defines macros for reporting and retrieving error conditions using the symbol errno.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

The archiver, also known simply as ar, is a Unix utility that maintains groups of files as a single archive file. Today, ar is generally used only to create and update static library files that the link editor or linker uses and for generating .deb packages for the Debian family; it can be used to create archives for any purpose, but has been largely replaced by tar for purposes other than static libraries. An implementation of ar is included as one of the GNU Binutils.

This is a list of operators in the C and C++ programming languages. All the operators listed exist in C++; the column "Included in C", states whether an operator is also present in C. Note that C does not support operator overloading.

Mach-O, short for Mach object file format, is a file format for executables, object code, shared libraries, dynamically loaded code, and core dumps. It was developed to replace the a.out format.

The computer programming languages C and Pascal have similar times of origin, influences, and purposes. Both were used to design their own compilers early in their lifetimes. The original Pascal definition appeared in 1969 and a first compiler in 1970. The first version of C appeared in 1972.

In the C programming language, data types constitute the semantics and characteristics of storage of data elements. They are expressed in the language syntax in form of declarations for memory locations or variables. Data types also determine the types of operations or methods of processing of data elements.

C mathematical operations are a group of functions in the standard library of the C programming language implementing basic mathematical functions. All functions use floating-point numbers in one manner or another. Different C standards provide different, albeit backwards-compatible, sets of functions. Most of these functions are also available in the C++ standard library, though in different headers.

sizeof is a unary operator in the programming languages C and C++. It generates the storage size of an expression or a data type, measured in the number of char-sized units. Consequently, the construct sizeof (char) is guaranteed to be 1. The actual number of bits of type char is specified by the preprocessor macro CHAR_BIT, defined in the standard include file limits.h. On most modern computing platforms this is eight bits. The result of sizeof has an unsigned integer type that is usually denoted by size_t.

windows.h is a source code header file that Microsoft provides for the development of programs that access the Windows API (WinAPI) via C language syntax. It declares the WinAPI functions, associated data types and common macros.

An include directive instructs a text file processor to replace the directive text with the content of a specified file.

References

  1. ISO/IEC 9899:1999 specification (PDF). p. 193, § 7.4.