Offsetof

Last updated

C's offsetof() macro is an ANSI C library feature found in stddef.h . It evaluates to the offset (in bytes) of a given member within a struct or union type, an expression of type size_t . The offsetof() macro takes two parameters, the first being a structure or union name, and the second being the name of a subobject of the structure/union that is not a bit field. It cannot be described as a C prototype. [1]

Contents

Implementation

The "traditional" implementation of the macro relied on the compiler obtaining the offset of a member by specifying a hypothetical structure that begins at address zero:

#define offsetof(st, m) \    ((size_t)&(((st *)0)->m))

This can be understood as taking a null pointer of type structure st, and then obtaining the address of member m within said structure. While this implementation works correctly in many compilers, it has generated some debate regarding whether this is undefined behavior according to the C standard, [2] since it appears to involve a dereference of a null pointer (although, according to the standard, section 6.6 Constant Expressions, Paragraph 9, the value of the object is not accessed by the operation). It also tends to produce confusing compiler diagnostics if one of the arguments is misspelled. [ citation needed ]

An alternative is:

#define offsetof(st, m) \    ((size_t)((char *)&((st *)0)->m - (char *)0))

It may be specified this way because the standard does not specify that the internal representation of the null pointer is at address zero. Therefore the difference between the member address and the base address needs to be made. Again, since these are constant expressions it can be calculated at compile time and not necessarily at run-time.

Some modern compilers (such as GCC) define the macro using a special form (as a language extension) instead, e.g. [3]

#define offsetof(st, m) \    __builtin_offsetof(st, m)

This builtin is especially useful with C++ classes that declare a custom unary operator &. [4]

Usage

It is useful when implementing generic data structures in C. For example, the Linux kernel uses offsetof() to implement container_of(), which allows something like a mixin type to find the structure that contains it: [5]

#define container_of(ptr, type, member) ({ \                const typeof( ((type *)0)->member ) *__mptr = (ptr); \                (type *)( (char *)__mptr - offsetof(type,member) );})

This macro is used to retrieve an enclosing structure from a pointer to a nested element, such as this iteration of a linked list of my_struct objects:

structmy_struct{constchar*name;structlist_nodelist;};externstructlist_node*list_next(structlist_node*);structlist_node*current=/* ... */while(current!=NULL){structmy_struct*element=container_of(current,structmy_struct,list);printf("%s\n",element->name);current=list_next(&element->list);}

The linux kernel implementation of container_of uses a GNU C extension called statement expressions. [6] It's possible a statement expression was used to ensure type safety and therefore eliminate potential accidental bugs. There is, however, a way to implement the same behaviour without using statement expressions while still ensuring type safety:

#define container_of(ptr, type, member) ((type *)((char *)(1 ? (ptr) : &((type *)0)->member) - offsetof(type, member)))

At first glance, this implementation may seem more complex than necessary, and the unusual use of the conditional operator may seem out of place. A simpler implementation is possible:

#define container_of(ptr, type, member) ((type *)((char *)(ptr) - offsetof(type, member)))

This implementation would also serve the same purpose, however, there's a fundamental omission in terms of the original linux kernel implementation. The type of ptr is never checked against the type of the member, this is something that the linux kernel implementation would catch.

In the aforementioned type-checked implementation, the check is performed by the unusual use of the conditional operator. The constraints of the conditional operator specify that if the operands to the conditional operator are both pointers to a type, they must both be pointers to compatible types. In this case, despite the fact that the value of the third operand of the conditional expression will never be used, the compiler must perform a check to ensure that (ptr) and &((type *)0)->member are both compatible pointer types.

Limitations

Usage of offsetof is limited to POD types in C++98, standard-layout classes in C++11, [7] and more cases are conditionally-supported in C++17, [8] otherwise it has an undefined behavior. While most compilers will generate a correct result even in cases that don't respect the standard, there are edge cases when offsetof will either yield an incorrect value, generate a compile-time warning or error, or outright crash the program. This is especially the case for virtual inheritance. [9] The following program will generate several warnings and print obviously suspicious results when compiled with gcc 4.7.3 on an amd64 architecture:

#include<stddef.h>#include<stdio.h>structA{inta;virtualvoiddummy(){}};structB:publicvirtualA{intb;};intmain(){printf("offsetof(A, a) : %zu\n",offsetof(A,a));printf("offsetof(B, b) : %zu\n",offsetof(B,b));return0;}

Output is:

offsetof(A, a) : 8 offsetof(B, b) : 8 

Related Research Articles

C is a general-purpose computer programming language. It was created in the 1970s by Dennis Ritchie, and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems, device drivers, and protocol stacks, but its use in application software has been decreasing. C is commonly used on computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems.

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.

The C preprocessor is the macro preprocessor for several computer programming languages, such as C, Objective-C, C++, and a variety of Fortran languages. The preprocessor provides inclusion of header files, macro expansions, conditional compilation, and line control.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

stat (system call) Unix system call

stat is a Unix system call that returns file attributes about an inode. The semantics of stat vary between operating systems. As an example, Unix command ls uses this system call to retrieve information on files that includes:

Sparse is a computer software tool designed to find possible coding faults in the Linux kernel. Unlike other such tools, this static analysis tool was initially designed to only flag constructs that were likely to be of interest to kernel developers, such as the mixing of pointers to user and kernel address spaces.

<span class="mw-page-title-main">Dangling pointer</span> Pointer that does not point to a valid object

Dangling pointers and wild pointers in computer programming are pointers that do not point to a valid object of the appropriate type. These are special cases of memory safety violations. More generally, dangling references and wild references are references that do not resolve to a valid destination.

In some programming languages, const is a type qualifier, which indicates that the data is read-only. While this can be used to declare constants, const in the C family of languages differs from similar constructs in other languages in that it is part of the type, and thus has complicated behavior when combined with pointers, references, composite data types, and type-checking. In other languages, the data is not in a single memory location, but copied at compile time for each use. Languages which use it include C, C++, D, JavaScript, Julia, and Rust.

The computer programming languages C and Pascal have similar times of origin, influences, and purposes. Both were used to design their own compilers early in their lifetimes. The original Pascal definition appeared in 1969 and a first compiler in 1970. The first version of C appeared in 1972.

<span class="mw-page-title-main">C data types</span> Data types supported by the C programming language

In the C programming language, data types constitute the semantics and characteristics of storage of data elements. They are expressed in the language syntax in form of declarations for memory locations or variables. Data types also determine the types of operations or methods of processing of data elements.

typedef is a reserved keyword in the programming languages C, C++, and Objective-C. It is used to create an additional name (alias) for another data type, but does not create a new type, except in the obscure case of a qualified typedef of an array type where the typedef qualifiers are transferred to the array element type. As such, it is often used to simplify the syntax of declaring complex data structures consisting of struct and union types, although it is also commonly used to provide specific descriptive type names for integer data types of varying sizes.

In computer programming, an opaque pointer is a special case of an opaque data type, a data type declared to be a pointer to a record or data structure of some unspecified type.

In computer programming, the term hooking covers a range of techniques used to alter or augment the behaviour of an operating system, of applications, or of other software components by intercepting function calls or messages or events passed between software components. Code that handles such intercepted function calls, events or messages is called a hook.

sizeof is a unary operator in the programming languages C and C++. It generates the storage size of an expression or a data type, measured in the number of char-sized units. Consequently, the construct sizeof (char) is guaranteed to be 1. The actual number of bits of type char is specified by the preprocessor macro CHAR_BIT, defined in the standard include file limits.h. On most modern computing platforms this is eight bits. The result of sizeof has an unsigned integer type that is usually denoted by size_t.

The C and C++ programming languages are closely related but have many significant differences. C++ began as a fork of an early, pre-standardized C, and was designed to be mostly source-and-link compatible with C compilers of the time. Due to this, development tools for the two languages are often integrated into a single product, with the programmer able to specify C or C++ as their source language.

C++11 is a version of the ISO/IEC 14882 standard for the C++ programming language. C++11 replaced the prior version of the C++ standard, called C++03, and was later replaced by C++14. The name follows the tradition of naming language versions by the publication year of the specification, though it was formerly named C++0x because it was expected to be published before 2010.

In computer programming, variadic templates are templates that take a variable number of arguments.

A code sanitizer is a programming tool that detects bugs in the form of undefined or suspicious behavior by a compiler inserting instrumentation code at runtime. The class of tools was first introduced by Google's AddressSanitizer of 2012, which uses directly mapped shadow memory to detect memory corruption such as buffer overflows or accesses to a dangling pointer (use-after-free).

<span class="mw-page-title-main">Zig (programming language)</span> A general-purpose programming language, toolchain to build Zig/C/C++ code

Zig is an imperative, general-purpose, statically typed, compiled system programming language designed by Andrew Kelley. It is intended to be a successor to the C programming language, with the goals of being even smaller and simpler to program in while also offering modern features, new optimizations and a variety of safety mechanisms while not as demanding of runtime safety as seen in other languages. It is distinct from languages like Go, Rust and Carbon, which have similar goals but also target the C++ space.

References

  1. "offsetof reference". MSDN . Retrieved 2010-09-19.
  2. "Does &((struct name *)NULL -> b) cause undefined behaviour in C11?" . Retrieved 2015-02-07.
  3. "GCC offsetof reference". Free Software Foundation . Retrieved 2010-09-19.
  4. "what is the purpose and return type of the __builtin_offsetof operator?" . Retrieved 2012-10-20.
  5. Greg Kroah-Hartman (June 2003). "container_of()". Linux Journal . Retrieved 2010-09-19.
  6. "Statements and Declarations in Expressions". Free Software Foundation . Retrieved 2016-01-01.
  7. "offsetof reference". cplusplus.com. Retrieved 2016-04-01.
  8. "offsetof reference". cppreference.com. Retrieved 2020-07-20.
  9. Steve Jessop (July 2009). "Why can't you use offsetof on non-POD structures in C++?". Stack Overflow . Retrieved 2016-04-01.