Data structure alignment

Last updated

Data structure alignment is the way data is arranged and accessed in computer memory. It consists of three separate but related issues: data alignment, data structure padding, and packing.

Contents

The CPU in modern computer hardware performs reads and writes to memory most efficiently when the data is naturally aligned, which generally means that the data's memory address is a multiple of the data size. For instance, in a 32-bit architecture, the data may be aligned if the data is stored in four consecutive bytes and the first byte lies on a 4-byte boundary.

Data alignment is the aligning of elements according to their natural alignment. To ensure natural alignment, it may be necessary to insert some padding between structure elements or after the last element of a structure. For example, on a 32-bit machine, a data structure containing a 16-bit value followed by a 32-bit value could have 16 bits of padding between the 16-bit value and the 32-bit value to align the 32-bit value on a 32-bit boundary. Alternatively, one can pack the structure, omitting the padding, which may lead to slower access, but uses three quarters as much memory.

Although data structure alignment is a fundamental issue for all modern computers, many computer languages and computer language implementations handle data alignment automatically. Fortran, Ada, [1] [2] PL/I, [3] Pascal, [4] certain C and C++ implementations, D, [5] Rust, [6] C#, [7] and assembly language allow at least partial control of data structure padding, which may be useful in certain special circumstances.

Definitions

A memory address a is said to be n-byte aligned when a is a multiple of n (where n is a power of 2). In this context, a byte is the smallest unit of memory access, i.e. each memory address specifies a different byte. An n-byte aligned address would have a minimum of log2(n) least-significant zeros when expressed in binary.

The alternate wording b-bit aligned designates a b/8 byte aligned address (ex. 64-bit aligned is 8 bytes aligned).

A memory access is said to be aligned when the data being accessed is n bytes long and the datum address is n-byte aligned. When a memory access is not aligned, it is said to be misaligned. Note that by definition byte memory accesses are always aligned.

A memory pointer that refers to primitive data that is n bytes long is said to be aligned if it is only allowed to contain addresses that are n-byte aligned, otherwise it is said to be unaligned. A memory pointer that refers to a data aggregate (a data structure or array) is aligned if (and only if) each primitive datum in the aggregate is aligned.

Note that the definitions above assume that each primitive datum is a power of two bytes long. When this is not the case (as with 80-bit floating-point on x86) the context influences the conditions where the datum is considered aligned or not.

Data structures can be stored in memory on the stack with a static size known as bounded or on the heap with a dynamic size known as unbounded.

Problems

The CPU accesses memory by a single memory word at a time. As long as the memory word size is at least as large as the largest primitive data type supported by the computer, aligned accesses will always access a single memory word. This may not be true for misaligned data accesses.

If the highest and lowest bytes in a datum are not within the same memory word, the computer must split the datum access into multiple memory accesses. This requires a lot of complex circuitry to generate the memory accesses and coordinate them. To handle the case where the memory words are in different memory pages the processor must either verify that both pages are present before executing the instruction or be able to handle a TLB miss or a page fault on any memory access during the instruction execution.

Some processor designs deliberately avoid introducing such complexity, and instead yield alternative behavior in the event of a misaligned memory access. For example, implementations of the ARM architecture prior to the ARMv6 ISA require mandatory aligned memory access for all multi-byte load and store instructions. [8] Depending on which specific instruction was issued, the result of attempted misaligned access might be to round down the least significant bits of the offending address turning it into an aligned access (sometimes with additional caveats), or to throw an MMU exception (if MMU hardware is present), or to silently yield other potentially unpredictable results. The ARMv6 and later architectures support unaligned access in many circumstances, but not necessarily all.

When a single memory word is accessed the operation is atomic, i.e. the whole memory word is read or written at once and other devices must wait until the read or write operation completes before they can access it. This may not be true for unaligned accesses to multiple memory words, e.g. the first word might be read by one device, both words written by another device and then the second word read by the first device so that the value read is neither the original value nor the updated value. Although such failures are rare, they can be very difficult to identify.

Data structure padding

Although the compiler (or interpreter) normally allocates individual data items on aligned boundaries, data structures often have members with different alignment requirements. To maintain proper alignment the translator normally inserts additional unnamed data members so that each member is properly aligned. In addition, the data structure as a whole may be padded with a final unnamed member. This allows each member of an array of structures to be properly aligned.

Padding is only inserted when a structure member is followed by a member with a larger alignment requirement or at the end of the structure. By changing the ordering of members in a structure, it is possible to change the amount of padding required to maintain alignment. For example, if members are sorted by descending alignment requirements a minimal amount of padding is required. The minimal amount of padding required is always less than the largest alignment in the structure. Computing the maximum amount of padding required is more complicated, but is always less than the sum of the alignment requirements for all members minus twice the sum of the alignment requirements for the least aligned half of the structure members.

Although C and C++ do not allow the compiler to reorder structure members to save space, other languages might. It is also possible to tell most C and C++ compilers to "pack" the members of a structure to a certain level of alignment, e.g. "pack(2)" means align data members larger than a byte to a two-byte boundary so that any padding members are at most one byte long. Likewise, in PL/I a structure may be declared UNALIGNED to eliminate all padding except around bit strings.

One use for such "packed" structures is to conserve memory. For example, a structure containing a single byte (such as a char) and a four-byte integer (such as uint32_t) would require three additional bytes of padding. A large array of such structures would use 37.5% less memory if they are packed, although accessing each structure might take longer. This compromise may be considered a form of space–time tradeoff.

Although use of "packed" structures is most frequently used to conserve memory space, it may also be used to format a data structure for transmission using a standard protocol. However, in this usage, care must also be taken to ensure that the values of the struct members are stored with the endianness required by the protocol (often network byte order), which may be different from the endianness used natively by the host machine.

Computing padding

The following formulas provide the number of padding bytes required to align the start of a data structure (where mod is the modulo operator):

padding = (align - (offset mod align)) mod align aligned = offset + padding         = offset + ((align - (offset mod align)) mod align)

For example, the padding to add to offset 0x59d for a 4-byte aligned structure is 3. The structure will then start at 0x5a0, which is a multiple of 4. However, when the alignment of offset is already equal to that of align, the second modulo in (align - (offset mod align)) mod align will return zero, therefore the original value is left unchanged.

Since the alignment is by definition a power of two, [a] the modulo operation can be reduced to a bitwise AND operation.

The following formulas produce the correct values (where & is a bitwise AND and ~ is a bitwise NOT) – providing the offset is unsigned or the system uses two's complement arithmetic:

padding = (align - (offset & (align - 1))) & (align - 1)         = -offset & (align - 1) aligned = (offset + (align - 1)) & ~(align - 1)         = (offset + (align - 1)) & -align

Typical alignment of C structs on x86

Data structure members are stored sequentially in memory so that, in the structure below, the member Data1 will always precede Data2; and Data2 will always precede Data3:

structMyData{shortData1;shortData2;shortData3;};

If the type "short" is stored in two bytes of memory then each member of the data structure depicted above would be 2-byte aligned. Data1 would be at offset 0, Data2 at offset 2, and Data3 at offset 4. The size of this structure would be 6 bytes.

The type of each member of the structure usually has a default alignment, meaning that it will, unless otherwise requested by the programmer, be aligned on a pre-determined boundary. The following typical alignments are valid for compilers from Microsoft (Visual C++), Borland/CodeGear (C++Builder), Digital Mars (DMC), and GNU (GCC) when compiling for 32-bit x86:

The only notable differences in alignment for an LP64 64-bit system when compared to a 32-bit system are:

Some data types are dependent on the implementation.

Here is a structure with members of various types, totaling 8 bytes before compilation:

structMixedData{charData1;shortData2;intData3;charData4;};

After compilation the data structure will be supplemented with padding bytes to ensure a proper alignment for each of its members:

structMixedData/* After compilation in 32-bit x86 machine */{charData1;/* 1 byte */charPadding1[1];/* 1 byte for the following 'short' to be aligned on a 2-byte boundaryassuming that the address where structure begins is an even number */shortData2;/* 2 bytes */intData3;/* 4 bytes - largest structure member */charData4;/* 1 byte */charPadding2[3];/* 3 bytes to make total size of the structure 12 bytes */};

The compiled size of the structure is now 12 bytes.

The last member is padded with the number of bytes required so that the total size of the structure should be a multiple of the largest alignment of any structure member (alignof(int) in this case, which = 4 on linux-32bit/gcc)[ citation needed ].

In this case 3 bytes are added to the last member to pad the structure to the size of 12 bytes (alignof(int) * 3).

structFinalPad{floatx;charn[1];};

In this example the total size of the structure sizeof(FinalPad) == 8, not 5 (so that the size is a multiple of 4 (alignof(float))).

structFinalPadShort{shorts;charn[3];};

In this example the total size of the structure sizeof(FinalPadShort) == 6, not 5 (not 8 either) (so that the size is a multiple of 2 (alignof(short) == 2 on linux-32bit/gcc)).

It is possible to change the alignment of structures to reduce the memory they require (or to conform to an existing format) by reordering structure members or changing the compiler's alignment (or “packing”) of structure members.

structMixedData/* after reordering */{charData1;charData4;/* reordered */shortData2;intData3;};

The compiled size of the structure now matches the pre-compiled size of 8 bytes. Note that Padding1[1] has been replaced (and thus eliminated) by Data4 and Padding2[3] is no longer necessary as the structure is already aligned to the size of a long word.

The alternative method of enforcing the MixedData structure to be aligned to a one byte boundary will cause the pre-processor to discard the pre-determined alignment of the structure members and thus no padding bytes would be inserted.

While there is no standard way of defining the alignment of structure members (while C and C++ allow using the alignas specifier for this purpose it can be used only for specifying a stricter alignment), some compilers use #pragma directives to specify packing inside source files. Here is an example:

#pragma pack(push)  /* push current alignment to stack */#pragma pack(1)     /* set alignment to 1-byte boundary */structMyPackedData{charData1;longData2;charData3;};#pragma pack(pop)   /* restore original alignment from stack */

This structure would have a compiled size of 6 bytes on a 32-bit system. The above directives are available in compilers from Microsoft, [9] Borland, GNU, [10] and many others.

Another example:

structMyPackedData{charData1;longData2;charData3;}__attribute__((packed));

Default packing and #pragma pack

On some Microsoft compilers, particularly for RISC processors, there is an unexpected relationship between project default packing (the /Zp directive) and the #pragma pack directive. The #pragma pack directive can only be used to reduce the packing size of a structure from the project default packing. [11] This leads to interoperability problems with library headers which use, for example, #pragma pack(8), if the project packing is smaller than this. For this reason, setting the project packing to any value other than the default of 8 bytes would break the #pragma pack directives used in library headers and result in binary incompatibilities between structures. This limitation is not present when compiling for x86.

Allocating memory aligned to cache lines

It would be beneficial to allocate memory aligned to cache lines. If an array is partitioned for more than one thread to operate on, having the sub-array boundaries unaligned to cache lines could lead to performance degradation. Here is an example to allocate memory (double array of size 10) aligned to cache of 64 bytes.

#include<stdlib.h>double*foo(void){//create array of size 10double*array;if(0==posix_memalign((void**)&array,64,10*sizeof(double)))returnarray;returnNULL;}

Hardware significance of alignment requirements

Alignment concerns can affect areas much larger than a C structure when the purpose is the efficient mapping of that area through a hardware address translation mechanism (PCI remapping, operation of a MMU).

For instance, on a 32-bit operating system, a 4  KiB (4096 bytes) page is not just an arbitrary 4 KiB chunk of data. Instead, it is usually a region of memory that's aligned on a 4 KiB boundary. This is because aligning a page on a page-sized boundary lets the hardware map a virtual address to a physical address by substituting the higher bits in the address, rather than doing complex arithmetic.

Example: Assume that we have a TLB mapping of virtual address 0x2CFC7000 to physical address 0x12345000. (Note that both these addresses are aligned at 4 KiB boundaries.) Accessing data located at virtual address va=0x2CFC7ABC causes a TLB resolution of 0x2CFC7 to 0x12345 to issue a physical access to pa=0x12345ABC. Here, the 20/12-bit split luckily matches the hexadecimal representation split at 5/3 digits. The hardware can implement this translation by simply combining the first 20 bits of the physical address (0x12345) and the last 12 bits of the virtual address (0xABC). This is also referred to as virtually indexed (ABC) physically tagged (12345).

A block of data of size 2(n+1) − 1 always has one sub-block of size 2n aligned on 2n bytes.

This is how a dynamic allocator that has no knowledge of alignment, can be used to provide aligned buffers, at the price of a factor two in space loss.

// Example: get 4096 bytes aligned on a 4096-byte buffer with malloc()// unaligned pointer to large areavoid*up=malloc((1<<13)-1);// well-aligned pointer to 4 KiBvoid*ap=aligntonext(up,12);

where aligntonext(p, r) works by adding an aligned increment, then clearing the r least significant bits of p. A possible implementation is

// Assume `uint32_t p, bits;` for readability#define alignto(p, bits)      (((p) >> bits) << bits)#define aligntonext(p, bits)  alignto(((p) + (1 << bits) - 1), bits)

Notes

  1. On modern computers where the target alignment is a power of two. This might not be true, for example, on a system using 9-bit bytes or 60-bit words.

Related Research Articles

C is a general-purpose programming language. It was created in the 1970s by Dennis Ritchie and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems code, device drivers, and protocol stacks, but its use in application software has been decreasing. C is commonly used on computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems.

The BMP file format, or bitmap, is a raster graphics image file format used to store bitmap digital images, independently of the display device, especially on Microsoft Windows and OS/2 operating systems.

In computing, a bus error is a fault raised by hardware, notifying an operating system (OS) that a process is trying to access memory that the CPU cannot physically address: an invalid address for the address bus, hence the name. In modern use on most architectures these are much rarer than segmentation faults, which occur primarily due to memory access violations: problems in the logical address or permissions.

In computer programming, the stride of an array is the number of locations in memory between beginnings of successive array elements, measured in bytes or in units of the size of the array's elements. The stride cannot be smaller than the element size but can be larger, indicating extra space between elements.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

In computer science, a union is a value that may have any of multiple representations or formats within the same area of memory; that consists of a variable that may hold such a data structure. Some programming languages support a union type for such a data type. In other words, a union type specifies the permitted types that may be stored in its instances, e.g., float and integer. In contrast with a record, which could be defined to contain both a float and an integer; a union would hold only one at a time.

Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how the machine language instructions in that architecture identify the operand(s) of each instruction. An addressing mode specifies how to calculate the effective memory address of an operand by using information held in registers and/or constants contained within a machine instruction or elsewhere.

In the C programming language, struct is the keyword used to define a composite, a.k.a. record, data type – a named set of values that occupy a block of memory. It allows for the different values to be accessed via a single identifier, often a pointer. A struct can contain other data types so is used for mixed-data-type records. For example a bank customer struct might contains fields: name, address, telephone, balance.

A bit array is an array data structure that compactly stores bits. It can be used to implement a simple set data structure. A bit array is effective at exploiting bit-level parallelism in hardware to perform operations quickly. A typical bit array stores kw bits, where w is the number of bits in the unit of storage, such as a byte or word, and k is some nonnegative integer. If w does not divide the number of bits to be stored, some space is wasted due to internal fragmentation.

In the C programming language, data types constitute the semantics and characteristics of storage of data elements. They are expressed in the language syntax in form of declarations for memory locations or variables. Data types also determine the types of operations or methods of processing of data elements.

A bit field is a data structure that maps to one or more adjacent bits which have been allocated for specific purposes, so that any single bit or group of bits within the structure can be set or inspected. A bit field is most commonly used to represent integral types of known, fixed bit-width, such as single-bit Booleans.

A class in C++ is a user-defined type or data structure declared with any of the keywords class, struct or union that has data and functions as its members whose access is governed by the three access specifiers private, protected or public. By default access to members of a C++ class declared with the keyword class is private. The private members are not accessible outside the class; they can be accessed only through member functions of the class. The public members form an interface to the class and are accessible outside the class.

sizeof is a unary operator in the programming languages C and C++. It generates the storage size of an expression or a data type, measured in the number of char-sized units. Consequently, the construct sizeof (char) is guaranteed to be 1. The actual number of bits of type char is specified by the preprocessor macro CHAR_BIT, defined in the standard include file limits.h. On most modern computing platforms this is eight bits. The result of sizeof has an unsigned integer type that is usually denoted by size_t.

In computer science, a type punning is any programming technique that subverts or circumvents the type system of a programming language in order to achieve an effect that would be difficult or impossible to achieve within the bounds of the formal language.

C's offsetof macro is an ANSI C library feature found in stddef.h. It evaluates to the offset of a given member within a struct or union type, an expression of type size_t. The offsetof macro takes two parameters, the first being a structure or union name, and the second being the name of a subobject of the structure/union that is not a bit field. It cannot be described as a C prototype.

<span class="mw-page-title-main">Word addressing</span> Computer memory accessed in word units

In computer architecture, word addressing means that addresses of memory on a computer uniquely identify words of memory. It is usually used in contrast with byte addressing, where addresses uniquely identify bytes. Almost all modern computer architectures use byte addressing, and word addressing is largely only of historical interest. A computer that uses word addressing is sometimes called a word machine.

In computing, a data descriptor is a structure containing information that describes data.

A code sanitizer is a programming tool that detects bugs in the form of undefined or suspicious behavior by a compiler inserting instrumentation code at runtime. The class of tools was first introduced by Google's AddressSanitizer of 2012, which uses directly mapped shadow memory to detect memory corruption such as buffer overflows or accesses to a dangling pointer (use-after-free).

C struct data types may end with a flexible array member with no specified size:

References

  1. "Ada Representation Clauses and Pragmas". GNAT Reference Manual 7.4.0w documentation. Retrieved 2015-08-30.
  2. "F.8 Representation Clauses". SPARCompiler Ada Programmer's Guide (PDF). Retrieved 2015-08-30.
  3. IBM System/360 Operating System PL/I Language Specifications (PDF). IBM. July 1966. pp. 55–56. C28-6571-3.
  4. Niklaus Wirth (July 1973). "The Programming Language Pascal (Revised Report)" (PDF). p. 12.
  5. "Attributes – D Programming Language: Align Attribute" . Retrieved 2012-04-13.
  6. "The Rustonomicon – Alternative Representations" . Retrieved 2016-06-19.
  7. "LayoutKind Enum (System.Runtime.InteropServices)". docs.microsoft.com. Retrieved 2019-04-01.
  8. Kurusa, Levente (2016-12-27). "The curious case of unaligned access on ARM". Medium. Retrieved 2019-08-07.
  9. pack
  10. 6.58.8 Structure-Packing Pragmas
  11. "Working with Packing Structures". MSDN Library . Microsoft. 2007-07-09. Retrieved 2011-01-11.

Further reading