Restrict


In the C programming language, restrict is a keyword, introduced by the C99 standard, [1] that can be used in pointer declarations. By adding this type qualifier, a programmer hints to the compiler that for the lifetime of the pointer, no other pointer will be used to access the object to which it points. This allows the compiler to make optimizations (for example, vectorization) that would not otherwise have been possible.


restrict limits the effects of pointer aliasing, which aids optimization. If the declaration of intent is not followed and the object is accessed through an independent pointer, the behavior is undefined.
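The contract concerns the objects that are actually accessed, not whole arrays. The sketch below is similar to an example in the C standard. It uses a hypothetical helper `copy_n` with two restrict-qualified pointers. Copying one half of an array into the other half is valid, because no element is accessed through both pointers. A call with `d + 1` and `d`, shown only in a comment, would access overlapping elements and be undefined. The qualifier follows the `*`: `int *restrict p` qualifies the pointer itself.

```c
/* Hypothetical helper: copies n ints from src to dst.
 * The restrict qualifiers promise that, during each call, no int
 * accessed through dst is also accessed through src. */
void copy_n(int n, int *restrict dst, const int *restrict src)
{
    while (n-- > 0)
        *dst++ = *src++;
}

/* Hypothetical demo: fills d with 0..99, then copies d[0..49] into d[50..99].
 * This is valid: dst covers d[50..99] and src covers d[0..49], so no element
 * is accessed through both restrict pointers. */
void copy_first_half_to_second(int d[100])
{
    for (int i = 0; i < 100; i++)
        d[i] = i;
    copy_n(50, d + 50, d);
    /* By contrast, copy_n(50, d + 1, d) would access d[1]..d[49] through
     * both pointers, which is undefined behavior. */
}
```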

Optimization

If the compiler knows that there is only one pointer to a memory block, it can produce better optimized code. For instance:

    void updatePtrs(size_t *ptrA, size_t *ptrB, size_t *val)
    {
        *ptrA += *val;
        *ptrB += *val;
    }

In the above code, the pointers ptrA, ptrB, and val might refer to the same memory location, so the compiler may generate less optimal code:

    ; Hypothetical RISC Machine.
    ldr r12, [val]     ; Load memory at val to r12.
    ldr r3, [ptrA]     ; Load memory at ptrA to r3.
    add r3, r3, r12    ; Perform addition: r3 = r3 + r12.
    str r3, [ptrA]     ; Store r3 to memory location ptrA, updating the value.
    ldr r3, [ptrB]     ; 'load' may have to wait until preceding 'store' completes.
    ldr r12, [val]     ; Have to load a second time to ensure consistency.
    add r3, r3, r12
    str r3, [ptrB]
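To see why the second load of val is necessary, consider a call in which all three pointers refer to the same object. The sketch below repeats the function from above and adds a hypothetical wrapper, `alias_demo`. The value it returns follows directly from the source-level semantics: the second addition must see the value written by the first.

```c
#include <stddef.h>

/* Same function as above: no restrict, so the pointers may alias. */
void updatePtrs(size_t *ptrA, size_t *ptrB, size_t *val)
{
    *ptrA += *val;
    *ptrB += *val;
}

/* Hypothetical wrapper: pass the same object as all three arguments. */
size_t alias_demo(void)
{
    size_t x = 1;
    updatePtrs(&x, &x, &x);  /* first x = 1 + 1 = 2, then x = 2 + 2 = 4 */
    return x;                /* 4, because *val was re-read after the first store */
}
```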

However, if the restrict keyword is used and the above function is declared as

    void updatePtrs(size_t *restrict ptrA, size_t *restrict ptrB, size_t *restrict val);

then the compiler is allowed to assume that ptrA, ptrB, and val point to different locations, and that updating the memory location referenced by one pointer will not affect the memory locations referenced by the other pointers. The programmer, not the compiler, is responsible for ensuring that the pointers do not point to overlapping locations. For example, the compiler can rearrange the code so that it first loads all memory locations, then performs the operations, and finally commits the results back to memory.

    ldr r12, [val]     ; Note that val is now only loaded once.
    ldr r3, [ptrA]     ; Also, all 'load's in the beginning ...
    ldr r4, [ptrB]
    add r3, r3, r12
    add r4, r4, r12
    str r3, [ptrA]     ; ... all 'store's in the end.
    str r4, [ptrB]

The above assembly code is shorter because val is loaded only once. Also, because the compiler can rearrange the code more freely, it can generate code that executes faster. In the second version of the example, all store operations take place after all load operations, so the processor does not have to stall in the middle of the code waiting for a store to complete.
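At the C level, the schedule in the second listing corresponds roughly to the hypothetical function `updatePtrsReordered` below. This function loads every value into a local before it stores anything. It is a sketch of the idea, not the code any particular compiler emits. `updatePtrsReordered` has no restrict qualifiers, so calling it with aliased pointers is well defined, and that call shows what the reordering would change.

```c
#include <stddef.h>

/* restrict version: the caller promises that the three pointers do not
 * refer to overlapping objects. */
void updatePtrs(size_t *restrict ptrA, size_t *restrict ptrB,
                size_t *restrict val)
{
    *ptrA += *val;
    *ptrB += *val;
}

/* Hypothetical C-level analogue of the second listing: all loads first,
 * then the additions, then all stores. No restrict here, so calling it
 * with aliased pointers is well defined. */
void updatePtrsReordered(size_t *ptrA, size_t *ptrB, size_t *val)
{
    size_t v = *val;   /* val is loaded only once */
    size_t a = *ptrA;
    size_t b = *ptrB;
    *ptrA = a + v;     /* stores happen at the end */
    *ptrB = b + v;
}

/* Hypothetical wrapper: every argument points to the same object x. */
size_t reordered_alias_demo(void)
{
    size_t x = 1;
    updatePtrsReordered(&x, &x, &x);  /* v = a = b = 1, so x becomes 2 */
    return x;
}
```

With distinct objects, for example a = 1, b = 2, and v = 10, both functions leave a equal to 11 and b equal to 12. When all three arguments point to one object, the reordered schedule yields 2, but the plain, non-restrict function computes 4. A compiler that applies this schedule to the restrict-qualified function assumes that the caller never makes such an aliased call. If the caller makes one anyway, the behavior is undefined, so no particular result is guaranteed.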

Note that real compiler output may differ. The benefit in this small example tends to be minor. In practice, restrict helps most with large loops that perform heavy memory access.
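The following is a minimal sketch of that kind of loop, using a hypothetical function `add_arrays`. Whether a given compiler actually vectorizes it depends on the compiler and on the optimization flags used.

```c
#include <stddef.h>

/* Hypothetical example: out[i] = a[i] + b[i].
 * Because out is restrict-qualified, the compiler may assume that storing
 * to out[i] cannot change any a[j] or b[j]. That assumption makes the loop
 * a candidate for vectorization. a and b are only read, so they may
 * overlap each other without violating the contract. */
void add_arrays(size_t n, float *restrict out,
                const float *restrict a, const float *restrict b)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```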

As mentioned above, the behavior of incorrect code is undefined. The compiler only ensures that the generated code works properly if the code follows the declaration of intent.

Support by C++ compilers

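C++ does not have standard support for restrict, but many compilers have equivalents that usually work in both C++ and C, such as GCC's and Clang's __restrict__ and Visual C++'s __declspec(restrict). In addition, __restrict is supported by all three compilers. The exact interpretation of these alternative keywords varies by compiler. In GCC and Clang, __restrict and __restrict__ have the same meaning as restrict in C. In C++, GCC also allows them on references and, as a member function qualifier, on the this pointer. [2] In Visual C++, __declspec(restrict) applies to a function declaration and indicates that the returned pointer is not aliased. __restrict is used in the same places as restrict, but the no-alias property does not propagate as it does with restrict.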
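A common way to write headers that compile as both C and C++ is to hide the keyword behind a macro. The sketch below uses a hypothetical macro name, `RESTRICT`. It selects the standard keyword in C99 and later, falls back to `__restrict` on GCC, Clang, and Visual C++ in other cases (including C++), and otherwise drops the hint. Because the interpretation of `__restrict` varies by compiler, the macro gives the same syntax but not necessarily identical semantics everywhere.

```c
#include <stddef.h>

/* Hypothetical portability macro; not part of any standard header. */
#if !defined(__cplusplus) && defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define RESTRICT restrict      /* C99 or later: standard keyword */
#elif defined(__GNUC__) || defined(__clang__) || defined(_MSC_VER)
#  define RESTRICT __restrict    /* compiler extension, also accepted in C++ */
#else
#  define RESTRICT               /* unknown compiler: omit the hint */
#endif

/* Hypothetical function: writes k * src[i] to dst[i]; dst and src must not overlap. */
void scale(size_t n, float *RESTRICT dst, const float *RESTRICT src, float k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];
}
```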

Compiler warnings

To help prevent incorrect code, some compilers and other tools try to detect when overlapping arguments have been passed to functions with parameters marked restrict. [3] The CERT C Coding Standard considers misuse of restrict and library functions marked with it (EXP43-C) a probable source of software bugs, although as of November 2019 no vulnerabilities are known to have been caused by this. [4]
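Since C99, standard library functions such as `memcpy` are declared with restrict-qualified parameters. Passing overlapping source and destination buffers to them is therefore the kind of misuse that EXP43-C describes. The minimal sketch below shows the usual remedy, using a hypothetical helper `shift_left`. For overlapping regions it calls `memmove`, whose parameters are not restrict-qualified and which is specified to copy correctly even when the regions overlap.

```c
#include <string.h>

/* Hypothetical helper: removes the first k characters of the
 * NUL-terminated string s in place (k must not exceed strlen(s)).
 * Source and destination overlap, so memcpy (whose parameters are
 * restrict-qualified) would be undefined here; memmove is not. */
void shift_left(char *s, size_t k)
{
    size_t len = strlen(s);
    memmove(s, s + k, len - k + 1);  /* the +1 also moves the terminating NUL */
    /* memcpy(s, s + k, len - k + 1);  overlapping regions: undefined behavior */
}
```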

References

  1. Drepper, Ulrich (October 23, 2007). "Memory part 5: What programmers can do". What every programmer should know about memory. lwn.net. ...The default aliasing rules of the C and C++ languages do not help the compiler making these decisions (unless restrict is used, all pointer accesses are potential sources of aliasing). This is why Fortran is still a preferred language for numeric programming: it makes writing fast code easier. (In theory the restrict keyword introduced into the C language in the 1999 revision should solve the problem. Compilers have not caught up yet, though. The reason is mainly that too much incorrect code exists which would mislead the compiler and cause it to generate incorrect object code.)
  2. "Restricted Pointers". Using the GNU Compiler Collection (GCC).
  3. "Warning Options: -Wrestrict". GCC. Retrieved 19 November 2019.
  4. "EXP43-C. Avoid undefined behavior when using restrict-qualified pointers". SEI CERT C Coding Standard. Retrieved 19 November 2019.