Dangling pointer

Last updated
Dangling pointer Dangling Pointer.svg
Dangling pointer

Dangling pointers and wild pointers in computer programming are pointers that do not point to a valid object of the appropriate type. These are special cases of memory safety violations. More generally, dangling references and wild references are references that do not resolve to a valid destination.

Contents

Dangling pointers arise during object destruction, when an object that has an incoming reference is deleted or deallocated, without modifying the value of the pointer, so that the pointer still points to the memory location of the deallocated memory. The system may reallocate the previously freed memory, and if the program then dereferences the (now) dangling pointer, unpredictable behavior may result, as the memory may now contain completely different data. If the program writes to memory referenced by a dangling pointer, a silent corruption of unrelated data may result, leading to subtle bugs that can be extremely difficult to find. If the memory has been reallocated to another process, then attempting to dereference the dangling pointer can cause segmentation faults (UNIX, Linux) or general protection faults (Windows). If the program has sufficient privileges to allow it to overwrite the bookkeeping data used by the kernel's memory allocator, the corruption can cause system instabilities. In object-oriented languages with garbage collection, dangling references are prevented by only destroying objects that are unreachable, meaning they do not have any incoming pointers; this is ensured either by tracing or reference counting. However, a finalizer may create new references to an object, requiring object resurrection to prevent a dangling reference.

Wild pointers, also called uninitialized pointers, arise when a pointer is used prior to initialization to some known state, which is possible in some programming languages. They show the same erratic behavior as dangling pointers, though they are less likely to stay undetected because many compilers will raise a warning at compile time if declared variables are accessed before being initialized. [1]

Cause of dangling pointers

In many languages (e.g., the C programming language) deleting an object from memory explicitly or by destroying the stack frame on return does not alter associated pointers. The pointer still points to the same location in memory even though that location may now be used for other purposes.

A straightforward example is shown below:

{char*dp=NULL;/* ... */{charc;dp=&c;}/* c falls out of scope *//* dp is now a dangling pointer */}

If the operating system is able to detect run-time references to null pointers, a solution to the above is to assign 0 (null) to dp immediately before the inner block is exited. Another solution would be to somehow guarantee dp is not used again without further initialization.

Another frequent source of dangling pointers is a jumbled combination of malloc() and free() library calls: a pointer becomes dangling when the block of memory it points to is freed. As with the previous example one way to avoid this is to make sure to reset the pointer to null after freeing its reference—as demonstrated below.

#include<stdlib.h>voidfunc(){char*dp=malloc(A_CONST);/* ... */free(dp);/* dp now becomes a dangling pointer */dp=NULL;/* dp is no longer dangling *//* ... */}

An all too common misstep is returning addresses of a stack-allocated local variable: once a called function returns, the space for these variables gets deallocated and technically they have "garbage values".

int*func(void){intnum=1234;/* ... */return&num;}

Attempts to read from the pointer may still return the correct value (1234) for a while after calling func, but any functions called thereafter may overwrite the stack storage allocated for num with other values and the pointer would no longer work correctly. If a pointer to num must be returned, num must have scope beyond the function—it might be declared as static .

Manual deallocation without dangling reference

Antoni Kreczmar  [ pl ] (1945–1996) has created a complete object management system which is free of dangling reference phenomenon. [2] A similar approach was proposed by Fisher and LeBlanc [3] under the name Locks-and-keys .

Cause of wild pointers

Wild pointers are created by omitting necessary initialization prior to first use. Thus, strictly speaking, every pointer in programming languages which do not enforce initialization begins as a wild pointer.

This most often occurs due to jumping over the initialization, not by omitting it. Most compilers are able to warn about this.

intf(inti){char*dp;/* dp is a wild pointer */staticchar*scp;/* scp is not a wild pointer:                        * static variables are initialized to 0                        * at start and retain their values from                        * the last call afterwards.                        * Using this feature may be considered bad                        * style if not commented */}

Security holes involving dangling pointers

Like buffer-overflow bugs, dangling/wild pointer bugs frequently become security holes. For example, if the pointer is used to make a virtual function call, a different address (possibly pointing at exploit code) may be called due to the vtable pointer being overwritten. Alternatively, if the pointer is used for writing to memory, some other data structure may be corrupted. Even if the memory is only read once the pointer becomes dangling, it can lead to information leaks (if interesting data is put in the next structure allocated there) or to privilege escalation (if the now-invalid memory is used in security checks). When a dangling pointer is used after it has been freed without allocating a new chunk of memory to it, this becomes known as a "use after free" vulnerability. [4] For example, CVE - 2014-1776 is a use-after-free vulnerability in Microsoft Internet Explorer 6 through 11 [5] being used by zero-day attacks by an advanced persistent threat. [6]

Avoiding dangling pointer errors

In C, the simplest technique is to implement an alternative version of the free() (or alike) function which guarantees the reset of the pointer. However, this technique will not clear other pointer variables which may contain a copy of the pointer.

#include<assert.h>#include<stdlib.h>/* Alternative version for 'free()' */staticvoidsafefree(void**pp){/* in debug mode, abort if pp is NULL */assert(pp);/* free(NULL) works properly, so no check is required besides the assert in debug mode */free(*pp);/* deallocate chunk, note that free(NULL) is valid */*pp=NULL;/* reset original pointer */}intf(inti){char*p=NULL,*p2;p=malloc(1000);/* get a chunk */p2=p;/* copy the pointer *//* use the chunk here */safefree((void**)&p);/* safety freeing; does not affect p2 variable */safefree((void**)&p);/* this second call won't fail as p is reset to NULL */charc=*p2;/* p2 is still a dangling pointer, so this is undefined behavior. */returni+c;}

The alternative version can be used even to guarantee the validity of an empty pointer before calling malloc():

safefree(&p);/* i'm not sure if chunk has been released */p=malloc(1000);/* allocate now */

These uses can be masked through #define directives to construct useful macros (a common one being #define XFREE(ptr) safefree((void **)&(ptr))), creating something like a metalanguage or can be embedded into a tool library apart. In every case, programmers using this technique should use the safe versions in every instance where free() would be used; failing in doing so leads again to the problem. Also, this solution is limited to the scope of a single program or project, and should be properly documented.

Among more structured solutions, a popular technique to avoid dangling pointers in C++ is to use smart pointers. A smart pointer typically uses reference counting to reclaim objects. Some other techniques include the tombstones method and the locks-and-keys method. [3]

Another approach is to use the Boehm garbage collector, a conservative garbage collector that replaces standard memory allocation functions in C and C++ with a garbage collector. This approach completely eliminates dangling pointer errors by disabling frees, and reclaiming objects by garbage collection.

In languages like Java, dangling pointers cannot occur because there is no mechanism to explicitly deallocate memory. Rather, the garbage collector may deallocate memory, but only when the object is no longer reachable from any references.

In the language Rust, the type system has been extended to include also the variables lifetimes and resource acquisition is initialization. Unless one disables the features of the language, dangling pointers will be caught at compile time and reported as programming errors.

Dangling pointer detection

To expose dangling pointer errors, one common programming technique is to set pointers to the null pointer or to an invalid address once the storage they point to has been released. When the null pointer is dereferenced (in most languages) the program will immediately terminate—there is no potential for data corruption or unpredictable behavior. This makes the underlying programming mistake easier to find and resolve. This technique does not help when there are multiple copies of the pointer.

Some debuggers will automatically overwrite and destroy data that has been freed, usually with a specific pattern, such as 0xDEADBEEF (Microsoft's Visual C/C++ debugger, for example, uses 0xCC, 0xCD or 0xDD depending on what has been freed [7] ). This usually prevents the data from being reused by making it useless and also very prominent (the pattern serves to show the programmer that the memory has already been freed).

Tools such as Polyspace, TotalView, Valgrind, Mudflap, [8] AddressSanitizer, or tools based on LLVM [9] can also be used to detect uses of dangling pointers.

Other tools (SoftBound, Insure++, and CheckPointer) instrument the source code to collect and track legitimate values for pointers ("metadata") and check each pointer access against the metadata for validity.

Another strategy, when suspecting a small set of classes, is to temporarily make all their member functions virtual: after the class instance has been destructed/freed, its pointer to the Virtual Method Table is set to NULL, and any call to a member function will crash the program and it will show the guilty code in the debugger.

See also

Related Research Articles

The Cyclone programming language was intended to be a safe dialect of the C language. It avoids buffer overflows and other vulnerabilities that are possible in C programs by design, without losing the power and convenience of C as a tool for system programming. It is no longer supported by its original developers, with the reference tooling not supporting 64-bit platforms. The Rust language is mentioned by the original developers for having integrated many of the same ideas Cyclone had.

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.

<span class="mw-page-title-main">Memory management</span> Computer memory management methodology

Memory management is a form of resource management applied to computer memory. The essential requirement of memory management is to provide ways to dynamically allocate portions of memory to programs at their request, and free it for reuse when no longer needed. This is critical to any advanced computer system where more than a single process might be underway at any time.

In computer programming, a reference is a value that enables a program to indirectly access a particular datum, such as a variable's value or a record, in the computer's memory or in some other storage device. The reference is said to refer to the datum, and accessing the datum is called dereferencing the reference. A reference is distinct from the datum itself.

In computer science, a smart pointer is an abstract data type that simulates a pointer while providing added features, such as automatic memory management or bounds checking. Such features are intended to reduce bugs caused by the misuse of pointers, while retaining efficiency. Smart pointers typically keep track of the memory they point to, and may also be used to manage other resources, such as network connections and file handles. Smart pointers were first popularized in the programming language C++ during the first half of the 1990s as rebuttal to criticisms of C++'s lack of automatic garbage collection.

C dynamic memory allocation refers to performing manual memory management for dynamic memory allocation in the C programming language via a group of functions in the C standard library, namely malloc, realloc, calloc, aligned_alloc and free.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

In computing, a null pointer or null reference is a value saved for indicating that the pointer or reference does not refer to a valid object. Programs routinely use null pointers to represent conditions such as the end of a list of unknown length or the failure to perform some action; this use of null pointers can be compared to nullable types and to the Nothing value in an option type.

In some programming languages, const is a type qualifier, which indicates that the data is read-only. While this can be used to declare constants, const in the C family of languages differs from similar constructs in other languages in that it is part of the type, and thus has complicated behavior when combined with pointers, references, composite data types, and type-checking. In other languages, the data is not in a single memory location, but copied at compile time for each use. Languages which use it include C, C++, D, JavaScript, Julia, and Rust.

In computer science, a tagged pointer is a pointer with additional data associated with it, such as an indirection bit or reference count. This additional data is often "folded" into the pointer, meaning stored inline in the data representing the address, taking advantage of certain properties of memory addressing. The name comes from "tagged architecture" systems, which reserved bits at the hardware level to indicate the significance of each word; the additional data is called a "tag" or "tags", though strictly speaking "tag" refers to data specifying a type, not other data; however, the usage "tagged pointer" is ubiquitous.

In computer science, garbage includes data, objects, or other regions of the memory of a computer system, which will not be used in any future computation by the system, or by a program running on it. Because every computer system has a finite amount of memory, and most software produces garbage, it is frequently necessary to deallocate memory that is occupied by garbage and return it to the heap, or memory pool, for reuse.

In computer science, manual memory management refers to the usage of manual instructions by the programmer to identify and deallocate unused objects, or garbage. Up until the mid-1990s, the majority of programming languages used in industry supported manual memory management, though garbage collection has existed since 1959, when it was introduced with Lisp. Today, however, languages with garbage collection such as Java are increasingly popular and the languages Objective-C and Swift provide similar functionality through Automatic Reference Counting. The main manually managed languages still in widespread use today are C and C++ – see C dynamic memory allocation.

sizeof is a unary operator in the programming languages C and C++. It generates the storage size of an expression or a data type, measured in the number of char-sized units. Consequently, the construct sizeof (char) is guaranteed to be 1. The actual number of bits of type char is specified by the preprocessor macro CHAR_BIT, defined in the standard include file limits.h. On most modern computing platforms this is eight bits. The result of sizeof has an unsigned integer type that is usually denoted by size_t.

mtrace is the memory debugger included in the GNU C Library.

Memory safety is the state of being protected from various software bugs and security vulnerabilities when dealing with memory access, such as buffer overflows and dangling pointers. For example, Java is said to be memory-safe because its runtime error detection checks array bounds and pointer dereferences. In contrast, C and C++ allow arbitrary pointer arithmetic with pointers implemented as direct memory addresses with no provision for bounds checking, and thus are potentially memory-unsafe.

In the C++ programming language, placement syntax allows programmers to explicitly specify the memory management of individual objects — i.e. their "placement" in memory. Normally, when an object is created dynamically, an allocation function is invoked in such a way that it will both allocate memory for the object, and initialize the object within the newly allocated memory. The placement syntax allows the programmer to supply additional arguments to the allocation function. A common use is to supply a pointer to a suitable region of storage where the object can be initialized, thus separating memory allocation from object construction.

In C++ computer programming, allocators are a component of the C++ Standard Library. The standard library provides several data structures, such as list and set, commonly referred to as containers. A common trait among these containers is their ability to change size during the execution of the program. To achieve this, some form of dynamic memory allocation is usually required. Allocators handle all the requests for allocation and deallocation of memory for a given container. The C++ Standard Library provides general-purpose allocators that are used by default, however, custom allocators may also be supplied by the programmer.

In computer science, region-based memory management is a type of memory management in which each allocated object is assigned to a region. A region, also called a zone, arena, area, or memory context, is a collection of allocated objects that can be efficiently reallocated or deallocated all at once. Memory allocators using region-based managements are often called area allocators, and when they work by only "bumping" a single pointer, as bump allocators.

<span class="mw-page-title-main">Zig (programming language)</span> A general-purpose programming language, toolchain to build Zig/C/C++ code

Zig is an imperative, general-purpose, statically typed, compiled system programming language designed by Andrew Kelley. It is intended to be a successor to the C programming language, with the intention of being even smaller and simpler to program in while also offering more functionality.

References

  1. "Warning Options - Using the GNU Compiler Collection (GCC)".
  2. Gianna Cioni, Antoni Kreczmar, Programmed deallocation without dangling reference, Information Processing Letters, v. 18, 1984, pp. 179–185
  3. 1 2 C. N. Fisher, R. J. Leblanc, The implementation of run-time diagnostics in Pascal , IEEE Transactions on Software Engineering, 6(4):313–319, 1980.
  4. Dalci, Eric; anonymous author; CWE Content Team (May 11, 2012). "CWE-416: Use After Free". Common Weakness Enumeration. Mitre Corporation . Retrieved April 28, 2014.{{cite web}}: |author2= has generic name (help)
  5. "CVE-2014-1776". Common Vulnerabilities and Exposures (CVE). 2014-01-29. Archived from the original on 2017-04-30. Retrieved 2017-05-16.
  6. Chen, Xiaobo; Caselden, Dan; Scott, Mike (April 26, 2014). "New Zero-Day Exploit targeting Internet Explorer Versions 9 through 11 Identified in Targeted Attacks". FireEye Blog. FireEye . Retrieved April 28, 2014.
  7. Visual C++ 6.0 memory-fill patterns
  8. Mudflap Pointer Debugging
  9. Dhurjati, D. and Adve, V. Efficiently Detecting All Dangling Pointer Uses in Production Servers