Tombstone (programming)

Last updated

Tombstones are a mechanism to detect dangling pointers and mitigate the problems they can cause in computer programs. Dangling pointers can appear in certain computer programming languages, e.g. C, C++ and assembly languages.

A tombstone is a structure that acts as an intermediary between a pointer and its target, often heap-dynamic data in memory. The pointer – sometimes called the handle – points only at tombstones and never to its actual target. When the data is deallocated, the tombstone is set to a null (or, more generally, to a value that is illegal for a pointer in the given runtime environment), indicating that the variable no longer exists. [1] This mechanism prevents the use of invalid pointers, which would otherwise access the memory area that once belonged to the now deallocated variable, although it may already contain other data, in turn leading to corruption of in-memory data. Depending on the operating system, the CPU can automatically detect such an invalid access (e.g. for the null value: a null pointer dereference error). This supports in analyzing the actual reason, a programming error, in debugging, and it can also be used to abort the program in production use, to prevent it from continuing with invalid data structures.

In more generalized terms, a tombstone can be understood as a marker for "this data is no longer here". For example, in filesystems it may be efficient when deleting files to mark them as "dead" instead of immediately reclaiming all their data blocks. [2]

The downsides of using tombstones include a computational overhead and additional memory consumption: extra processing is necessary to follow the path from the pointer to data through the tombstone, and extra memory is necessary to retain tombstones for every pointer throughout the program. One other problem is that all the code that needs to work with the pointers in question needs to be implemented to use the tombstone mechanism. [3]

Among popular programming languages, C++ implements the tombstone pattern in its standard library as a weak pointer using std::weak_ptr. Built–in support by programming languages or the compiler is not necessary to use this mechanism.

See also

Related Research Articles

C is a general-purpose programming language. It was created in the 1970s by Dennis Ritchie and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems code, device drivers, and protocol stacks, but its use in application software has been decreasing. C is commonly used on computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems.

<span class="mw-page-title-main">Garbage collection (computer science)</span> Form of automatic memory management

In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector attempts to reclaim memory that was allocated by the program, but is no longer referenced; such memory is called garbage. Garbage collection was invented by American computer scientist John McCarthy around 1959 to simplify manual memory management in Lisp.

The Cyclone programming language was intended to be a safe dialect of the C language. It avoids buffer overflows and other vulnerabilities that are possible in C programs by design, without losing the power and convenience of C as a tool for system programming. It is no longer supported by its original developers, with the reference tooling not supporting 64-bit platforms. The Rust language is mentioned by the original developers for having integrated many of the same ideas Cyclone had.

In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes which together represent a sequence. In its most basic form, each node contains data, and a reference to the next node in the sequence. This structure allows for efficient insertion or removal of elements from any position in the sequence during iteration. More complex variants add additional links, allowing more efficient insertion or removal of nodes at arbitrary positions. A drawback of linked lists is that data access time is linear in respect to the number of nodes in the list. Because nodes are serially linked, accessing any node requires that the prior node be accessed beforehand. Faster access, such as random access, is not feasible. Arrays have better cache locality compared to linked lists.

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.

Java and C++ are two prominent object-oriented programming languages. By many language popularity metrics, the two languages have dominated object-oriented and high-performance software development for much of the 21st century, and are often directly compared and contrasted. Java's syntax was based on C/C++.

<span class="mw-page-title-main">Memory management</span> Computer memory management methodology

Memory management is a form of resource management applied to computer memory. The essential requirement of memory management is to provide ways to dynamically allocate portions of memory to programs at their request, and free it for reuse when no longer needed. This is critical to any advanced computer system where more than a single process might be underway at any time.

In computer programming, a reference is a value that enables a program to indirectly access a particular datum, such as a variable's value or a record, in the computer's memory or in some other storage device. The reference is said to refer to the datum, and accessing the datum is called dereferencing the reference. A reference is distinct from the datum itself.

In computer programming, a weak reference is a reference that does not protect the referenced object from collection by a garbage collector, unlike a strong reference. An object referenced only by weak references – meaning "every chain of references that reaches the object includes at least one weak reference as a link" – is considered weakly reachable, and can be treated as unreachable and so may be collected at any time. Some garbage-collected languages feature or support various levels of weak references, such as C#, Lua, Java, Lisp, OCaml, Perl, Python and PHP since the version 7.4.

In computer science, a smart pointer is an abstract data type that simulates a pointer while providing added features, such as automatic memory management or bounds checking. Such features are intended to reduce bugs caused by the misuse of pointers, while retaining efficiency. Smart pointers typically keep track of the memory they point to, and may also be used to manage other resources, such as network connections and file handles. Smart pointers were first popularized in the programming language C++ during the first half of the 1990s as rebuttal to criticisms of C++'s lack of automatic garbage collection.

C dynamic memory allocation refers to performing manual memory management for dynamic memory allocation in the C programming language via a group of functions in the C standard library, namely malloc, realloc, calloc, aligned_alloc and free.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

In computer science, type safety and type soundness are the extent to which a programming language discourages or prevents type errors. Type safety is sometimes alternatively considered to be a property of facilities of a computer language; that is, some facilities are type-safe and their usage will not result in type errors, while other facilities in the same language may be type-unsafe and a program using them may encounter type errors. The behaviors classified as type errors by a given programming language are usually those that result from attempts to perform operations on values that are not of the appropriate data type, e.g., adding a string to an integer when there's no definition on how to handle this case. This classification is partly based on opinion.

<span class="mw-page-title-main">Dangling pointer</span> Pointer that does not point to a valid object

Dangling pointers and wild pointers in computer programming are pointers that do not point to a valid object of the appropriate type. These are special cases of memory safety violations. More generally, dangling references and wild references are references that do not resolve to a valid destination.

In computing, a page fault is an exception that the memory management unit (MMU) raises when a process accesses a memory page without proper preparations. Accessing the page requires a mapping to be added to the process's virtual address space. Furthermore, the actual page contents may need to be loaded from a back-up, e.g. a disk. The MMU detects the page fault, but the operating system's kernel handles the exception by making the required page accessible in the physical memory or denying an illegal memory access.

Memory safety is the state of being protected from various software bugs and security vulnerabilities when dealing with memory access, such as buffer overflows and dangling pointers. For example, Java is said to be memory-safe because its runtime error detection checks array bounds and pointer dereferences. In contrast, C and C++ allow arbitrary pointer arithmetic with pointers implemented as direct memory addresses with no provision for bounds checking, and thus are potentially memory-unsafe.

In computer programming, a variable is an abstract storage location paired with an associated symbolic name, which contains some known or unknown quantity of data or object referred to as a value; or in simpler terms, a variable is a named container for a particular set of bits or type of data. A variable can eventually be associated with or identified by a memory address. The variable name is the usual way to reference the stored value, in addition to referring to the variable itself, depending on the context. This separation of name and content allows the name to be used independently of the exact information it represents. The identifier in computer source code can be bound to a value during run time, and the value of the variable may thus change during the course of program execution.

Automatic Reference Counting (ARC) is a memory management feature of the Clang compiler providing automatic reference counting for the Objective-C and Swift programming languages. At compile time, it inserts into the object code messages retain and release which increase and decrease the reference count at run time, marking for deallocation those objects when the number of references to them reaches zero.

<span class="mw-page-title-main">Zig (programming language)</span> A general-purpose programming language, toolchain to build Zig/C/C++ code

Zig is an imperative, general-purpose, statically typed, compiled system programming language designed by Andrew Kelley. It is intended as a successor to the language C, with the intent of being even smaller and simpler to program in, while offering more function. It is free and open-source software, released under an MIT License.

References

  1. Michael Lee Scott (2000). Programming Language Pragmatics. Morgan Kaufmann. p. 392. ISBN   9781558604421.
  2. Clifford A. Shaffer (2012). Data Structures and Algorithm Analysis in C++, Third Edition. Dover Publications. p. 344. ISBN   9780486172620.
  3. Maurizio Gabbrielli; Simone Martini (2010). Programming Languages: Principles and Paradigms. Springer London. p. 248. ISBN   9781848829145.