ABA problem

In multithreaded computing, the ABA problem occurs during synchronization when a location is read twice, has the same value for both reads, and the fact that the value is the same is taken to mean that nothing has happened in the interim. However, another thread can execute between the two reads and change the value, do other work, then change the value back, fooling the first thread into thinking nothing has changed even though the second thread did work that violates that assumption.

The ABA problem occurs when multiple threads (or processes) accessing shared data interleave. Below is a sequence of events that illustrates the ABA problem:

  1. Process P1 reads value A from some shared memory location,
  2. P1 is preempted, allowing process P2 to run,
  3. P2 writes value B to the shared memory location,
  4. P2 writes value A to the shared memory location,
  5. P2 is preempted, allowing process P1 to run,
  6. P1 reads value A from the shared memory location,
  7. P1 determines that the shared memory value has not changed and continues.

Although P1 can continue executing, it is possible that its behavior will not be correct due to the "hidden" modification of the shared memory.

A common case of the ABA problem is encountered when implementing a lock-free data structure. If an item is removed from the list, deleted, and then a new item is allocated and added to the list, it is common for the allocated object to be at the same location as the deleted object due to MRU memory allocation. A pointer to the new item is thus often equal to a pointer to the old item, causing an ABA problem.

Examples

Consider a software example (written in C++) of ABA using a lock-free stack:

/* Naive lock-free stack which suffers from the ABA problem. */
class Stack {
  std::atomic<Obj*> top_ptr;

  //
  // Pops the top object and returns a pointer to it.
  //
  Obj* Pop() {
    while (1) {
      Obj* ret_ptr = top_ptr;
      if (ret_ptr == nullptr) return nullptr;
      // For simplicity, suppose that we can ensure that this dereference is safe
      // (i.e., that no other thread has popped the stack in the meantime).
      Obj* next_ptr = ret_ptr->next;
      // If the top node is still ret_ptr, then assume no one has changed the stack.
      // (That statement is not always true because of the ABA problem.)
      // Atomically replace top with next.
      if (top_ptr.compare_exchange_weak(ret_ptr, next_ptr)) {
        return ret_ptr;
      }
      // The stack has changed, start over.
    }
  }

  //
  // Pushes the object specified by obj_ptr to stack.
  //
  void Push(Obj* obj_ptr) {
    while (1) {
      Obj* next_ptr = top_ptr;
      obj_ptr->next = next_ptr;
      // If the top node is still next_ptr, then assume no one has changed the stack.
      // (That statement is not always true because of the ABA problem.)
      // Atomically replace top with obj.
      if (top_ptr.compare_exchange_weak(next_ptr, obj_ptr)) {
        return;
      }
      // The stack has changed, start over.
    }
  }
};

This code can normally prevent problems from concurrent access, but it suffers from the ABA problem. Consider the following sequence:

Stack initially contains top → A → B → C

Thread 1 starts running pop:

ret = A; next = B;

Thread 1 gets interrupted just before the compare_exchange_weak...

{ // Thread 2 runs pop:
  ret = A;
  next = B;
  compare_exchange_weak(A, B)  // Success, top = B
  return A;
} // Now the stack is top → B → C

{ // Thread 2 runs pop again:
  ret = B;
  next = C;
  compare_exchange_weak(B, C)  // Success, top = C
  return B;
} // Now the stack is top → C

delete B;

{ // Thread 2 now pushes A back onto the stack:
  A->next = C;
  compare_exchange_weak(C, A)  // Success, top = A
}

Now the stack is top → A → C

When Thread 1 resumes:

compare_exchange_weak(A, B)

This instruction succeeds because it finds top == ret (both are A), so it sets top to next (which is B). But B has been deleted, so the program will access freed memory when it tries to read the first element of the stack. In C++, as shown here, accessing freed memory is undefined behavior: it may crash, corrupt data, or even silently appear to work correctly. ABA bugs such as this one can be difficult to debug.

Workarounds

Tagged state reference

A common workaround is to add extra "tag" or "stamp" bits to the quantity being compared. For example, an algorithm using compare-and-swap on a pointer might use the low bits of the address to indicate how many times the pointer has been successfully modified. Because the tag changes on every update, the next compare-and-swap will fail even if the addresses are the same, since the tag bits will not match. This is sometimes called ABAʹ, since the second A is made slightly different from the first. Such tagged state references are also used in transactional memory. Although a tagged pointer can be used for the implementation, a separate tag field is preferred if double-width CAS is available.

If "tag" field wraps around, guarantees against ABA do not stand anymore. However, it has been observed that on currently existing CPUs, and using 60-bit tags, no wraparound is possible as long as the program lifetime (that is, without restarting the program) is limited to 10 years; in addition, it was argued that for practical purposes it is usually sufficient to have 40-48 bits of tag to guarantee against wrapping around. As modern CPUs (in particular, all modern x64 CPUs) tend to support 128-bit CAS operations, this can allow firm guarantees against ABA. [1]

Intermediate nodes

A correct but expensive approach is to use intermediate nodes that are not data elements and thus assure invariants as elements are inserted and removed [Valois].

Deferred reclamation

Another approach is to defer reclamation of removed data elements. One way to defer reclamation is to run the algorithm in an environment featuring an automatic garbage collector; a problem here, however, is that if the GC is not lock-free, then the overall system is not lock-free, even though the data structure itself is.

Another way to defer reclamation is to use one or more hazard pointers, which are pointers to locations that otherwise cannot appear in the list. Each hazard pointer represents an intermediate state of an in-progress change; the presence of the pointer assures further synchronization [Doug Lea]. Hazard pointers are lock-free, but can only track at most a fixed number of elements per thread as being in-use.
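As a rough illustration of the mechanism, here is a minimal sketch with one hazard slot per thread and a fixed thread count; real implementations (such as Maged Michael's) add per-thread retire lists, multiple slots per thread, and batched scans, and all names here are illustrative.

#include <atomic>

constexpr int kMaxThreads = 8;                 // assumed fixed thread count
std::atomic<void*> hazard_slot[kMaxThreads];   // one hazard pointer per thread

// Reader side: publish the pointer before dereferencing it, then re-check
// the source so the protection cannot be stale.
template <class T>
T* protect(std::atomic<T*>& src, int tid) {
    T* p = src.load();
    while (true) {
        hazard_slot[tid].store(p);
        T* q = src.load();
        if (q == p) return p;  // p may now be dereferenced safely
        p = q;                 // src moved under us: re-publish and retry
    }
}
// When the reader is done, it clears its slot: hazard_slot[tid].store(nullptr);

// Reclaimer side: a removed node may be freed only when no slot points at
// it; otherwise it stays on a retire list and is re-scanned later.
template <class T>
bool safe_to_free(const T* p) {
    for (auto& slot : hazard_slot)
        if (slot.load() == static_cast<const void*>(p)) return false;
    return true;
}

Because a node's address stays protected until every reader that published it has moved on, the address cannot be recycled while a delayed compare-and-swap might still observe it, which closes the ABA window caused by memory reuse.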

Yet another way to defer reclamation is to use read-copy update (RCU), which involves enclosing the update in an RCU read-side critical section and then waiting for an RCU grace period before freeing any removed data elements. Using RCU in this way guarantees that any data element removed cannot reappear until all currently executing operations have completed. RCU is lock-free, but isn't suitable for all workloads.
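For illustration, below is a hedged sketch of this pattern using the userspace RCU library (liburcu); the API names (rcu_read_lock, rcu_dereference, synchronize_rcu, and so on) are liburcu's, while Node, reader, and remove_first are illustrative, each participating thread is assumed to have called rcu_register_thread(), and writers are assumed to serialize among themselves.

#include <urcu.h>  // userspace RCU (liburcu); link with -lurcu

struct Node { Node* next; int data; };
Node* head;          // shared list head; updated only by serialized writers

void use(int) {}     // stand-in for real per-element work

void reader() {
    rcu_read_lock();                           // enter read-side critical section
    for (Node* p = rcu_dereference(head); p != nullptr;
         p = rcu_dereference(p->next))
        use(p->data);
    rcu_read_unlock();                         // leave the critical section
}

void remove_first() {
    Node* old = head;
    rcu_assign_pointer(head, old->next);       // unlink: publish the new head
    synchronize_rcu();                         // wait for a grace period: every
                                               // pre-existing reader has finished
    delete old;                                // old's address cannot reappear to
}                                              // any reader that could still see it

The grace period is what defeats ABA here: by the time the element is freed and its address can be recycled, no operation that observed the old value is still in flight.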

Alternate instructions

Rather than using a single pointer-wide compare-and-swap instruction, some processors have other instructions intended to be more resistant or immune to the ABA problem.

Some architectures provide "larger" atomic operations such that, as example, both forward and backward links in a doubly linked list can be updated atomically; while this feature is architecture-dependent, it, in particular, is available for x86/x64 architectures (x86 allows for 64-bit CAS, and all modern x64 CPUs allow for 128-bit CAS) and IBM's z/Architecture (which allows for up to 128-bit CAS).

Some architectures provide a load-linked/store-conditional pair of instructions, in which the store is performed only if there have been no other stores to the indicated location since the load-link. This effectively separates the notion of "storage contains value" from "storage has been changed". Examples include DEC Alpha, MIPS, PowerPC, RISC-V and ARM (v6 and later). Because these instructions detect any write to the address rather than comparing values, routines using them are immune to the ABA problem. [2]
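As a concrete sketch, the following fetch-and-increment shows the LL/SC retry pattern on AArch64 using GCC/Clang extended inline assembly (the function name is illustrative). The store-exclusive fails whenever the location has been written since the load-exclusive, even if the same value was written back, which is why the loop cannot be fooled by ABA.

#include <cstdint>

// Atomically increments *addr and returns the previous value.
uint64_t fetch_increment(uint64_t* addr) {
    uint64_t old_val, new_val;
    uint32_t failed;
    asm volatile(
        "1: ldxr %0, [%3]      \n"  // load-exclusive: read and set the exclusive monitor
        "   add  %1, %0, #1    \n"  // compute the new value
        "   stxr %w2, %1, [%3] \n"  // store-exclusive: %2 != 0 if the monitor was lost
        "   cbnz %w2, 1b       \n"  // another store intervened, retry
        : "=&r"(old_val), "=&r"(new_val), "=&r"(failed)
        : "r"(addr)
        : "memory");
    return old_val;
}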

See also

  Compare-and-swap
  Load-linked/store-conditional
  Read–modify–write
  Test-and-set
  Hazard pointer
  Read-copy-update
  Linearizability
  Non-blocking linked list
  Ctrie
  Spinlock
  Reference counting
  Weak reference
  Smart pointer
  Pointer (computer programming)
  C dynamic memory allocation
  FIFO (computing and electronics)
  x86 assembly language
  Threaded code
  Persistent memory
  Java ConcurrentMap

References

  1. 'No Bugs' Hare. "CAS (Re)Actor for Non-Blocking Multithreaded Primitives". Reprinted from Overload #142, 2017.
  2. John Goodacre and Andrew N. Sloss. "Parallelism and the ARM Instruction Set Architecture". p. 46.