This article needs additional citations for verification . (July 2010) (Learn how and when to remove this template message)
A race condition or race hazard is the condition of an electronics, software, or other system where the system's substantive behavior is dependent on the sequence or timing of other uncontrollable events. It becomes a bug when one or more of the possible behaviors is undesirable.
The term race condition was already in use by 1954, for example in David A. Huffman's doctoral thesis "The synthesis of sequential switching circuits".
Race conditions can occur especially in logic circuits, multithreaded or distributed software programs.
A typical example of a race condition may occur when a logic gate combines signals that have traveled along different paths from the same source. The inputs to the gate can change at slightly different times in response to a change in the source signal. The output may, for a brief period, change to an unwanted state before settling back to the designed state. Certain systems can tolerate such glitches but if this output functions as a clock signal for further systems that contain memory, for example, the system can rapidly depart from its designed behaviour (in effect, the temporary glitch becomes a permanent glitch).
Consider, for example, a two-input AND gate fed with a logic signal A on one input and its negation, NOT A, on another input. In theory the output (A AND NOT A) should never be true. If, however, changes in the value of A take longer to propagate to the second input than the first when A changes from false to true then a brief period will ensue during which both inputs are true, and so the gate's output will also be true.
Design techniques such as Karnaugh maps encourage designers to recognize and eliminate race conditions before they cause problems. Often logic redundancy can be added to eliminate some kinds of races.
As well as these problems, some logic elements can enter metastable states, which create further problems for circuit designers.
A critical race condition occurs when the order in which internal variables are changed determines the eventual state that the state machine will end up in.
A non-critical race condition occurs when the order in which internal variables are changed does not determine the eventual state that the state machine will end up in.
A static race condition occurs when a signal and its complement are combined together.
A dynamic race condition occurs when it results in multiple transitions when only one is intended. They are due to interaction between gates. It can be eliminated by using no more than two levels of gating.
An essential race condition occurs when an input has two transitions in less than the total feedback propagation time. Sometimes they are cured using inductive delay line elements to effectively increase the time duration of an input signal.
A race condition arises in software when a computer program, to operate properly, depends on the sequence or timing of the program's processes or threads . Critical race conditions cause invalid execution and software bugs. Critical race conditions often happen when the processes or threads depend on some shared state. Operations upon shared states are critical sections that must be mutually exclusive. Failure to obey this rule can corrupt the shared state.
A data race is a type of race condition. Data races are important parts of various formal memory models. The memory model defined in the C11 and C++11 standards specify that a C or C++ program containing a data race has undefined behavior.
A race condition can be difficult to reproduce and debug because the end result is nondeterministic and depends on the relative timing between interfering threads. Problems that occur when using systems can therefore disappear when running in debug mode, when additional logging is added, or when attaching a debugger, often referred to as a "Heisenbug". It is therefore better to avoid race conditions by careful software design.
Assume that two threads each increment the value of a global integer variable by 1. Ideally, the following sequence of operations would take place:
|Thread 1||Thread 2||Integer value|
In the case shown above, the final value is 2, as expected. However, if the two threads run simultaneously without locking or synchronization, the outcome of the operation could be wrong. The alternative sequence of operations below demonstrates this scenario:
|Thread 1||Thread 2||Integer value|
In this case, the final value is 1 instead of the correct result of 2. This occurs because here the increment operations are not mutually exclusive. Mutually exclusive operations are those that cannot be interrupted while accessing some resource such as a memory location.
Not all regard data races as a subset of race conditions.The precise definition of data race is specific to the formal concurrency model being used, but typically it refers to a situation where a memory operation in one thread could potentially attempt to access a memory location at the same time that a memory operation in another thread is writing to that memory location, in a context where this is dangerous. This implies that a data race is different from a race condition as it is possible to have nondeterminism due to timing even in a program without data races, for example, in a program in which all memory accesses use only atomic operations.
This can be dangerous because on many platforms, if two threads write to a memory location at the same time, it may be possible for the memory location to end up holding a value that is some arbitrary and meaningless combination of the bits representing the values that each thread was attempting to write; this could result in memory corruption if the resulting value is one that neither thread attempted to write (sometimes this is called a 'torn write'). Similarly, if one thread reads from a location while another thread is writing to it, it may be possible for the read to return a value that is some arbitrary and meaningless combination of the bits representing the value that the memory location held before the write, and of the bits representing the value being written.
On many platforms, special memory operations are provided for simultaneous access; in such cases, typically simultaneous access using these special operations is safe, but simultaneous access using other memory operations is dangerous. Sometimes such special operations (which are safe for simultaneous access) are called atomic or synchronization operations, whereas the ordinary operations (which are unsafe for simultaneous access) are called data operations. This is probably why the term is data race; on many platforms, where there is a race condition involving only synchronization operations, such a race may be nondeterministic but otherwise safe; but a data race could lead to memory corruption or undefined behavior.
The precise definition of data race differs across formal concurrency models. This matters because concurrent behavior is often non-intuitive and so formal reasoning is sometimes applied.
The C++ standard, in draft N4296 (2014-11-19)], defines data race as follows in section 1.10.23 (page 14)
Two actions are potentially concurrent if
- they are performed by different threads, or
- they are unsequenced, and at least one is performed by a signal handler.
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below [omitted]. Any such data race results in undefined behavior.
The parts of this definition relating to signal handlers are idiosyncratic to C++ and are not typical of definitions of data race.
The paper Detecting Data Races on Weak Memory Systemsprovides a different definition:
"two memory operations conflict if they access the same location and at least one of them is a write operation... "Two memory operations, x and y, in a sequentially consistent execution form a race 〈x,y〉, iff x and y conflict, and they are not ordered by the hb1 relation of the execution. The race 〈x,y〉, is a data race iff at least one of x or y is a data operation.
Here we have two memory operations accessing the same location, one of which is a write.
The hb1 relation is defined elsewhere in the paper, and is an example of a typical "happens-before" relation; intuitively, if we can prove that we are in a situation where one memory operation X is guaranteed to be executed to completion before another memory operation Y begins, then we say that "X happens-before Y". If neither "X happens-before Y" nor "Y happens-before X", then we say that X and Y are "not ordered by the hb1 relation". So, the clause "...and they are not ordered by the hb1 relation of the execution" can be intuitively translated as "...and X and Y are potentially simultaneous".
The paper considers dangerous only those situations in which at least one of the memory operations is a "data operation"; in other parts of this paper, the paper also defines a class of "synchronization operations" which are safe for potentially simultaneous use, in contrast to "data operations".
The Java Language Specificationprovides a different definition:
Two accesses to (reads of or writes to) the same variable are said to be conflicting if at least one of the accesses is a write...When a program contains two conflicting accesses (§17.4.1) that are not ordered by a happens-before relationship, it is said to contain a data race...a data race cannot cause incorrect behavior such as returning the wrong length for an array.
A critical difference between the C++ approach and the Java approach is that in C++, a data race is undefined behavior, whereas in Java, a data race merely affects "inter-thread actions".This means that in C++, an attempt to execute a program containing a data race could (while still adhering to the spec) crash or could exhibit insecure or bizarre behavior, whereas in Java, an attempt to execute a program containing a data race may produce undesired concurrency behavior but is otherwise (assuming that the implementation adheres to the spec) safe.
An important facet of data races is that in some contexts, a program that is free of data races is guaranteed to execute in a sequentially consistent manner, greatly easing reasoning about the concurrent behavior of the program. Formal memory models that provide such a guarantee are said to exhibit an "SC for DRF" (Sequential Consistency for Data Race Freedom) property. This approach has been said to have achieved recent consensus (presumably compared to approaches which guarantee sequential consistency in all cases, or approaches which do not guarantee it at all).
For example, in Java, this guarantee is directly specified:
A program is correctly synchronized if and only if all sequentially consistent executions are free of data races.
If a program is correctly synchronized, then all executions of the program will appear to be sequentially consistent (§17.4.3).
This is an extremely strong guarantee for programmers. Programmers do not need to reason about reorderings to determine that their code contains data races. Therefore they do not need to reason about reorderings when determining whether their code is correctly synchronized. Once the determination that the code is correctly synchronized is made, the programmer does not need to worry that reorderings will affect his or her code.
A program must be correctly synchronized to avoid the kinds of counterintuitive behaviors that can be observed when code is reordered. The use of correct synchronization does not ensure that the overall behavior of a program is correct. However, its use does allow a programmer to reason about the possible behaviors of a program in a simple way; the behavior of a correctly synchronized program is much less dependent on possible reorderings. Without correct synchronization, very strange, confusing and counterintuitive behaviors are possible.
By contrast, a draft C++ specification does not directly require an SC for DRF property, but merely observes that there exists a theorem providing it:
[Note:It can be shown that programs that correctly use mutexes and memory_order_seq_cst operations to prevent all data races and use no other synchronization operations behave as if the operations executed by their constituent threads were simply interleaved, with each value computation of an object being taken from the last side effect on that object in that interleaving. This is normally referred to as “sequential consistency”. However, this applies only to data-race-free programs, and data-race-free programs cannot observe most program transformations that do not change single-threaded program semantics. In fact, most single-threaded program transformations continue to be allowed, since any program that behaves differently as a result must perform an undefined operation.— end note
Note that the C++ draft specification admits the possibility of programs that are valid but use synchronization operations with a memory_order other than memory_order_seq_cst, in which case the result may be a program which is correct but for which no guarantee of sequentially consistency is provided. In other words, in C++, some correct programs are not sequentially consistent. This approach is thought to give C++ programmers the freedom to choose faster program execution at the cost of giving up ease of reasoning about their program.
There are various theorems, often provided in the form of memory models, that provide SC for DRF guarantees given various contexts. The premises of these theorems typically place constraints upon both the memory model (and therefore upon the implementation), and also upon the programmer; that is to say, typically it is the case that there are programs which do not meet the premises of the theorem and which could not be guaranteed to execute in a sequentially consistent manner.
The DRF1 memory modelprovides SC for DRF and allows the optimizations of the WO (weak ordering), RCsc (Release Consistency with sequentially consistent special operations), VAX memory model, and data-race-free-0 memory models. The PLpc memory model provides SC for DRF and allows the optimizations of the TSO (Total Store Order), PSO, PC (Processor Consistency), and RCpc (Release Consistency with processor consistency special operations) models. DRFrlx provides a sketch of an SC for DRF theorem in the presence of relaxed atomics.
Many software race conditions have associated computer security implications. A race condition allows an attacker with access to a shared resource to cause other actors that utilize that resource to malfunction, resulting in effects including denial of serviceand privilege escalation.
A specific kind of race condition involves checking for a predicate (e.g. for authentication), then acting on the predicate, while the state can change between the time of check and the time of use. When this kind of bug exists in security-sensitive code, a security vulnerability called a time-of-check-to-time-of-use (TOCTTOU) bug is created.
Race conditions are also intentionally used to create hardware random number generators and physically unclonable functions. [ citation needed ] PUFs can be created by designing circuit topologies with identical paths to a node and relying on manufacturing variations to randomly determine which paths will complete first. By measuring each manufactured circuit's specific set of race condition outcomes, a profile can be collected for each circuit and kept secret in order to later verify a circuit's identity.
Two or more programs may collide in their attempts to modify or access a file system, which can result in data corruption or privilege escalation.File locking provides a commonly used solution. A more cumbersome remedy involves organizing the system in such a way that one unique process (running a daemon or the like) has exclusive access to the file, and all other processes that need to access the data in that file do so only via interprocess communication with that one process. This requires synchronization at the process level.
A different form of race condition exists in file systems where unrelated programs may affect each other by suddenly using up available resources such as disk space, memory space, or processor cycles. Software not carefully designed to anticipate and handle this race situation may then become unpredictable. Such a risk may be overlooked for a long time in a system that seems very reliable. But eventually enough data may accumulate or enough other software may be added to critically destabilize many parts of a system. An example of this occurred with the near loss of the Mars Rover "Spirit" not long after landing. A solution is for software to request and reserve all the resources it will need before beginning a task; if this request fails then the task is postponed, avoiding the many points where failure could have occurred. Alternatively, each of those points can be equipped with error handling, or the success of the entire task can be verified afterwards, before continuing. A more common approach is to simply verify that enough system resources are available before starting a task; however, this may not be adequate because in complex systems the actions of other running programs can be unpredictable.
In networking, consider a distributed chat network like IRC, where a user who starts a channel automatically acquires channel-operator privileges. If two users on different servers, on different ends of the same network, try to start the same-named channel at the same time, each user's respective server will grant channel-operator privileges to each user, since neither server will yet have received the other server's signal that it has allocated that channel. (This problem has been largely solved by various IRC server implementations.)
In this case of a race condition, the concept of the "shared resource" covers the state of the network (what channels exist, as well as what users started them and therefore have what privileges), which each server can freely change as long as it signals the other servers on the network about the changes so that they can update their conception of the state of the network. However, the latency across the network makes possible the kind of race condition described. In this case, heading off race conditions by imposing a form of control over access to the shared resource—say, appointing one server to control who holds what privileges—would mean turning the distributed network into a centralized one (at least for that one part of the network operation).
Race conditions can also exist when a computer program is written with non-blocking sockets, in which case the performance of the program can be dependent on the speed of the network link.
Software flaws in life-critical systems can be disastrous. Race conditions were among the flaws in the Therac-25 radiation therapy machine, which led to the death of at least three patients and injuries to several more.
Another example is the Energy Management System provided by GE Energy and used by Ohio-based FirstEnergy Corp (among other power facilities). A race condition existed in the alarm subsystem; when three sagging power lines were tripped simultaneously, the condition prevented alerts from being raised to the monitoring technicians, delaying their awareness of the problem. This software flaw eventually led to the North American Blackout of 2003.GE Energy later developed a software patch to correct the previously undiscovered error.
This section needs expansion. You can help by adding to it.(October 2016)
Neuroscience is demonstrating that race conditions can occur in mammal (rat) brains as well.
Many software tools exist to help detect race conditions in software. They can be largely categorized into two groups: static analysis tools and dynamic analysis tools.
Thread Safety Analysis is a static analysis tool for annotation-based intra-procedural static analysis, originally implemented as a branch of gcc, and now reimplemented in Clang, supporting PThreads. [ non-primary source needed ]
Dynamic analysis tools include:
There are several benchmarks designed to evaluate the effectiveness of data race detection tools
In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system. The implementation of threads and processes differs between operating systems, but in most cases a thread is a component of a process. Multiple threads can exist within one process, executing concurrently and sharing resources such as memory, while different processes do not share these resources. In particular, the threads of a process share its executable code and the values of its dynamically allocated variables and non-thread-local global variables at any given time.
Thread safety is a computer programming concept applicable to multi-threaded code. Thread-safe code only manipulates shared data structures in a manner that ensures that all threads behave properly and fulfill their design specifications without unintended interaction. There are various strategies for making thread-safe data structures.
Parallel computing is a type of computation in which many calculations or the execution of processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but it's gaining broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.
In computer science, a lock or mutex is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution. A lock is designed to enforce a mutual exclusion concurrency control policy.
In computer science, consistency models are used in distributed systems like distributed shared memory systems or distributed data stores. The system is said to support a given model if operations on memory follow specific rules. The data consistency model specifies a contract between programmer and system, wherein the system guarantees that if the programmer follows the rules, memory will be consistent and the results of reading, writing, or updating memory will be predictable. This is different from coherence, which occurs in systems that are cached or cache-less, and is consistency of data with respect to all processors. Coherence deals with maintaining a global order in which writes to a single location or single variable are seen by all processors. Consistency deals with the ordering of operations to multiple locations with respect to all processors.
In computer science, distributed shared memory (DSM) is a form of memory architecture where physically separated memories can be addressed as one logically shared address space. Here, the term "shared" does not mean that there is a single centralized memory, but that the address space is "shared". Distributed global address space (DGAS), is a similar term for a wide class of software and hardware implementations, in which each node of a cluster has access to shared memory in addition to each node's non-shared private memory.
JCSP is an implementation of Communicating Sequential Processes (CSP) for the Java programming language.
In computer science, an algorithm is called non-blocking if failure or suspension of any thread cannot cause failure or suspension of another thread; for some operations, these algorithms provide a useful alternative to traditional blocking implementations. A non-blocking algorithm is lock-free if there is guaranteed system-wide progress, and wait-free if there is also guaranteed per-thread progress.
Release consistency is one of the synchronization-based consistency models used in concurrent programming.
A memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier.
In concurrent programming, an operation is linearizable if it consists of an ordered list of invocation and response events (callbacks), that may be extended by adding response events such that:
In concurrent programming, a monitor is a synchronization construct that allows threads to have both mutual exclusion and the ability to wait (block) for a certain condition to become false. Monitors also have a mechanism for signaling other threads that their condition has been met. A monitor consists of a mutex (lock) object and condition variables. A condition variable essentially is a container of threads that are waiting for a certain condition. Monitors provide a mechanism for threads to temporarily give up exclusive access in order to wait for some condition to be met, before regaining exclusive access and resuming their task.
Concurrent computing is a form of computing in which several computations are executed concurrently——during overlapping time periods——instead of sequentially, with one completing before the next starts.
In computer science and engineering, transactional memory attempts to simplify concurrent programming by allowing a group of load and store instructions to execute in an atomic way. It is a concurrency control mechanism analogous to database transactions for controlling access to shared memory in concurrent computing. Transactional memory systems provide high-level abstraction as an alternative to low-level thread synchronization. This abstraction allows for coordination between concurrent reads and writes of shared data in parallel systems.
In computer science, synchronization refers to one of two distinct but related concepts: synchronization of processes, and synchronization of data. Process synchronization refers to the idea that multiple processes are to join up or handshake at a certain point, in order to reach an agreement or commit to a certain sequence of action. Data synchronization refers to the idea of keeping multiple copies of a dataset in coherence with one another, or to maintain data integrity. Process synchronization primitives are commonly used to implement data synchronization.
The Java programming language and the Java virtual machine (JVM) have been designed to support concurrent programming, and all execution takes place in the context of threads. Objects and resources can be accessed by many separate threads; each thread has its own path of execution but can potentially access any object in the program. The programmer must ensure read and write access to objects is properly coordinated between threads. Thread synchronization ensures that objects are modified by only one thread at a time and that threads are prevented from accessing partially updated objects during modification by another thread. The Java language has built-in constructs to support this coordination.
In computing, a memory model describes the interactions of threads through memory and their shared use of the data.
In computer science, a concurrent data structure is a particular way of storing and organizing data for access by multiple computing threads on a computer.
High performance computing applications run on massively parallel supercomputers consist of concurrent programs designed using multi-threaded, multi-process models. The applications may consist of various constructs with varying degree of parallelism. Although high performance concurrent programs use similar design patterns, models and principles as that of sequential programs, unlike sequential programs, they typically demonstrate non-deterministic behavior. The probability of bugs increases with the number of interactions between the various parallel constructs. Race conditions, data races, deadlocks, missed signals and live lock are common error types.
Processor Consistency is one of the consistency models used in the domain of concurrent computing.