Processor consistency

Last updated

Processor consistency is one of the consistency models used in the domain of concurrent computing (e.g. in distributed shared memory, distributed transactions, etc.).

Contents

A system exhibits processor consistency if the order in which other processors see the writes from any individual processor is the same as the order they were issued. Because of this, processor consistency is only applicable to systems with multiple processors. It is weaker than the causal consistency model because it does not require writes from all processors to be seen in the same order, but stronger than the PRAM consistency model because it requires cache coherence. [1] Another difference between causal consistency and processor consistency is that processor consistency removes the requirements for loads to wait for stores to complete, and for Write Atomicity. [1] processor consistency is also stronger than cache consistency because processor consistency requires all writes by a processor to be seen in order, not just writes to the same memory location. [1]

Examples of processor consistency

Example 1: Processor Consistent
P1W(x)1W(x)3
P2R(x)1R(x)3
P3W(y)1W(y)2
P4R(y)1R(y)2
Example 2: Not Processor Consistent
P1W(x)1W(x)3
P2R(x)3R(x)1
P3W(y)1W(y)2
P4R(y)2R(y)1

In Example 1 to the right, the simple system follows processor consistency, as all the writes by each processor are seen in the order they occurred in by the other processors, and the transactions are coherent.

Example 2 is not processor consistent, as the writes by P1 and P3 are seen out of order by P2 and P4 respectively.

Example 3 is processor consistent and not causally consistent because R(y)3,R(x)1 in P3: for causal consistency it should be R(y)3,R(x)2 since W(x)2 in P1 causally precedes W(y)3 in P2.

Example 4 is not processor consistent because R(y)3,R(x)1 in P2: for processor consistency it should be R(y)3,R(x)2 because W(x)2 is the latest write to x preceding W(y)3 in P1.
This example cache consistent because P2 sees writes to individual memory locations in the order they were issued in P1.

Example 3: Causal: No; Processor: Yes
P1W(x)1W(x)2
P2R(x)2W(y)3
P3R(y)3R(x)1
Example 4: Processor: No; Cache: Yes
P1W(x)1W(x)2W(y)3
P2R(y)3R(x)1

Processor consistency vs. sequential consistency

Processor consistency (PC) relaxes the ordering between older stores and younger loads that is enforced in sequential consistency (SC). [2] This allows loads to be issued to the cache and potentially complete before older stores, meaning that stores can be queued in a write buffer without the need for load speculation to be implemented (the loads can continue freely). [3] In this regard, PC performs better than SC because recovery techniques for failed speculations are not necessary, which means fewer pipeline flushes. [3] The prefetching optimization that SC systems employ is also applicable to PC systems. [3] Prefetching is the act of fetching data in advance for upcoming loads and stores before it is actually needed, to cut down on load/store latency. Since PC reduces load latency by allowing loads to be re-ordered before corresponding stores, the need for prefetching is somewhat reduced, as the prefetched data will be used more for stores than for loads. [3]

Programmer's intuition

In terms of how well a PC system follows a programmer's intuition, it turns out that in properly synchronized systems, the outcomes of PC and SC are the same. [3] Programmer's intuition is essentially how the programmer expects the instructions to execute, usually in what is referred to as "program order". Program order in a multiprocessor system is the execution of instructions resulting in the same outcome as a sequential execution. The fact that PC and SC both follow this expectation is a direct consequence of the fact that corresponding loads and stores in PC systems are still ordered with respect to each other. [3] For example, in lock synchronization, the only operation whose behavior is not fully defined by PC is the lock-acquire store, where subsequent loads are in the critical section and their order affects the outcome. [3] This operation, however, is usually implemented with a store conditional or atomic instruction, so that if the operation fails it will be repeated later and all the younger loads will also be repeated. [3] All loads occurring before this store are still ordered with respect to the loads occurring in the critical section, and as such all the older loads have to complete before loads in the critical section can run.

Processor consistency vs. other relaxed consistency models

Processor consistency, while weaker than sequential consistency, is still in most cases a stronger consistency model than is needed. This is due to the number of synchronization points inherent to programs that run on multiprocessor systems. [4] This means that no data races can occur (a data race being multiple simultaneous accesses to memory location where at least one access is a write). [3] With this in mind, it is clear to see that a model could allow for reorganization of all memory operations, as long as no operation crosses a synchronization point [3] and one does, called Weak Ordering. However, weak ordering does impose some of the same restrictions as processor consistency, namely that the system must remain coherent and thus all writes to the same memory location must be seen by all processors in the same order. [4] Similar to weak ordering, the release consistency model allows reordering of all memory operations, but it gets even more specific and breaks down synchronization operations to allow more relaxation of reorders. [3] Both of these models assume proper synchronization of code and in some cases hardware synchronization support, and so processor consistency is a safer model to adhere to if one is unsure about the reliability of the programs to be run using the model.

Similarity to SPARC V8 TSO, IBM-370, and x86-TSO memory models

One of the main components of processor consistency is that if a write followed by a read is allowed to execute out of program order. This essentially results in the hiding of write latency when loads are allowed to go ahead of stores. Since many applications function correctly with this structure, systems that implement this type of relaxed ordering typically appear sequentially consistent. Two other models that conform to this specification are the SPARC V8 TSO (Total Store Ordering) and the IBM-370. [4]

The IBM-370 model follows the specification of allowing a write followed by a read to execute out of program order, with a few exceptions. The first is that if the operations are to the same location, they must be in program order. The second is that if either operation is part of a serialization instruction or there is a serialization instruction between the two operations, then the operations must execute in program order. [4] This model is perhaps the strictest of the three models being considered, as the TSO model removes one of the exceptions mentioned.

The SPARC V8 TSO model is very similar to the IBM-370 model with the key difference that it allows operations to the same location to complete out of program order. With this, it is possible that a load returns a store that occurred that is "out of date" in terms of program order. [4] These models are similar to processor consistency, but whereas these models only have one copy of memory, processor consistency has no such restriction. This suggests a system in which each processor has its own memory, which emphasizes upon processor consistency the "coherence requirement. [4] "

The x86-TSO model has a number of different definitions. The total store model, as the name suggests, is very similar to the SPARC V8. The other definition is based on local write buffers. The differences in the x86 and SPARC TSO models is in the omission of some instructions and inclusion of others, but the models themselves are very similar. [5] The write buffer definition utilizes various states and locks to determine whether a particular value can be read/written to. In addition, this particular model for the x86 architecture is not plagued by the issues of previous (weaker consistency) models, and provides a more intuitive base for programmers to build upon. [5]

See also

Related Research Articles

<span class="mw-page-title-main">Endianness</span> Order of bytes in a computer word

In computing, endianness is the order in which bytes within a word of digital data are transmitted over a data communication medium or addressed in computer memory, counting only byte significance compared to earliness. Endianness is primarily expressed as big-endian (BE) or little-endian (LE), terms introduced by Danny Cohen into computer science for data ordering in an Internet Experiment Note published in 1980. The adjective endian has its origin in the writings of 18th century Anglo-Irish writer Jonathan Swift. In the 1726 novel Gulliver's Travels, he portrays the conflict between sects of Lilliputians divided into those breaking the shell of a boiled egg from the big end or from the little end. By analogy, a CPU may read a digital word big end first, or little end first.

In computer science, an instruction set architecture (ISA) is an abstract model that generally defines how software controls the CPU in a computer or a family of computers. A device or program that executes instructions described by that ISA, such as a central processing unit (CPU), is called an implementation of that ISA.

In computer architecture, 32-bit computing refers to computer systems with a processor, memory, and other major system components that operate on data in 32-bit units. Compared to smaller bit widths, 32-bit computers can perform large calculations more efficiently and process more data per clock cycle. Typical 32-bit personal computers also have a 32-bit address bus, permitting up to 4 GB of RAM to be accessed, far more than previous generations of system architecture allowed.

<span class="mw-page-title-main">Cache coherence</span> Computer architecture term concerning shared resource data

In computer architecture, cache coherence is the uniformity of shared resource data that ends up stored in multiple local caches. When clients in a system maintain caches of a common memory resource, problems may arise with incoherent data, which is particularly the case with CPUs in a multiprocessing system.

x86 assembly language is the name for the family of assembly languages which provide some level of backward compatibility with CPUs back to the Intel 8008 microprocessor, which was launched in April 1972. It is used to produce object code for the x86 class of processors.

In computer science, a consistency model specifies a contract between the programmer and a system, wherein the system guarantees that if the programmer follows the rules for operations on memory, memory will be consistent and the results of reading, writing, or updating memory will be predictable. Consistency models are used in distributed systems like distributed shared memory systems or distributed data stores. Consistency is different from coherence, which occurs in systems that are cached or cache-less, and is consistency of data with respect to all processors. Coherence deals with maintaining a global order in which writes to a single location or single variable are seen by all processors. Consistency deals with the ordering of operations to multiple locations with respect to all processors.

Peterson's algorithm is a concurrent programming algorithm for mutual exclusion that allows two or more processes to share a single-use resource without conflict, using only shared memory for communication. It was formulated by Gary L. Peterson in 1981. While Peterson's original formulation worked with only two processes, the algorithm can be generalized for more than two.

The name weak consistency can be used in two senses. In the first sense, strict and more popular, weak consistency is one of the consistency models used in the domain of concurrent programming.

<span class="mw-page-title-main">Race condition</span> When a systems behavior depends on timing of uncontrollable events

A race condition or race hazard is the condition of an electronics, software, or other system where the system's substantive behavior is dependent on the sequence or timing of other uncontrollable events, leading to unexpected or inconsistent results. It becomes a bug when one or more of the possible behaviors is undesirable.

<span class="mw-page-title-main">Sequential consistency</span> Consistency model in concurrent computing

Sequential consistency is a consistency model used in the domain of concurrent computing.

Release consistency is one of the synchronization-based consistency models used in concurrent programming.

In computer science, compare-and-swap (CAS) is an atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation. The atomicity guarantees that the new value is calculated based on up-to-date information; if the value had been updated by another thread in the meantime, the write would fail. The result of the operation must indicate whether it performed the substitution; this can be done either with a simple boolean response, or by returning the value read from the memory location.

In computing, a memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier.

In computer engineering, out-of-order execution is a paradigm used in high-performance central processing units to make use of instruction cycles that would otherwise be wasted. In this paradigm, a processor executes instructions in an order governed by the availability of input data and execution units, rather than by their original order in a program. In doing so, the processor can avoid being idle while waiting for the preceding instruction to complete and can, in the meantime, process the next instructions that are able to run immediately and independently.

Causal consistency is one of the major memory consistency models. In concurrent programming, where concurrent processes are accessing a shared memory, a consistency model restricts which accesses are legal. This is useful for defining correct data structures in distributed shared memory or distributed transactions.

Scratchpad memory (SPM), also known as scratchpad, scratchpad RAM or local store in computer terminology, is an internal memory, usually high-speed, used for temporary storage of calculations, data, and other work in progress. In reference to a microprocessor, scratchpad refers to a special high-speed memory used to hold small items of data for rapid retrieval. It is similar to the usage and size of a scratchpad in life: a pad of paper for preliminary notes or sketches or writings, etc. When the scratchpad is a hidden portion of the main memory then it is sometimes referred to as bump storage.

Memory ordering is the order of accesses to computer memory by a CPU. Memory ordering depends on both the order of the instructions generated by the compiler at compile time and the execution order of the CPU at runtime. However, memory order is of little concern outside of multithreading and memory-mapped I/O, because if the compiler or CPU changes the order of any operations, it must necessarily ensure that the reordering does not change the output of ordinary single-threaded code.

An instruction set architecture (ISA) is an abstract model of a computer, also referred to as computer architecture. A realization of an ISA is called an implementation. An ISA permits multiple implementations that may vary in performance, physical size, and monetary cost ; because the ISA serves as the interface between software and hardware. Software that has been written for an ISA can run on different implementations of the same ISA. This has enabled binary compatibility between different generations of computers to be easily achieved, and the development of computer families. Both of these developments have helped to lower the cost of computers and to increase their applicability. For these reasons, the ISA is one of the most important abstractions in computing today.

This glossary of computer hardware terms is a list of definitions of terms and concepts related to computer hardware, i.e. the physical and structural components of computers, architectural issues, and peripheral devices.

In computing, a cache control instruction is a hint embedded in the instruction stream of a processor intended to improve the performance of hardware caches, using foreknowledge of the memory access pattern supplied by the programmer or compiler. They may reduce cache pollution, reduce bandwidth requirement, bypass latencies, by providing better control over the working set. Most cache control instructions do not affect the semantics of a program, although some can.

References

  1. 1 2 3 David Mosberger (1992). "Memory Consistency Models" (PDF). University of Arizona. Retrieved 2015-04-01.{{cite journal}}: Cite journal requires |journal= (help)
  2. Kourosh Gharachorloo; Daniel Lenoski; James Laudon; Phillip Gibbons; Anoop Gupta; John Hennessy (1 August 1998). "Memory consistency and event ordering in scalable shared-memory multiprocessors" (PDF). 25 years of the international symposia on Computer architecture (Selected papers). ACM. pp. 376–387. doi:10.1145/285930.285997. ISBN   1581130589. S2CID   47089892 . Retrieved 2015-04-01.
  3. 1 2 3 4 5 6 7 8 9 10 11 Solihin, Yan (2009). Fundamentals of parallel computer architecture : multichip and multicore systems. Solihin Pub. pp. 297–299. ISBN   978-0-9841630-0-7.
  4. 1 2 3 4 5 6 Kourosh Gharachorloo (1995). "Memory Consistency Models for Shared-Memory Multiprocessors" (PDF). Western Research Laboratory. Retrieved 2015-04-07.{{cite journal}}: Cite journal requires |journal= (help)
  5. 1 2 Scott Owens; Susmit Sarkar; Peter Sewell (2009). "A better x86 memory model: x86-TSO (extended version)" (PDF). University of Cambridge. doi:10.48456/tr-745 . Retrieved 2015-04-08.{{cite journal}}: Cite journal requires |journal= (help)