This article has multiple issues. Please help improve it or discuss these issues on the talk page . (Learn how and when to remove these template messages)
|
Examples of coherency protocols for cache memory are listed here. For simplicity, all "miss" Read and Write status transactions which obviously come from state "I" (or miss of Tag), in the diagrams are not shown. They are shown directly on the new state. Many of the following protocols have only historical value. At the moment the main protocols used are the R-MESI type / MESIF protocols and the HRT-ST-MESI (MOESI type) or a subset or an extension of these.
In systems as Multiprocessor system, multi-core and NUMA system, where a dedicated cache for each processor, core or node is used, a consistency problem may occur when a same data is stored in more than one cache. This problem arises when a data is modified in one cache. This problem can be solved in two ways:
Note: Coherency generally applies only to data (as operands) and not to instructions (see Self-Modifying Code).
The schemes can be classified based on:
Three approaches are adopted to maintain the coherency of data.
Protocol used in bus-based systems like a SMP systems
Systems operating under a single OS (Operating System) with two or more homogeneous processors and with a centralized shared Main Memory
Each processor has its own cache that acts as a bridge between processor and Main Memory. The connection is made using a System Bus or a Crossbar ("xbar") or a mix of two previously approach, bus for address and crossbar for Data (Data crossbar). [1] [2] [3]
The bottleneck of these systems is the traffic and the Memory bandwidth. Bandwidth can be increasing by using large data bus path, data crossbar, memory interleaving (multi-bank parallel access) and out of order data transaction. The traffic can reduced by using a cache that acts as a "filter" versus the shared memory, that is the cache is an essential element for shared-memory in SMP systems.
In multiprocessor systems with separate caches that share a common memory, a same datum can be stored in more than one cache. A data consistency problem may occur when a data is modified in one cache only.
The protocols to maintain the coherency for multiple processors are called cache-coherency protocols.
Usually in SMP the coherency is based on the "Bus watching" or "Snoopy" (after the Peanuts' character Snoopy ) approach.
In a snooping system, all the caches monitor (or snoop) the bus transactions to intercept the data and determine if they have a copy on its cache.
Various cache-coherency protocols are used to maintain data coherency between caches. [4]
These protocols are generally classified based only on the cache states (from 3 to 5 and 7 or more) and the transactions between them, but this could create some confusion.
This definition is incomplete because it lacks important and essential information as the actions that these produce. These actions can be invoked by the processor or the bus controller (e.g. intervention, invalidation, broadcasting, etc.). The type of actions are implementation dependent. The states and transaction rules do not capture everything about a protocol. For instance protocol MESI with shared-intervention on unmodified data and MESI without intervention are different (see below). At the same time, some protocols with different states can be practically the same. For instance, the 4-state MESI Illinois and 5-state MERSI (R-MESI) IBM / MESIF-Intel protocols are only different implementations of the same functionality (see below).
The most common protocols are the 4-state MESI and the 5-state MOESI, each letter standing for one of the possible states of the cache. Other protocols use some proper subset of these but with different implementations along with their different but equivalent terminology. The terms MESI, MOESI or any subset of them generally refer to a class of protocols instead of a specific one.
The states MESI and MOESI are often and more commonly called by different names.
Special states:
Protocols | |
---|---|
SI protocol | Write Through |
MSI protocol | Synapse protocol [4] |
MEI protocol | IBM PowerPC 750, [13] MPC7400 [6] |
MES protocol | Firefly protocol [4] |
MESI protocol | Pentium II, [14] PowerPC, Intel Harpertown (Xeon 5400) |
MOSI protocol | Berkeley protocol [4] |
MOESI protocol | AMD64, [15] MOESI, [16] T-MESI IBM [12] |
Terminology used | |
---|---|
Illinois protocol | D-VE-S-I (= extended MESI) [4] [17] |
Write-once or Write-first | D-R-V-I (= MESI) [4] [18] [19] |
Berkeley protocol | D-SD-V-I (= MOSI) [4] |
Synapse protocol | D-V-I (= MSI) [4] |
Firefly protocol | D-VE-S (= MES) DEC [4] |
Dragon protocol | D-SD (SM ?)-SC-VE (= MOES) Xerox [4] |
Bull HN ISI protocol | D-SD-R-V-I (= MOESI) [20] |
MERSI (IBM) / MESIF (Intel) protocol | |
HRT-ST-MESI protocol | H=Hover, R=Recent,T=Tagged, ST=Shared-Tagged – IBM [11] [12] – Note: The main terminologies are SD-D-R-V-I and MOESI and so they will be used both. |
POWER4 IBM protocol | Mu-T-Me-M-S-SL-I ( L2 seven states) [9]
(*) Special state – Asking for a reservation for load and store doubleword (for 64-bit implementations). |
The main operations are:
Write Through
Write-Back
Write Allocate
Write-no-Allocate
There are three characteristics of cached data:
(*) – Implementation depending.
Note: Not to confuse the more restrictive "owner" definition in MOESI protocol with this more general definition.
The cache operations are:
————————————————————————————————————————
States MESI = D-R-V-I
————————————————————————————————————————
States MEOSI = D-R-SD-V-I = T-MESI IBM [12]
————————————————————————————————————————
States MESI = D-R-V-I [4]
————————————————————————————————————————
States D-R-V-I (MESI) [4] [18] [19]
————————————————————————————————————————
(Bull-Honeywell Italia)
States D-SD-R-V-I (MOESI)
Patented protocol (F. Zulian) [20]
————————————————————————————————————————
States D-V-I (MSI) [4]
————————————————————————————————————————
States D-SD-V-I (MOSI) [4]
————————————————————————————————————————
States D-VE-S (MES) [4]
————————————————————————————————————————
States D-SD-VE-SC (MOES) [4]
Note – the state SC, despite the term "clean", can be "clean" or "dirty" as the S state of the other protocols. SC and S are equivalents
————————————————————————————————————————
States MERSI or R-MESI
States MESIF
Patented protocols – IBM (1997) [6] – Intel (2002) [8]
————————————————————————————————————————
MESI and MOESI are the most popular protocols
It is common opinion that MOESI is an extension of MESI protocol and therefore it is more sophisticate and more performant. This is true only if compared with standard MESI, that is MESI with "not sharing intervention". MESI with "sharing intervention", as MESI Illinois like or the equivalent 5-state protocols MERSI / MESIF , are much more performant than the MOESI protocol.
In MOESI, cache-to-cache operations is made only on modified data. Instead in MESI Illinois type and MERSI / MESIF protocols, the cache-to-cache operations are always performed both with clean that with modified data. In case of modified data, the intervention is made by the "owner" M, but the ownership is not loosed because it is migrated in another cache (R/F cache in MERSI / MESIF or a selected cache as Illinois type). The only difference is that the MM must be updated. But also in MOESI this transaction should be done later in case of replacement, if no other modification occurs meanwhile. However this it is a smaller limit compared to the memory transactions due to the not-intervention, as in case of clean data for MOESI protocol. (see e.g. "Performance evaluation between MOESI (Shanghai) and MESIF Nehalem-EP" [21] )
The most advance systems use only R-MESI / MESIF protocol or the more complete RT-MESI, HRT-ST-MESI and POWER4 IBM protocols that are an enhanced merging of MESI and MOESI protocols
Note: Cache-to-cache is an efficient approach in multiprocessor/multicore systems direct connected between them, but less in Remote cache as in NUMA systems where a standard MESI is preferable. Example in POWER4 IBM protocol "shared intervention" is made only "local" and not between remote module.
————————————————————————————————————————
States RT-MESI
IBM patented protocol [11] [12]
Processor operations
————————————————————————————————————————
It is an improvement of RT-MESI protocol [12] and it is a subset of HRT-ST-MESI protocol [11]
An additional improvement can be obtained using more than a ST state, ST1, ST2, … STn.
————————————————————————————————————————
IBM patented full HRT-ST-MESI protocol [11] [12]
- I state = Invalid Tag (*) – Invalid Data
- H state = Valid Tag – Invalid Data
- I state is set at the cache initialization and its state changes only after a processor Read or Write miss. After it will not return more in this state.
- H has the same functionality of I state but in addition with the ability to capture any bus transaction that match the Tag of the directory and to update the data cache.
- After the first utilization I is replaced by H in its functions
(*) – Note: The Tag for definition is always valid, but until the first updating of the cache line it is considered invalid in order to avoid to update the cache also when this line has been not still required and used.
————————————————————————————————————————
States M-T-Me-S-I -Mu-SL = RT-MESI+Mu [9]
————————————————————————————————————————
Under some conditions the most efficient and complete protocol turns out to be the HRT-ST-MESI protocol.
In computing, a cache is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere. A cache hit occurs when the requested data can be found in a cache, while a cache miss occurs when it cannot. Cache hits are served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs.
Peripheral Component Interconnect (PCI) is a local computer bus for attaching hardware devices in a computer and is part of the PCI Local Bus standard. The PCI bus supports the functions found on a processor bus but in a standardized format that is independent of any given processor's native bus. Devices connected to the PCI bus appear to a bus master to be connected directly to its own bus and are assigned addresses in the processor's address space. It is a parallel bus, synchronous to a single bus clock. Attached devices can take either the form of an integrated circuit fitted onto the motherboard or an expansion card that fits into a slot. The PCI Local Bus was first implemented in IBM PC compatibles, where it displaced the combination of several slow Industry Standard Architecture (ISA) slots and one fast VESA Local Bus (VLB) slot as the bus configuration. It has subsequently been adopted for other computer types. Typical PCI cards used in PCs include: network cards, sound cards, modems, extra ports such as Universal Serial Bus (USB) or serial, TV tuner cards and hard disk drive host adapters. PCI video cards replaced ISA and VLB cards until rising bandwidth needs outgrew the abilities of PCI. The preferred interface for video cards then became Accelerated Graphics Port (AGP), a superset of PCI, before giving way to PCI Express.
In computer architecture, cache coherence is the uniformity of shared resource data that ends up stored in multiple local caches. When clients in a system maintain caches of a common memory resource, problems may arise with incoherent data, which is particularly the case with CPUs in a multiprocessing system.
The MESI protocol is an Invalidate-based cache coherence protocol, and is one of the most common protocols that support write-back caches. It is also known as the Illinois protocol due to its development at the University of Illinois at Urbana-Champaign. Write back caches can save considerable bandwidth generally wasted on a write through cache. There is always a dirty state present in write-back caches that indicates that the data in the cache is different from that in the main memory. The Illinois Protocol requires a cache-to-cache transfer on a miss if the block resides in another cache. This protocol reduces the number of main memory transactions with respect to the MSI protocol. This marks a significant improvement in performance.
In computer science, a consistency model specifies a contract between the programmer and a system, wherein the system guarantees that if the programmer follows the rules for operations on memory, memory will be consistent and the results of reading, writing, or updating memory will be predictable. Consistency models are used in distributed systems like distributed shared memory systems or distributed data stores. Consistency is different from coherence, which occurs in systems that are cached or cache-less, and is consistency of data with respect to all processors. Coherence deals with maintaining a global order in which writes to a single location or single variable are seen by all processors. Consistency deals with the ordering of operations to multiple locations with respect to all processors.
Bus snooping or bus sniffing is a scheme by which a coherency controller (snooper) in a cache monitors or snoops the bus transactions, and its goal is to maintain a cache coherency in distributed shared memory systems. This scheme was introduced by Ravishankar and Goodman in 1983, under the name "write-once" cache coherency. A cache containing a coherency controller (snooper) is called a snoopy cache.
Memory coherence is an issue that affects the design of computer systems in which two or more processors or cores share a common area of memory.
In computer science, distributed shared memory (DSM) is a form of memory architecture where physically separated memories can be addressed as a single shared address space. The term "shared" does not mean that there is a single centralized memory, but that the address space is shared—i.e., the same physical address on two processors refers to the same location in memory. Distributed global address space (DGAS), is a similar term for a wide class of software and hardware implementations, in which each node of a cluster has access to shared memory in addition to each node's private memory.
In computing, the MSI protocol - a basic cache-coherence protocol - operates in multiprocessor systems. As with other cache coherency protocols, the letters of the protocol name identify the possible states in which a cache line can be.
The MOSI protocol is an extension of the basic MSI cache coherency protocol. It adds the Owned state, which indicates that the current processor owns this block, and will service requests from other processors for the block.
(For a detailed description see Cache coherency protocols )
In cache coherency protocol literature, Write-Once was the first MESI protocol defined. It has the optimization of executing write-through on the first write and a write-back on all subsequent writes, reducing the overall bus traffic in consecutive writes to the computer memory. It was first described by James R. Goodman in (1983). Cache coherence protocols are an important issue in Symmetric multiprocessing systems, where each CPU maintains a cache of the memory.
The Firefly cache coherence protocol is the schema used in the DEC Firefly multiprocessor workstation, developed by DEC Systems Research Center. This protocol is a 3 State Write Update Cache Coherence Protocol. Unlike the Dragon protocol, the Firefly protocol updates the Main Memory as well as the Local caches on Write Update Bus Transition. Thus the Shared Clean and Shared Modified States present in case of Dragon Protocol, are not distinguished between in case of Firefly Protocol.
Database caching is a process included in the design of computer applications which generate web pages on-demand (dynamically) by accessing backend databases.
The Dragon Protocol is an update based cache coherence protocol used in multi-processor systems. Write propagation is performed by directly updating all the cached values across multiple processors. Update based protocols such as the Dragon protocol perform efficiently when a write to a cache block is followed by several reads made by other processors, since the updated cache block is readily available across caches associated with all the processors.
The MERSI protocol is a cache coherency and memory coherence protocol used by the PowerPC G4. The protocol consists of five states, Modified (M), Exclusive (E), Read Only or Recent (R), Shared (S) and Invalid (I). The M, E, S and I states are the same as in the MESI protocol. The R state is similar to the E state in that it is constrained to be the only clean, valid, copy of that data in the computer system. Unlike the E state, the processor is required to initially request ownership of the cache line in the R state before the processor may modify the cache line and transition to the M state. In both the MESI and MERSI protocols, the transition from the E to M is silent.
The MESIF protocol is a cache coherency and memory coherence protocol developed by Intel for cache coherent non-uniform memory architectures. The protocol consists of five states, Modified (M), Exclusive (E), Shared (S), Invalid (I) and Forward (F).
Power consumption is becoming increasingly important for both embedded, mobile computing and high-performance systems. Off-chip data bus consumes a significant part of system power. It is observed that the off-chip data bus consumes between 9.8% and 23.2% of the total power consumed by the system depending on the system. So, reducing the power consumption of the off-chip data bus would reduce the overall power consumption.
A CPU cache is a piece of hardware that reduces access time to data in memory by keeping some part of the frequently used data of the main memory in a 'cache' of smaller and faster memory.
Directory-based coherence is a mechanism to handle cache coherence problem in distributed shared memory (DSM) a.k.a. non-uniform memory access (NUMA). Another popular way is to use a special type of computer bus between all the nodes as a "shared bus". Directory-based coherence uses a special directory to serve instead of the shared bus in the bus-based coherence protocols. Both of these designs use the corresponding medium as a tool to facilitate the communication between different nodes, and to guarantee that the coherence protocol is working properly along all the communicating nodes. In directory based cache coherence, this is done by using this directory to keep track of the status of all cache blocks, the status of each block includes in which cache coherence "state" that block is, and which nodes are sharing that block at that time, which can be used to eliminate the need to broadcast all the signals to all nodes, and only send it to the nodes that are interested in this single block.