Firefly (cache coherence protocol)

The Firefly cache coherence protocol is the scheme used in the DEC Firefly multiprocessor workstation, developed by the DEC Systems Research Center. It is a three-state, write-update cache coherence protocol. Unlike the Dragon protocol, the Firefly protocol updates main memory as well as the local caches on a write-update bus transaction, so the Shared Clean and Shared Modified states of the Dragon protocol do not need to be distinguished in the Firefly protocol.

States

In this protocol, the following states can be assigned to each block:

1. Valid (V): the cache block is valid and clean, and it is the only cached copy.

2. Shared (S): the cache block is valid and clean, and copies of it may exist in other caches.

3. Dirty (D): the cache block is the only cached copy and has been modified, so its value differs from the one in main memory.

These states correspond to the Exclusive, Shared, and Modified states of the MESI protocol. This protocol never causes invalidation, so the Invalid state is not listed here.
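
For concreteness, the three states can be written as a small Python enumeration; this sketch is purely illustrative, and the names simply follow the description above rather than any particular implementation:

```python
from enum import Enum

class FireflyState(Enum):
    VALID = "V"   # clean, the only cached copy (corresponds to MESI Exclusive)
    SHARED = "S"  # clean, copies may exist in other caches (MESI Shared)
    DIRTY = "D"   # modified, the only cached copy, differs from memory (MESI Modified)
```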

Processor-side Requests

Processor-side requests or CPU requests are the accesses that the processor makes to its own cache. These may be classified into four types (a short code sketch follows the list):

  1. PrRdMiss: Processor-side request to read a cache block that does not reside in the cache.
  2. PrRdHit: Processor-side request to read a cache block that already resides in the cache.
  3. PrWtHit: Processor-side request to write to a cache block that already resides in the cache.
  4. PrWtMiss: Processor-side request to write to a cache block that does not reside in the cache.
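
A minimal sketch of these request types as a Python enumeration (illustrative only; the names mirror the list above):

```python
from enum import Enum, auto

class ProcessorRequest(Enum):
    PR_RD_MISS = auto()  # read a block that is not in the cache
    PR_RD_HIT = auto()   # read a block that is already in the cache
    PR_WT_HIT = auto()   # write to a block that is already in the cache
    PR_WT_MISS = auto()  # write to a block that is not in the cache
```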

Bus-Side Requests

Bus-side requests are the requests generated in response to processor-side requests in order to maintain cache coherence. They are snooped by the snoopers of the caches and of memory, which take the appropriate action. In the Firefly protocol they are of two types (again sketched in code after the list):

1. BusRd: Request that indicates there is a read request to a cache block made by another processor and that processor doesn't have the data.

2. BusWr/BusUpdt: Request that indicates there is a write request to a cache block made by another processor and all other caches must update their copies of the block.
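
The bus-side requests can be sketched the same way (purely illustrative):

```python
from enum import Enum, auto

class BusRequest(Enum):
    BUS_RD = auto()   # another processor reads a block it does not hold
    BUS_UPD = auto()  # another processor wrote a block; sharers and memory update their copies
```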

Transitions

In order to identify which transitions must be made, the protocol detects sharing using a dedicated bus line named CopiesExist. All caches snoop all memory operations and raise the CopiesExist (C) line if they detect a "snoop hit", i.e. if they hold a copy of the requested block.
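
In hardware, CopiesExist is effectively a wired-OR bus line asserted by every cache that snoops a hit. A minimal software model might look like the following sketch, where `has_block` is a hypothetical lookup method on each cache model:

```python
def copies_exist(address, other_caches):
    """Model of the CopiesExist (C) bus line: asserted (True) if any other
    cache snoops a hit, i.e. currently holds a copy of the block."""
    # `has_block` is a hypothetical per-cache lookup used only for this sketch.
    return any(cache.has_block(address) for cache in other_caches)
```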

An arrow that goes from nowhere to a state represents a newly loaded block.

Processor-Initiated Transitions

State diagram for the Firefly protocol.

On a processor read miss to a block, the CopiesExist (C) line is sampled. If no other cache holds a copy of the block (C is LOW), the block is loaded into the cache in the Valid state. If copies already exist in other caches (C is HIGH), the block is loaded in the Shared state.

On a processor write miss to a block, if no other cache holds a copy (C is LOW), the block is placed in the cache in the Dirty state. If copies already exist in other caches (C is HIGH), the block is placed in the cache in the Shared state and the write is also propagated to main memory.

If a block is already cached in the Valid state, a processor write hit changes the state to Dirty, since no other cache holds a copy of the data. This write is not written through to memory.

If a block is cached in the Dirty state and a processor write hit occurs, the state remains Dirty.

If the block is in the Shared state and a processor write hit occurs, a bus update is issued, which updates main memory and any other cached copies. If copies still exist in other caches (C is HIGH), the block stays in the Shared state. If no other cache holds a copy (C is LOW), the block moves to the Valid state, as it is now the only cached copy.

On a CPU read hit, the block stays in whatever state it is already in, just as in the Dragon protocol.
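
The processor-initiated rules above can be condensed into a single transition function. The following sketch assumes the `FireflyState` and `ProcessorRequest` enumerations shown earlier and takes the CopiesExist value `c` sampled during the corresponding bus transaction; a real controller would also issue the bus transactions and data transfers themselves:

```python
def processor_transition(state, request, c):
    """Next state of a block for a processor-side request.

    state   -- current FireflyState, or None if the block is not cached
    request -- a ProcessorRequest
    c       -- CopiesExist line sampled during the resulting bus transaction
    """
    if request is ProcessorRequest.PR_RD_MISS:
        # Block is fetched from another cache or from memory.
        return FireflyState.SHARED if c else FireflyState.VALID
    if request is ProcessorRequest.PR_WT_MISS:
        # With other copies present, they and main memory are updated (BusUpd).
        return FireflyState.SHARED if c else FireflyState.DIRTY
    if request is ProcessorRequest.PR_RD_HIT:
        # Read hits never change the state.
        return state
    if request is ProcessorRequest.PR_WT_HIT:
        if state is FireflyState.SHARED:
            # A BusUpd updates memory and any remaining sharers.
            return FireflyState.SHARED if c else FireflyState.VALID
        # Valid -> Dirty (write not propagated to memory); Dirty -> Dirty.
        return FireflyState.DIRTY
    raise ValueError("unknown request")
```

For example, a write hit to a Shared block when the CopiesExist line is LOW returns `FireflyState.VALID`, matching the Shared-to-Valid transition described above.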

Bus-Initiated Transitions

If the block is in the Shared state and a BusRd or BusWr/BusUpdt request is snooped, the block stays in the Shared state.

If the block is in the Dirty state and a read or write request from another processor is snooped, it transitions to the Shared state and the modified data is written back to main memory.

If the block is in the Valid state and another processor reads it, it transitions to the Shared state. If a write request from another processor is snooped, the block is updated, and since it is now shared, it also moves to the Shared state.

Unlike MESI, in the Firefly update protocol, write propagation is ensured by directly updating all other copies on a write request by the processors (PrWr).
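
The bus-initiated rules can be summarized similarly. The sketch below assumes the `FireflyState` and `BusRequest` enumerations from above; note that every snooped request for a cached block leaves it in the Shared state, with a Dirty copy additionally supplying its data so that main memory is updated:

```python
def snoop_transition(state, request):
    """Next state of a locally cached block when a bus request for it is snooped.

    state   -- current FireflyState of the local copy
    request -- the snooped BusRequest (BUS_RD or BUS_UPD)
    """
    if state is FireflyState.DIRTY:
        # The dirty copy supplies the data; main memory is updated as well.
        return FireflyState.SHARED
    if state in (FireflyState.VALID, FireflyState.SHARED):
        # On BUS_UPD the local copy is refreshed with the new data;
        # either way the block is now shared.
        return FireflyState.SHARED
    raise ValueError("block not cached; the snooped request is ignored")
```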

Comparison with other Policies

1. Because updated copies of the data exist in the caches, there are fewer coherence misses than in write-invalidate policies.

2. Higher bus bandwidth is required than in invalidate protocols, because invalidate protocols send only a signal/command on the bus, which is snooped by the other processors and causes them to invalidate their own copies of the data. In update protocols, by contrast, the new data value has to be sent along with the BusUpdate signal so that memory and the other caches can snoop it and update their copies.

3. Updating the data on every write causes some no-longer-needed data to remain in the cache, which may cause some ‘useful’ data to be evicted.
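
As a rough, purely illustrative back-of-the-envelope comparison for point 2, assume one processor performs a run of consecutive writes to a block that other caches hold, with no intervening accesses by them, and assume (hypothetically) an 8-byte address/command broadcast and a 4-byte data payload per update:

```python
# Illustrative, assumed message sizes in bytes; not taken from any real bus.
ADDR_CMD_BYTES = 8   # address + command broadcast on the bus
DATA_BYTES = 4       # data payload carried by each BusUpd in an update protocol

def bus_traffic(num_writes, update_protocol):
    """Bytes placed on the bus for num_writes consecutive writes by one
    processor to a block that other caches currently hold."""
    if update_protocol:
        # Every write broadcasts the address/command plus the new data value.
        return num_writes * (ADDR_CMD_BYTES + DATA_BYTES)
    # Invalidate protocol: a single invalidation (address/command only)
    # suffices; later writes hit the now-exclusive copy and stay off the bus.
    return ADDR_CMD_BYTES

print(bus_traffic(10, update_protocol=True))   # 120
print(bus_traffic(10, update_protocol=False))  # 8
```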

See also

Cache coherence
Bus snooping
Dragon protocol
MESI protocol
MSI protocol
MOSI protocol
MERSI protocol
MESIF protocol
Write-once (cache coherence)
Directory-based cache coherence
Consistency model
Memory coherence
Distributed shared memory
Scalable Coherent Interface
CPU cache
