MOSI protocol

Last updated May 02, 2022

The MOSI protocol is an extension of the basic MSI cache coherency protocol. It adds the Owned state, which indicates that the current processor owns this block, and will service requests from other processors for the block.

Overview of States

The permitted states of a given cache line are as follows:

Modified (M) - Only one cache has a valid copy of the block, and its value is likely to differ from the one in main memory. This is almost the same as the dirty state in a write-back cache, with one difference: dirty means only that the block's value differs from the one in main memory, whereas modified also implies exclusive ownership, that is, the block is cached in only one location.

Owned (O) - Multiple caches may hold the most recent and correct value of the block, and the value in main memory may or may not be correct. At any given time, only one cache can hold the block in the owned state; all other caches holding the same block must be in the shared state.^[1]

Shared (S) - The cache block is valid, may be shared by multiple caches, and may or may not have the same value as main memory. Processors holding the block in this state can read it but do not have write permission.

Invalid (I) - Cache block is invalid.

For any given pair of caches, the permitted states of a given cache line are as follows:

	M	O	S	I
M	N	N	N	Y
O	N	N	Y	Y
S	N	Y	Y	Y
I	Y	Y	Y	Y
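
The table can also be checked mechanically. The following Python sketch is illustrative only: the names `State`, `ALLOWED_PAIRS`, and `compatible` are assumptions introduced here rather than part of any real library, and the pairs are transcribed from the table above.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    O = "Owned"
    S = "Shared"
    I = "Invalid"


# Unordered pairs of states that two different caches may hold for the same
# block at the same time (hypothetical encoding of the table above).
ALLOWED_PAIRS = {
    frozenset({State.M, State.I}),
    frozenset({State.O, State.S}),
    frozenset({State.O, State.I}),
    frozenset({State.S}),  # S together with S
    frozenset({State.S, State.I}),
    frozenset({State.I}),  # I together with I
}


def compatible(a: State, b: State) -> bool:
    """Return True if one cache may hold state a while another holds b."""
    return frozenset({a, b}) in ALLOWED_PAIRS


if __name__ == "__main__":
    # Reprint the table in the same row and column order as the text.
    order = [State.M, State.O, State.S, State.I]
    print("\t" + "\t".join(s.name for s in order))
    for row in order:
        cells = ("Y" if compatible(row, col) else "N" for col in order)
        print(row.name + "\t" + "\t".join(cells))
```

Storing unordered pairs keeps the relation symmetric by construction, which matches the symmetry of the table.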

Operations

In the MOSI protocol, each cache handles the following processor and bus requests:

PrRd - Processor request to read a cache block.
PrWr - Processor request to write into a cache block.
BusRd - Snooped request indicating that there is a read request to a cache block made by another processor.
BusRdX - Snooped request indicating that there is a write request to a cache block made by another processor that does not have the block.
BusUpgr - Snooped request indicating that there is a write request to a cache block made by another processor that already has the block in its cache.
Flush - Snooped request indicating that a cache block is being placed on the bus for a cache-to-cache transfer.^[2]

Processor Transactions

For processor transactions, a block in the Invalid (I) state was either never fetched from memory or has been invalidated. A processor read (PrRd) changes the state from invalid (I) to shared (S) and generates a bus read (BusRd). A processor write (PrWr) instead changes the state to modified (M) and generates a bus write request (BusRdX).

When the block is in the Owned (O) state, a processor read (PrRd) generates no bus transaction and the block remains in the same state. A processor write (PrWr) changes the state from owned (O) to modified (M) and generates a bus write request (BusUpgr).^[3]

When the block is in the Modified (M) state, neither a processor read (PrRd) nor a processor write (PrWr) generates a bus transaction, since the state already indicates that the most recent and correct value resides only in that cache. The block therefore stays in the modified (M) state.

When the block is in the Shared (S) state, a processor read (PrRd) generates no bus transaction, since the cache already holds a valid, up-to-date copy of the block. A processor write (PrWr) generates a bus write request (BusUpgr), because the copies held by all other caches must be invalidated, and the state of the block changes from shared (S) to modified (M).
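
The processor-side transitions described above can be summarized as a lookup table. This is a minimal Python sketch, not a reference implementation: `State`, `ProcOp`, `PROCESSOR_TRANSITIONS`, and `on_processor_request` are hypothetical names chosen for illustration, and bus transactions are represented as plain strings.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    O = "Owned"
    S = "Shared"
    I = "Invalid"


class ProcOp(Enum):
    PR_RD = "PrRd"
    PR_WR = "PrWr"


# (current state, processor request) -> (next state, bus transaction or None).
# None means the request is satisfied locally with no bus transaction.
PROCESSOR_TRANSITIONS = {
    (State.I, ProcOp.PR_RD): (State.S, "BusRd"),
    (State.I, ProcOp.PR_WR): (State.M, "BusRdX"),
    (State.S, ProcOp.PR_RD): (State.S, None),
    (State.S, ProcOp.PR_WR): (State.M, "BusUpgr"),
    (State.O, ProcOp.PR_RD): (State.O, None),
    (State.O, ProcOp.PR_WR): (State.M, "BusUpgr"),
    (State.M, ProcOp.PR_RD): (State.M, None),
    (State.M, ProcOp.PR_WR): (State.M, None),
}


def on_processor_request(state: State, op: ProcOp):
    """Return (next_state, bus_transaction) for a local processor request."""
    return PROCESSOR_TRANSITIONS[(state, op)]


if __name__ == "__main__":
    for (state, op), (nxt, bus) in PROCESSOR_TRANSITIONS.items():
        print(f"{state.name} + {op.value} -> {nxt.name}, bus: {bus or 'none'}")
```

A table-driven form makes it easy to confirm that every state and request combination is covered: four states times two requests gives the eight entries shown.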

Bus Transactions

For snooped bus transactions, a cache block in the Invalid (I) state is unaffected by any bus request. Whether the request is a bus read (BusRd) or a bus write request from a processor that does or does not have the block (BusUpgr or BusRdX), the block remains in the invalid (I) state and generates no further actions.

When the cache block is in the Shared (S) state and there is a snooped bus read (BusRd), the block stays in the same state and generates no further transactions, because every valid copy holds the same value and the block is only being read, not written. If there is a snooped write request (BusRdX or BusUpgr), the state of the block changes from shared (S) to invalid (I), because the block is being modified in another cache and all other copies must be invalidated.

When the cache block is in the Modified (M) state and there is a snooped bus read (BusRd), the cache flushes (Flush) the modified data and changes the state to owned (O), making it the sole owner of that cache block. With a write request from another processor that does not have the block (BusRdX), the block changes its state to invalid (I), as the other processor is writing to the block and will therefore take ownership of it.

While a cache block is in the modified (M) state, a BusUpgr request from another processor is not possible. By the definition of the modified state, only that cache holds the block; all other copies are invalid, so no other processor can initiate a BusUpgr.

When the block is in the Owned (O) state, a snooped read request (BusRd) leaves the block in the same state while the cache flushes (Flush) the data for the other processor to read. With a snooped write request (BusRdX), the cache flushes (Flush) the data and the block changes to invalid (I), since another processor is writing to it and the cache loses ownership of the block. Whenever another processor accesses the block, it obtains it from the cache that holds it in the owned (O) state rather than from memory. With a BusUpgr, the block simply changes from owned (O) to invalid (I).^[3]
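
The snooped transitions can be sketched in the same table-driven way. The names `State`, `BusOp`, `SNOOP_TRANSITIONS`, and `on_snoop` are hypothetical, introduced only for this illustration. One entry rests on an assumption: the text does not say whether a modified block is flushed on a BusRdX, and the sketch assumes it is, since the requesting cache needs the dirty data.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    O = "Owned"
    S = "Shared"
    I = "Invalid"


class BusOp(Enum):
    BUS_RD = "BusRd"
    BUS_RDX = "BusRdX"
    BUS_UPGR = "BusUpgr"


# (current state, snooped request) -> (next state, whether the cache flushes).
SNOOP_TRANSITIONS = {
    (State.I, BusOp.BUS_RD): (State.I, False),
    (State.I, BusOp.BUS_RDX): (State.I, False),
    (State.I, BusOp.BUS_UPGR): (State.I, False),
    (State.S, BusOp.BUS_RD): (State.S, False),
    (State.S, BusOp.BUS_RDX): (State.I, False),
    (State.S, BusOp.BUS_UPGR): (State.I, False),
    (State.O, BusOp.BUS_RD): (State.O, True),
    (State.O, BusOp.BUS_RDX): (State.I, True),
    (State.O, BusOp.BUS_UPGR): (State.I, False),
    (State.M, BusOp.BUS_RD): (State.O, True),
    # Assumption: flush on BusRdX so the writer receives the dirty data.
    (State.M, BusOp.BUS_RDX): (State.I, True),
    # (State.M, BusOp.BUS_UPGR) is deliberately absent: no other cache holds
    # the block, so no other processor can issue a BusUpgr for it.
}


def on_snoop(state: State, op: BusOp):
    """Return (next_state, flush) for a snooped bus request."""
    try:
        return SNOOP_TRANSITIONS[(state, op)]
    except KeyError:
        raise ValueError(f"{op.value} cannot be snooped in state {state.name}")


if __name__ == "__main__":
    for (state, op), (nxt, flush) in SNOOP_TRANSITIONS.items():
        action = "Flush" if flush else "no action"
        print(f"{state.name} + {op.value} -> {nxt.name}, {action}")
```

Leaving the impossible combination out of the table, rather than mapping it to a dummy result, lets a simulator detect a protocol violation as an error.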

Comparison to MSI Protocol

The obvious difference between the MSI protocol and the MOSI protocol, also known as the Berkeley protocol,^[4] is the presence of an extra state, owned (O), in MOSI in addition to the modified (M) state.

In the MSI protocol, whenever another processor has a read miss on a block that is in the modified (M) state, the cache writes the block back to main memory and changes its state to shared (S). In the MOSI protocol, with its additional owned (O) state, a read request from another processor changes the block from modified (M) to owned (O), so the cache retains the dirty block and the write-back to main memory does not have to happen immediately.

This deferral can save bus traffic and main memory writes in certain sequences of transactions. Consider, for example, a block that is in the M state in processor 1's cache: processor 2 reads it, and then processor 1 writes to it again. In MSI, processor 1's M-to-S transition on the read causes one memory write, and the subsequent S-to-M transition generates a BusUpgr. In MOSI, the M-to-O transition causes no memory write, and the O-to-M transition generates one BusUpgr as before. MOSI therefore dispenses with the initial memory write-back, and its associated bus traffic, that MSI would perform.
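
The example sequence can be replayed with a small simulation. The following Python sketch is a simplified model under stated assumptions: the function name `simulate` and its arguments are hypothetical, it tracks a single block, it counts only the memory write-backs triggered by another cache's read miss (the case discussed here), and it ignores evictions.

```python
def simulate(protocol, initial, accesses):
    """Replay accesses to one block.

    protocol: "MSI" or "MOSI".
    initial:  dict mapping cache id -> state letter ("M", "O", "S", "I").
    accesses: list of (cache id, "R" or "W").
    Returns (memory_writes, bus_log, final_states).
    """
    states = dict(initial)
    memory_writes = 0
    bus_log = []
    for cache, kind in accesses:
        mine = states[cache]
        if kind == "R":
            if mine != "I":
                continue  # read hit: no bus transaction
            bus_log.append("BusRd")
            for other, state in states.items():
                if other == cache or state != "M":
                    continue  # O and S holders keep their state on BusRd
                if protocol == "MSI":
                    memory_writes += 1  # the flush also updates main memory
                    states[other] = "S"
                else:
                    states[other] = "O"  # block stays dirty, memory untouched
            states[cache] = "S"
        else:
            if mine == "M":
                continue  # write hit on an exclusive block: no bus transaction
            bus_log.append("BusRdX" if mine == "I" else "BusUpgr")
            for other in states:
                if other != cache:
                    states[other] = "I"  # invalidate every other copy
            states[cache] = "M"
    return memory_writes, bus_log, states


if __name__ == "__main__":
    start = {1: "M", 2: "I"}
    sequence = [(2, "R"), (1, "W")]  # processor 2 reads, processor 1 writes
    for protocol in ("MSI", "MOSI"):
        writes, log, final = simulate(protocol, start, sequence)
        print(protocol, "memory writes:", writes, "bus:", log, "final:", final)
```

Under these assumptions both protocols issue the same BusRd and BusUpgr, and they differ only in the memory write-back that MSI performs on the read.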

Comparison to MESI Protocol

Both the MESI protocol (also known as the Illinois protocol)^[4] and the MOSI protocol are extensions of the MSI protocol, each improving a different aspect of it. MOSI focuses on reducing write-backs, whereas MESI attempts to reduce the number of bus transactions required after a read and write request from another processor. The exclusive (E) state in the MESI protocol implies that the cache block is valid, clean (same value as in main memory), and cached in only one cache, whereas the owned (O) state in the MOSI protocol implies that the cache block is valid, potentially dirty, and possibly present in more than one cache (with all caches holding the same value).

References

1. Sorin, Daniel; Hill, Mark; Wood, David (2011). A Primer on Memory Consistency and Cache Coherence. Morgan & Claypool. pp. 119–122. ISBN 9781608455645.
2. Solihin, Yan (2016). Fundamentals of Parallel Multicore Architecture. CRC Press, Taylor & Francis Group. ISBN 9781482211184.
3. "An Evaluation of Snoop-Based Cache Coherence Protocols" (PDF).
4. "Analysis and Comparison of Cache Coherence Protocols for a Packet-Switched Multiprocessor". IEEE Transactions on Computers. 38. doi:10.1109/12.30868.