Cache-only memory architecture

Last updated February 07, 2025

Cache-only memory architecture (COMA) is a computer memory organization for multiprocessors in which the local memory (typically DRAM) at each node is used as a cache. This contrasts with NUMA organizations, in which the local memories serve as the actual main memory.

In NUMA, each address in the global address space is typically assigned a fixed home node. When a processor accesses some data, a copy is made in its local cache, but space remains allocated in the home node. In COMA, by contrast, there is no home node: an access from a remote node may cause the data to migrate to that node. Compared to NUMA, this reduces the number of redundant copies and may allow more efficient use of memory resources. On the other hand, it raises two problems: how to find a particular piece of data, since there is no longer a home node to ask, and what to do when a local memory fills up, since migrating data into a full local memory requires evicting other data, which has no home to go back to. Hardware memory coherence mechanisms are typically used to implement the migration.
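The migration and eviction behavior described above can be illustrated with a toy model. This is a minimal sketch, not a description of any real COMA machine: the class `ComaSystem`, the central `directory` dictionary, the LRU victim choice, and the "first node with free space" placement rule are all assumptions made for illustration.

```python
from collections import OrderedDict


class ComaSystem:
    """Toy COMA model: every node's local memory acts as a cache of blocks."""

    def __init__(self, num_nodes, capacity):
        # One LRU-ordered local memory per node: block id -> value.
        self.memories = [OrderedDict() for _ in range(num_nodes)]
        self.capacity = capacity
        # Hypothetical central directory: block id -> node holding the block.
        # It stands in for the fixed home node that NUMA would provide.
        self.directory = {}

    def write_new(self, node, block, value):
        """Place a brand-new block in `node`'s local memory."""
        self._make_room(node)
        self.memories[node][block] = value
        self.directory[block] = node

    def access(self, node, block):
        """Return the block's value, migrating the block to `node` if remote."""
        owner = self.directory[block]
        if owner == node:
            self.memories[node].move_to_end(block)  # refresh LRU position
            return self.memories[node][block]
        value = self.memories[owner].pop(block)     # leave the old holder
        self._make_room(node)
        self.memories[node][block] = value
        self.directory[block] = node
        return value

    def _make_room(self, node):
        """If `node` is full, relocate its least recently used block."""
        mem = self.memories[node]
        if len(mem) < self.capacity:
            return
        victim, value = mem.popitem(last=False)
        # The victim is the only copy, so it cannot simply be dropped:
        # it has to be moved to some other node that still has space.
        for other, other_mem in enumerate(self.memories):
            if other != node and len(other_mem) < self.capacity:
                other_mem[victim] = value
                self.directory[victim] = other
                return
        raise MemoryError("no free space anywhere for the last copy")


machine = ComaSystem(num_nodes=2, capacity=2)
machine.write_new(0, "a", 1)
machine.write_new(0, "b", 2)
machine.write_new(1, "c", 3)
machine.access(1, "a")  # "a" migrates from node 0 to node 1
machine.access(1, "b")  # node 1 is full, so "c" is pushed out to node 0
print(machine.directory)
```

The sketch keeps exactly one copy of each block, which makes both problems visible: every access needs a lookup to find the current holder, and an eviction must find a new location for the victim instead of discarding it. Real designs also allow read-only replicas and use more scalable directory schemes.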

A large body of research has explored these issues. Various forms of directories, policies for maintaining free space in the local memories, migration policies, and policies for read-only copies have been developed. Hybrid NUMA-COMA organizations have also been proposed, such as Reactive NUMA, which allows pages to start in NUMA mode and switch to COMA mode if appropriate, and which is implemented in Sun Microsystems' WildFire.^[1]^[2] A software-based hybrid NUMA-COMA implementation was proposed and implemented by ScaleMP,^[3] allowing a shared-memory multiprocessor system to be built out of a cluster of commodity nodes.
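The idea of a page starting in NUMA mode and later switching to COMA mode can be sketched as a small per-page policy. The text above only says the switch happens "if appropriate"; the trigger used here (a count of remote misses per page compared against a fixed threshold) and the names `ReactivePagePolicy`, `record_remote_miss`, and `threshold` are assumptions for illustration, not the documented WildFire or Reactive NUMA mechanism.

```python
from collections import defaultdict

NUMA, COMA = "NUMA", "COMA"


class ReactivePagePolicy:
    """Toy per-page mode selector for a hybrid NUMA-COMA system."""

    def __init__(self, threshold=3):
        self.threshold = threshold              # assumed tuning parameter
        self.mode = defaultdict(lambda: NUMA)   # every page starts in NUMA mode
        self.remote_misses = defaultdict(int)   # per-page remote miss counter

    def record_remote_miss(self, page):
        """Count a remote miss on `page` and return the page's resulting mode."""
        if self.mode[page] == COMA:
            return COMA                         # already switched, nothing to do
        self.remote_misses[page] += 1
        if self.remote_misses[page] >= self.threshold:
            self.mode[page] = COMA              # page now lives in local memory
        return self.mode[page]


policy = ReactivePagePolicy(threshold=2)
print(policy.record_remote_miss(7))  # first miss: page 7 stays in NUMA mode
print(policy.record_remote_miss(7))  # second miss reaches the threshold
```

Pages that are rarely fetched remotely never pay the cost of being relocated, while pages that keep missing are moved into the local memory, which is the trade-off a hybrid organization is meant to balance.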

References

[1] WildFire: A Scalable Path for SMPs (PDF).
[2] Noordergraaf, Lisa; van der Pas, Ruud (1999). "Performance experiences on Sun's Wildfire prototype". Proceedings of the 1999 ACM/IEEE conference on Supercomputing. pp. 38–es. CiteSeerX 10.1.1.22.6994. doi:10.1145/331532.331570. ISBN 1581130910. S2CID 17739.
[3] "United States Patent: Cluster-based operating system-agnostic virtual computing system". Archived from the original on 2019-02-24. Retrieved 2014-04-10.

Dahlgren, F.; Torrellas, J. (June 1999). "Cache-only memory architectures". Computer. 32 (6): 72–79. CiteSeerX 10.1.1.34.7679. doi:10.1109/2.769448.
Hagersten, E.; Landin, A.; Haridi, S. (September 1992). "DDM-a cache-only memory architecture". Computer. 25 (9): 44–54. doi:10.1109/2.156381.
Falsafi, Babak; Wood, David A. (June 1997). "Reactive NUMA: a design for unifying S-COMA and CC-NUMA". Proceedings of the 24th Annual International Symposium on Computer Architecture (ISCA). pp. 229–40.



