Stanford DASH

Stanford DASH was a cache-coherent multiprocessor developed in the late 1980s by a group led by Anoop Gupta, John L. Hennessy, Mark Horowitz, and Monica S. Lam at Stanford University.[1] It was built by adding a pair of directory boards, designed at Stanford, to each of up to 16 SGI IRIS 4D Power Series machines and then cabling the systems in a mesh topology using a Stanford-modified version of the Torus Routing Chip.[2] The directory boards implemented a directory-based cache coherence protocol,[3] allowing Stanford DASH to support distributed shared memory across up to 64 processors. Stanford DASH was also notable for both supporting and helping to formalize weak memory consistency models, including release consistency.[4] Because Stanford DASH was the first operational machine to include scalable cache coherence,[5] it influenced subsequent computer science research as well as the commercially available SGI Origin 2000. Stanford DASH is included in the 25th-anniversary retrospective of selected papers from the International Symposium on Computer Architecture[6] and in several computer science books,[7][8][9][10][11] has been simulated by the University of Edinburgh,[12] and is used as a case study in contemporary computer science classes.[13][14]

Related Research Articles

Non-uniform memory access: computer memory design used in multiprocessing

Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non-local memory. The benefits of NUMA are limited to particular workloads, notably on servers where the data is often associated strongly with certain tasks or users.
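
For illustration, here is a minimal sketch of node-local allocation using Linux's libnuma; the library and its calls are an assumption for illustration and have no connection to DASH itself, which predates this API.

```cpp
// Sketch: allocate memory on the same NUMA node as the calling thread,
// so that accesses stay local and avoid the remote-access penalty.
// Assumes Linux with libnuma installed (compile with -lnuma).
#include <numa.h>    // libnuma: numa_available, numa_alloc_onnode, numa_free
#include <sched.h>   // sched_getcpu (glibc)
#include <cstdio>

int main() {
    if (numa_available() < 0) {
        std::fprintf(stderr, "NUMA is not supported on this system\n");
        return 1;
    }
    int node = numa_node_of_cpu(sched_getcpu());   // node running this thread
    size_t bytes = 1 << 20;
    void *buf = numa_alloc_onnode(bytes, node);    // node-local allocation
    if (buf == nullptr) return 1;
    // ... work on buf: local accesses are faster than remote-node accesses ...
    numa_free(buf, bytes);
    return 0;
}
```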

Symmetric multiprocessing: the equal sharing of all resources by multiple identical processors

Symmetric multiprocessing or shared-memory multiprocessing (SMP) is a multiprocessor computer hardware and software architecture in which two or more identical processors are connected to a single shared main memory, have full access to all input and output devices, and are controlled by a single operating system instance that treats all processors equally, reserving none for special purposes. Most multiprocessor systems today use an SMP architecture. In the case of multi-core processors, the SMP architecture applies to the cores, treating them as separate processors.

Scalable Coherent Interface: high-speed interconnect standard for shared-memory multiprocessing and message passing

The Scalable Coherent Interface or Scalable Coherent Interconnect (SCI) is a high-speed interconnect standard for shared-memory multiprocessing and message passing. The goal was to scale well, provide system-wide memory coherence, and offer a simple interface; that is, to replace the existing buses in multiprocessor systems with a standard that has no inherent scalability or performance limitations.

Cache coherence: computer architecture term concerning shared resource data

In computer architecture, cache coherence is the uniformity of shared resource data that ends up stored in multiple local caches. When clients in a system maintain caches of a common memory resource, problems may arise with incoherent data, which is particularly the case with CPUs in a multiprocessing system.
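
As a toy illustration of the problem (not DASH's protocol), consider two private caches that each hold a copy of the same memory word; without a coherence mechanism, a write by one processor leaves the other's copy stale. A minimal sketch:

```cpp
// Toy model of the coherence problem: two private caches, no protocol.
// All names here are illustrative, not drawn from the DASH design.
#include <iostream>
#include <optional>

struct Cache {
    std::optional<int> copy;  // cached copy of one memory word, if present
};

int main() {
    int memory = 0;           // the shared memory word
    Cache c0, c1;

    c0.copy = memory;         // CPU 0 reads the word: caches 0
    c1.copy = memory;         // CPU 1 reads the word: caches 0

    *c0.copy = 42;            // CPU 0 writes its cached copy (write-back, no protocol)

    // CPU 1 still sees the stale value: the caches are incoherent.
    std::cout << "CPU 0 sees " << *c0.copy
              << ", CPU 1 sees " << *c1.copy << '\n';  // prints 42 and 0
}
```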

In computer science, consistency models are used in distributed systems such as distributed shared memory systems or distributed data stores. The system is said to support a given model if operations on memory follow specific rules. The data consistency model specifies a contract between programmer and system: the system guarantees that if the programmer follows the rules, memory will be consistent and the results of reading, writing, or updating memory will be predictable. Consistency is distinct from coherence, which applies whether or not the system is cached: coherence deals with maintaining a global order in which writes to a single location or single variable are seen by all processors, whereas consistency deals with the ordering of operations to multiple locations with respect to all processors.
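
The standard "message passing" litmus test shows why ordering across multiple locations matters. In the sketch below, written with C++ relaxed atomics as an illustration rather than anything DASH-specific, a weak model allows the reader to observe the flag without observing the data:

```cpp
// Message-passing litmus test: under a weak consistency model the reader
// may see flag == 1 yet still read data == 0, because the two stores are
// to *different* locations and nothing orders them. Illustrative only.
#include <atomic>
#include <thread>

std::atomic<int> data{0}, flag{0};

void writer() {
    data.store(42, std::memory_order_relaxed);  // store to first location
    flag.store(1,  std::memory_order_relaxed);  // store to second; may be seen first
}

void reader() {
    while (flag.load(std::memory_order_relaxed) == 0) { }  // wait for the flag
    int d = data.load(std::memory_order_relaxed);
    // Under a weak model, d == 0 is a legal outcome here.
    (void)d;
}

int main() {
    std::thread t1(writer), t2(reader);
    t1.join(); t2.join();
}
```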

Memory coherence is an issue that affects the design of computer systems in which two or more processors or cores share a common area of memory.

In computer science, distributed shared memory (DSM) is a form of memory architecture where physically separated memories can be addressed as a single shared address space. The term "shared" does not mean that there is a single centralized memory, but that the address space is shared—i.e., the same physical address on two processors refers to the same location in memory. Distributed global address space (DGAS) is a similar term for a wide class of software and hardware implementations in which each node of a cluster has access to shared memory in addition to each node's private memory.

John L. Hennessy: American computer scientist

John Leroy Hennessy is an American computer scientist, academician and businessman who serves as Chairman of Alphabet Inc. Hennessy is one of the founders of MIPS Computer Systems Inc. as well as Atheros and served as the tenth President of Stanford University. Hennessy announced that he would step down in the summer of 2016. He was succeeded as President by Marc Tessier-Lavigne. Marc Andreessen called him "the godfather of Silicon Valley."

Release consistency is one of the synchronization-based consistency models used in concurrent programming.
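
Release consistency requires ordering only at synchronization points. In modern C++ terms, which serve here as an analogy rather than the DASH hardware interface, labeling the flag operations as release and acquire restores the guarantee that the relaxed litmus test above lacks:

```cpp
// Release consistency in miniature: ordinary accesses may be reordered
// freely, but a release store and the acquire load that observes it order
// the surrounding data accesses. Illustrative analogy, not DASH code.
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> data{0}, flag{0};

void writer() {
    data.store(42, std::memory_order_relaxed);
    flag.store(1, std::memory_order_release);   // release: prior writes made visible
}

void reader() {
    while (flag.load(std::memory_order_acquire) == 0) { }  // acquire: syncs with release
    assert(data.load(std::memory_order_relaxed) == 42);    // now guaranteed
}

int main() {
    std::thread t1(writer), t2(reader);
    t1.join(); t2.join();
}
```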

Cache only memory architecture (COMA) is a computer memory organization for use in multiprocessors in which the local memories at each node are used as cache. This is in contrast to using the local memories as actual main memory, as in NUMA organizations.

The International Symposium on Computer Architecture (ISCA) is an annual academic conference on computer architecture, generally viewed as the top-tier venue in the field. The Association for Computing Machinery's Special Interest Group on Computer Architecture (SIGARCH) and the Institute of Electrical and Electronics Engineers (IEEE) Computer Society are its technical sponsors.

The Firefly cache coherence protocol is the schema used in the DEC Firefly multiprocessor workstation, developed by the DEC Systems Research Center. It is a three-state write-update cache coherence protocol. Unlike the Dragon protocol, the Firefly protocol updates main memory as well as the local caches on a write-update bus transaction. Thus the shared-clean and shared-modified states, which the Dragon protocol must distinguish, are not needed in the Firefly protocol.
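
A minimal sketch of the write-update idea follows; the state names track common descriptions of Firefly, while the encoding and function shape are assumptions for illustration:

```cpp
// Write-update in a Firefly-style protocol: a write to a Shared line
// broadcasts the new value, updating the other caches AND main memory,
// so no shared-dirty state is ever needed. Sketch only; a real protocol
// would also fetch the line on a miss before writing.
#include <vector>

enum class State { Invalid, ValidExclusive, Shared, Dirty };

struct Line { State state = State::Invalid; int value = 0; };

void write(std::vector<Line>& caches, int writer, int& memory, int v) {
    Line& me = caches[writer];
    if (me.state == State::Shared) {
        // Write-update bus transaction: update every other sharer and memory.
        for (int i = 0; i < (int)caches.size(); ++i)
            if (i != writer && caches[i].state == State::Shared)
                caches[i].value = v;
        memory = v;               // Firefly also updates memory (Dragon does not)
        me.value = v;             // the line stays Shared, and it stays clean
    } else {
        me.value = v;             // private copy: write locally
        me.state = State::Dirty;  // memory is now stale until write-back
    }
}
```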

James R. Goodman

James Richard "Jim" Goodman is a professor of computer science at the University of Auckland in Auckland, New Zealand, and emeritus professor at the University of Wisconsin–Madison.

SGI Origin 2000: series of server computers

The SGI Origin 2000 is a family of mid-range and high-end server computers developed and manufactured by Silicon Graphics (SGI). They were introduced in 1996 to succeed the SGI Challenge and POWER Challenge. At introduction these systems ran the IRIX operating system, originally version 6.4 and later version 6.5. A variant of the Origin 2000 with graphics capability is known as the Onyx2. An entry-level variant based on the same architecture but with a different hardware implementation is known as the Origin 200. The Origin 2000 was succeeded by the Origin 3000 in July 2000 and was discontinued on June 30, 2002.

Anna R. Karlin is an American computer scientist, the Microsoft Professor of Computer Science & Engineering at the University of Washington.

Processor consistency is one of the consistency models used in the domain of concurrent computing.

In computer engineering, directory-based cache coherence is a type of cache coherence mechanism in which directories are used to manage caches in place of snoopy methods, which scale poorly because they rely on broadcasting. Directory-based methods can be used to improve both the performance and the scalability of such systems.

Cache hierarchy: memory hierarchy concept applied to CPU caches with multiple levels

Cache hierarchy, or multi-level caches, refers to a memory architecture that uses a hierarchy of memory stores based on varying access speeds to cache data. Highly requested data is cached in high-speed access memory stores, allowing swifter access by central processing unit (CPU) cores.
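
One common way to observe the hierarchy from software is to time pointer-chasing over working sets of increasing size: average latency steps upward as each cache level overflows. The sketch below assumes a typical modern CPU and is not tied to any machine in this article:

```cpp
// Pointer-chase latency probe: as the working set outgrows L1, then L2,
// then L3, the time per dependent load steps upward. Results depend
// entirely on the host CPU; this is a sketch, not data from the article.
#include <chrono>
#include <iostream>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

int main() {
    std::mt19937 rng{42};
    for (size_t n : {1u << 12, 1u << 16, 1u << 20, 1u << 24}) {  // element counts
        std::vector<size_t> next(n);
        std::iota(next.begin(), next.end(), size_t{0});
        // Sattolo's algorithm: one random cycle, so the chase visits all elements.
        for (size_t k = n - 1; k > 0; --k) {
            std::uniform_int_distribution<size_t> pick(0, k - 1);
            std::swap(next[k], next[pick(rng)]);
        }
        size_t i = 0;
        auto t0 = std::chrono::steady_clock::now();
        for (long step = 0; step < 10'000'000; ++step) i = next[i];  // dependent loads
        auto t1 = std::chrono::steady_clock::now();
        std::cout << n * sizeof(size_t) / 1024 << " KiB working set: "
                  << std::chrono::duration<double, std::nano>(t1 - t0).count() / 1e7
                  << " ns/access (sink " << i << ")\n";
    }
}
```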

Directory-based coherence is a mechanism for handling the cache coherence problem in distributed shared memory (DSM), also known as non-uniform memory access (NUMA). Another popular approach is to connect all the nodes with a special shared bus. Directory-based coherence uses a special directory in place of the shared bus of the bus-based coherence protocols. Both designs use the corresponding medium to facilitate communication between nodes and to guarantee that the coherence protocol works properly across all of them. In directory-based cache coherence, the directory keeps track of the status of every cache block: which coherence state the block is in, and which nodes are sharing it at that time. This eliminates the need to broadcast signals to all nodes; messages are sent only to the nodes that are interested in the block.
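
A minimal sketch of a directory entry and its write path appears below; the field names and the three states follow common textbook treatments and are not taken from the DASH boards themselves:

```cpp
// Per-block directory entry: a coherence state plus a sharer bit vector.
// On a write, invalidations go only to the recorded sharers rather than
// to every node, which is the key to scalability. Textbook-style sketch,
// not the DASH board logic.
#include <bitset>
#include <cstdint>

constexpr int kMaxNodes = 64;  // DASH scaled to 64 processors

enum class DirState : std::uint8_t { Uncached, Shared, Exclusive };

struct DirEntry {
    DirState state = DirState::Uncached;
    std::bitset<kMaxNodes> sharers;  // which nodes hold a copy of the block
    int owner = -1;                  // meaningful when state == Exclusive
};

// Node `w` requests write permission for the block tracked by `e`.
void handleWriteRequest(DirEntry& e, int w, void (*sendInvalidate)(int node)) {
    if (e.state == DirState::Shared) {
        for (int n = 0; n < kMaxNodes; ++n)
            if (e.sharers.test(n) && n != w)
                sendInvalidate(n);   // contact only the actual sharers
    } else if (e.state == DirState::Exclusive && e.owner != w) {
        sendInvalidate(e.owner);     // recall the single dirty copy
    }
    e.sharers.reset();
    e.sharers.set(w);
    e.owner = w;
    e.state = DirState::Exclusive;
}
```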

The Association for Computing Machinery SIGARCH Maurice Wilkes Award is given annually for an outstanding contribution to computer architecture made within the last 20 years. The award is named after Maurice Wilkes, a computer scientist credited with several important developments in computing, such as microprogramming. The award is presented at the International Symposium on Computer Architecture.

References

  1. Lenoski, Daniel; Laudon, James; Gharachorloo, Kourosh; Weber, Wolf-Dietrich; Gupta, Anoop; Hennessy, John; Horowitz, Mark; Lam, Monica S. (1992). "The Stanford Dash Multiprocessor". Computer. 25 (3): 63–79. doi:10.1109/2.121510. S2CID 9731523.
  2. Dally, William J.; Seitz, Charles L. (1986). "The torus routing chip". Distributed Computing. 1 (4): 187–196. doi:10.1007/BF01660031. S2CID 10500442.
  3. Lenoski, Daniel; Laudon, James; Gharachorloo, Kourosh; Gupta, Anoop; Hennessy, John (1990). "The directory-based cache coherence protocol for the DASH multiprocessor". Proceedings of the 17th Annual International Symposium on Computer Architecture. ACM. pp. 148–159. doi:10.1145/325164.325132.
  4. Gharachorloo, Kourosh; Lenoski, Daniel; Laudon, James; Gibbons, Phillip; Gupta, Anoop; Hennessy, John (1990). "Memory consistency and event ordering in scalable shared-memory multiprocessors". Proceedings of the 17th Annual International Symposium on Computer Architecture. pp. 15–26. doi:10.1145/325096.325102.
  5. Hennessy, John; Patterson, David (2003). Computer Architecture: A Quantitative Approach (Third ed.). Morgan Kaufmann. p. 655. ISBN 978-1-558-60596-1.
  6. Lenoski, Daniel; Laudon, James; Joe, Truman; Nakahira, David; Stevens, Luis; Gupta, Anoop; Hennessy, John (1998). "The DASH prototype: Implementation and Performance". In Sohi, Gurindar (ed.). 25 Years of the International Symposia on Computer Architecture (Selected Papers). pp. 418–429.
  7. Suzuki, Norihisa (1992). Shared Memory Multiprocessing. The MIT Press. pp. 391–406. ISBN 978-0-262-19322-1.
  8. Loshin, David (1994). High Performance Computing Demystified. Academic Press. pp. 80, 91. ISBN 978-0-124-55825-0.
  9. Parhami, Behrooz (1999). Introduction to Parallel Processing: Algorithms and Architectures. Springer. pp. 450–451. ISBN 978-0-306-45970-2.
  10. Hill, Mark; Jouppi, Norman; Sohi, Gurindar (2000). Readings in Computer Architecture. Morgan Kaufmann. pp. 583–599. ISBN 978-1-55860-539-8.
  11. Dandamudi, Sivarama (2003). Hierarchical Scheduling in Parallel and Cluster Systems. Series in Computer Science. Springer US. pp. 21–22. doi:10.1007/978-1-4615-0133-6. ISBN 978-1-4613-4938-9. S2CID 46434929.
  12. Institute for Computing Systems Architecture, School of Informatics, University of Edinburgh. "Stanford DASH Architecture: Cluster Simulation Model". Retrieved 3 November 2015.
  13. Carl Olson and Mattan Erez, The University of Texas at Austin (2007). "The Stanford Dash Multiprocessor". Retrieved 3 November 2015.
  14. Meng Zhang, Duke University (2010). "The Stanford Dash Multiprocessor". Retrieved 3 November 2015.