Designer | Cray |
---|---|
Bits | 64-bit |
Introduced | 2005 |
Version | 3rd generation of Tera MTA |
Endianness | Big-endian |
Predecessor | Cray MTA-2 |
Successor | Cray XMT2 |
Registers | 32 general-purpose per stream (4096 per CPU); 8 target per stream (1024 per CPU) |
Cray XMT (Cray eXtreme MultiThreading, [1] codenamed Eldorado [2] ) is a scalable multithreaded shared memory supercomputer architecture by Cray, based on the third generation of the Tera MTA architecture, targeted at large graph problems (e.g. semantic databases, big data, pattern matching). [3] [4] [5] Presented in 2005, it superseded the earlier, unsuccessful Cray MTA-2. It uses Threadstorm3 CPUs inside Cray XT3 blades. Designed to use commodity parts and existing subsystems from other commercial systems, it avoided the Cray MTA-2's high cost of fully custom manufacture and support. [2] It brought substantial improvements over the Cray MTA-2, most notably nearly tripling peak performance and vastly increasing the maximum CPU count to 8,192 and the maximum memory to 128 TB, with a data TLB covering up to 512 TB. [2] [3]
Cray XMT uses a scrambled [3] content-addressable memory [6] model on DDR1 ECC modules to implicitly load-balance memory access across the whole shared global address space of the system. [5] Use of 4 additional Extended Memory Semantics bits (full/empty, forwarding and 2 trap bits) per 64-bit memory word enables lightweight, fine-grained synchronization on all memory. [7] There are no hardware interrupts and hardware threads are allocated by an instruction, not the OS. [5] [7]
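The full/empty bit described above can be illustrated with a small software emulation. This is only a sketch under stated assumptions: on the XMT the bit lives in hardware alongside every memory word, whereas here it is emulated with a lock and a condition variable, and the names `FullEmptyWord`, `write_ef`, `read_fe` and `run_demo` are hypothetical names chosen for this example, not the XMT programming interface.

```python
import threading


class FullEmptyWord:
    """Software emulation of one memory word tagged with a full/empty bit."""

    def __init__(self):
        self._value = 0
        self._full = False                  # the emulated full/empty bit
        self._cond = threading.Condition()

    def write_ef(self, value):
        """Wait until the word is empty, store the value, then mark it full."""
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_fe(self):
        """Wait until the word is full, load the value, then mark it empty."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            self._full = False
            self._cond.notify_all()
            return self._value


def run_demo(n_items=5):
    """Hand n_items values from a producer to a consumer through one word."""
    word = FullEmptyWord()
    received = []

    def producer():
        for i in range(n_items):
            word.write_ef(i * i)            # blocks while the word is still full

    def consumer():
        for _ in range(n_items):
            received.append(word.read_fe()) # blocks while the word is still empty

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(run_demo())
```

Because the single word alternates strictly between full and empty, the consumer receives the values in the order they were produced without any separate lock or queue, which is the kind of fine-grained, per-word synchronization the paragraph describes.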
Front-end (login, I/O, and other service nodes, utilizing AMD Opteron processors and running SLES Linux) and back-end (compute nodes, utilizing Threadstorm3 processors and running MTK, a simple BSD Unix-based microkernel [3] ) communicate through the LUC (Lightweight User Communication) interface, an RPC-style bidirectional client/server interface. [1] [5]
General information | |
---|---|
Launched | 2005 |
Discontinued | 2011 |
Designed by | Cray |
Performance | |
Max. CPU clock rate | 500 MHz |
HyperTransport speeds | to 300 GT/s |
Architecture and classification | |
Instruction set | MTA ISA |
Physical specifications | |
Cores | 1 |
Socket(s) | Socket 940 |
History | |
Predecessor | Cray MTA-2 CPU |
Successor | Threadstorm4 |
Threadstorm3 (referred to as "MT processor" [2] and Threadstorm before XMT2 [8] ) is a 64-bit single-core VLIW barrel processor (compatible with the 940-pin Socket 940 used by AMD Opteron processors) with 128 hardware streams, onto each of which a software thread can be mapped (effectively creating 128 hardware threads per CPU), running at 500 MHz and using the MTA instruction set or a superset of it. [7] [9] [nb 1] It has a 128 KB, 4-way associative data buffer. Each Threadstorm3 has 128 separate register sets and program counters (one per stream), which are fairly [10] and fully context-switched at each cycle. [5] Its estimated peak performance is 1.5 GFLOPS. It has 3 functional units (memory, fused multiply-add and control), which receive operations from the same MTA instruction and operate within the same cycle. [7] Each stream has 32 general-purpose registers, 8 target registers and a status word containing the program counter. [6] High-level control of job allocation across threads is not possible. [5] [nb 2] Because the MTA pipeline is 21 stages long, a stream can be selected to issue another instruction no sooner than 21 cycles after its previous one. [11] The TDP of the processor package is 30 W. [12]
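The effect of the 21-cycle spacing between instructions of the same stream can be sketched with a toy issue model. This is a simplified illustration, not a description of the actual Threadstorm3 scheduler: it assumes every stream always has an instruction ready, that exactly one instruction can issue per cycle, and that streams are picked round-robin. The function name `simulate_issue` and its parameters are hypothetical.

```python
def simulate_issue(n_streams, n_cycles, min_spacing=21):
    """Return the fraction of cycles in which some stream issued an instruction.

    Assumed toy model: one issue slot per cycle, round-robin stream selection,
    and a stream may not issue again until min_spacing cycles after its last issue.
    """
    next_ready = [0] * n_streams    # earliest cycle at which each stream may issue
    issued = 0
    pointer = 0                     # round-robin starting position
    for cycle in range(n_cycles):
        for offset in range(n_streams):
            s = (pointer + offset) % n_streams
            if next_ready[s] <= cycle:
                next_ready[s] = cycle + min_spacing
                issued += 1
                pointer = (s + 1) % n_streams
                break               # only one instruction issues per cycle
    return issued / n_cycles


if __name__ == "__main__":
    for streams in (1, 7, 21, 128):
        print(streams, simulate_issue(streams, 2100))
```

Under these assumptions the issue slot is only kept busy every cycle once at least 21 streams are active; with fewer streams, utilization falls to roughly the number of streams divided by 21, which suggests why a single thread cannot saturate such a processor.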
Because they switch thread context on every cycle, the performance of Threadstorm CPUs is not constrained by memory access time. In a simplified model, at each clock cycle an instruction from one of the threads is executed and another memory request is queued, with the understanding that by the time the next round of execution is ready the requested data has arrived. [13] This contrasts with many conventional architectures, which stall on memory access. The architecture excels at data-walking workloads in which subsequent memory accesses cannot be easily predicted and which are therefore poorly suited to a conventional cache model. [1] Threadstorm's principal architect was Burton J. Smith. [1]
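The simplified model above can be made concrete with a small pointer-chasing experiment. The sketch below is illustrative only: the 100-cycle memory latency, the table size and the function names (`make_chain`, `cycles_stalling`, `cycles_interleaved`) are assumptions made for this example rather than measured XMT figures, and all costs other than memory latency are ignored.

```python
import random


def make_chain(size, seed=0):
    """Build a random cyclic successor table: index -> hard-to-predict next index."""
    rng = random.Random(seed)
    order = list(range(size))
    rng.shuffle(order)
    nxt = [0] * size
    for a, b in zip(order, order[1:] + order[:1]):
        nxt[a] = b
    return nxt


def cycles_stalling(n_walkers, hops, latency):
    """One walker at a time; the processor idles for the full latency of every load."""
    return n_walkers * hops * latency


def cycles_interleaved(nxt, starts, hops, latency):
    """Issue at most one load per cycle, from any walker whose previous load returned."""
    pos = list(starts)
    remaining = [hops] * len(starts)
    ready_at = [0] * len(starts)    # cycle at which each walker's data arrives
    cycle = 0
    while any(remaining):
        for w in range(len(starts)):
            if remaining[w] and ready_at[w] <= cycle:
                pos[w] = nxt[pos[w]]            # follow the pointer
                remaining[w] -= 1
                ready_at[w] = cycle + latency   # result usable `latency` cycles later
                break
        cycle += 1
    return max(ready_at, default=0), pos


if __name__ == "__main__":
    table = make_chain(1000)
    total, _ = cycles_interleaved(table, list(range(100)), hops=10, latency=100)
    print("stalling:   ", cycles_stalling(100, 10, 100))
    print("interleaved:", total)
```

With 100 walkers and the assumed 100-cycle latency, the interleaved model finishes in 1,099 cycles against 100,000 for the stalling model, because each walker's wait is overlapped with loads issued by the other walkers. With far fewer walkers than cycles of latency, the advantage shrinks accordingly.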
Designer | Cray |
---|---|
Bits | 64-bit |
Introduced | 2011 |
Version | 4th generation of Tera MTA |
Endianness | Big-endian |
Predecessor | Cray XMT |
Registers | 32 general-purpose per stream (4096 per CPU); 8 target per stream (1024 per CPU) |
Cray XMT2 [3] (also "next-generation XMT" [8] or simply XMT [6] ) is a scalable multithreaded shared memory supercomputer by Cray, based on the fourth generation of the Tera MTA architecture. [5] Presented in 2011, it supersedes Cray XMT, which had issues with memory hotspots. [8] It uses Threadstorm4 CPUs inside Cray XT5 blades and, compared to XMT, increases memory capacity eightfold to 512 TB and memory bandwidth threefold (300 MHz instead of 200 MHz) by using twice the memory modules per node and DDR2. [6] [8] It introduces the Node Pair Link inter-Threadstorm connect, as well as memory-only nodes, in which Threadstorm4 packages have their CPU and HyperTransport 1.x components disabled. [5] The underlying scrambled content-addressable memory model is inherited from XMT. XMT2 uses 2 additional EMS bits (full/empty and extended) instead of the 4 used in XMT.
General information | |
---|---|
Launched | 2011 |
Discontinued | 2015? |
Designed by | Cray |
Performance | |
Max. CPU clock rate | 500 MHz |
HyperTransport speeds | to 400 GT/s |
Architecture and classification | |
Instruction set | MTA ISA |
Physical specifications | |
Cores | 1 |
Socket(s) | Socket F |
History | |
Predecessor | Threadstorm3 |
Threadstorm4 (also "Threadstorm IV" [1] and "Threadstorm 4.0" [nb 3] ) is a 64-bit single-core VLIW barrel processor (compatible with the 1207-pin Socket F used by AMD Opteron processors) with 128 hardware streams, very similar to its predecessor, Threadstorm3. It features an improved, DDR2-capable memory controller and 8 additional trap registers per stream. Cray intentionally decided against a DDR3 controller, citing the reuse of existing Cray XT5 infrastructure [nb 4] and DDR2's shorter burst length compared with DDR3. [nb 5] Though the longer burst length could be compensated for by the higher speeds of DDR3, it would also require more power, which Cray engineers wanted to avoid. [8]
After launching XMT, Cray researched a possible multi-core variant of the Threadstorm3, dubbed Scorpio. Most of Threadstorm3's features would be retained, including the multiplexing of many hardware streams onto an execution pipeline and the implementation of additional state bits for every 64-bit memory word. Cray later abandoned Scorpio, and the project yielded no manufactured chip. [3]
Development of Threadstorm4, as well as of the whole MTA architecture, ended silently after XMT2, probably due to competition from commodity processors such as Intel's Xeon [14] and possibly Xeon Phi, even though Cray never officially discontinued either XMT or XMT2. As of 2020, Cray has removed all customer documentation on both XMT and XMT2 from its online catalogue.
Cray XMT2 was bought by several federal laboratories and academic facilities, as well as some commercial HPC clients, e.g. CSCS (2 TB global memory with 64 Threadstorm4 CPUs) [15] and Noblis CAHPC. [16] Most XMT- and XMT2-based systems had been decommissioned by 2020.
Steve Scott: You can do it just great with a Xeon. We are not planning on doing another ThreadStorm processor. But it does take some software technology that comes out of the ThreadStorm legacy.