iWarp

iWarp was an experimental parallel supercomputer architecture developed as a joint project by Intel and Carnegie Mellon University. The project began in 1988 as a follow-up to CMU's earlier Warp research project, with the aim of building an entire parallel-computing "node" on a single microprocessor, complete with memory and communications links. In this respect the iWarp is very similar to the INMOS transputer and the nCUBE. [1]

Intel announced iWarp in 1989. The first iWarp prototype was delivered to Carnegie Mellon in the summer of 1990, and that fall CMU received the first of the 64-cell production systems, followed by two more in 1991. With the creation of the Intel Supercomputing Systems Division in the summer of 1992, the iWarp was merged into the iPSC product line. Intel kept iWarp as a product but stopped actively marketing it. [2]

Each iWarp CPU included a 32-bit ALU with a 64-bit FPU running at 20 MHz. It was purely scalar and completed one instruction per cycle, so the performance was 20 MIPS, or 20 MFLOPS for single precision and 10 MFLOPS for double precision. [3] [4] Communications were handled by a separate unit on the CPU that drove four serial channels at 40 MB/s and included hardware networking support for up to 20 virtual channels (similar to the system added to the INMOS T9000).
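
These peak figures follow directly from the clock rate and issue rate. The short sketch below (plain C; the constants are simply the numbers quoted above, restated as assumptions for illustration) works through the arithmetic.

    #include <stdio.h>

    int main(void)
    {
        /* Figures quoted above, treated as assumptions for this illustration. */
        const double clock_mhz        = 20.0;   /* 20 MHz core clock                 */
        const double insns_per_cycle  = 1.0;    /* purely scalar, one instr./cycle   */
        const double sp_flops_per_cyc = 1.0;    /* one single-precision op per cycle */
        const double dp_flops_per_cyc = 0.5;    /* double precision at half the rate */

        printf("peak integer rate    : %.0f MIPS\n",   clock_mhz * insns_per_cycle);
        printf("peak single precision: %.0f MFLOPS\n", clock_mhz * sp_flops_per_cyc);
        printf("peak double precision: %.0f MFLOPS\n", clock_mhz * dp_flops_per_cyc);
        return 0;
    }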

iWarp processors were combined on boards along with memory, but unlike most comparable systems, Intel chose the faster but more expensive static RAM for the iWarp. Boards typically included four CPUs and anywhere from 512 kB to 4 MB of SRAM.

Another difference was that iWarp systems were connected as an n-by-m torus rather than the more common hypercube. A typical system included 64 CPUs connected as an 8×8 torus, which could deliver a peak of 1.2 gigaflops.
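
To make the topology concrete, the following sketch (ordinary C, not iWarp system software; the cell chosen is arbitrary) computes a cell's four neighbours on an 8×8 torus with wrap-around at the edges, and the aggregate peak implied by 64 cells at 20 MFLOPS each.

    #include <stdio.h>

    #define ROWS 8
    #define COLS 8

    /* Map a (row, column) pair to a linear cell id, wrapping at the edges. */
    static int cell_id(int r, int c)
    {
        return ((r % ROWS + ROWS) % ROWS) * COLS + ((c % COLS + COLS) % COLS);
    }

    int main(void)
    {
        int r = 3, c = 7;   /* arbitrary example cell */

        /* On a torus every cell has exactly four neighbours; there are no edge cases. */
        printf("cell (%d,%d): north %d, south %d, east %d, west %d\n", r, c,
               cell_id(r - 1, c), cell_id(r + 1, c),
               cell_id(r, c + 1), cell_id(r, c - 1));

        /* 64 cells x 20 MFLOPS (single precision) = 1280 MFLOPS, roughly the
           1.2 gigaflops peak quoted for a full system. */
        printf("aggregate peak: %.2f GFLOPS\n", ROWS * COLS * 20.0 / 1000.0);
        return 0;
    }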

George Cox was the lead architect of the iWarp project. Steven McGeady (later an Intel vice-president and a witness in the Microsoft antitrust case) wrote an innovative development environment that allowed software to be written for the array before it was completed. Each node of the array was represented by a different Sun workstation on a LAN, with the iWarp's unique inter-node communication protocol simulated over sockets. Unlike the chip-level simulator, which ran very slowly and could not simulate a multi-node array, this environment allowed in-depth development of array software to begin.
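
The details of that protocol are not documented here, but the minimal sketch below (hypothetical C; the message layout, port scheme, and addresses are invented for illustration) shows the general shape of the idea: one simulated node pushing a tagged message to a neighbouring node over an ordinary socket.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Hypothetical message header: which logical cell sent the data and on
       which virtual channel.  Invented for this sketch, not the real format. */
    struct sim_msg {
        int  src_cell;
        int  channel;
        char payload[64];
    };

    int main(void)
    {
        struct sim_msg msg = { .src_cell = 12, .channel = 3 };
        strcpy(msg.payload, "partial result from cell 12");

        /* In the real environment each cell was a separate Sun workstation on
           the LAN; here the "neighbour" is just a UDP endpoint on localhost,
           with an assumed port-per-cell numbering scheme. */
        struct sockaddr_in neighbour = { 0 };
        neighbour.sin_family = AF_INET;
        neighbour.sin_port   = htons(9000 + 13);
        inet_pton(AF_INET, "127.0.0.1", &neighbour.sin_addr);

        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0)
            return 1;
        sendto(fd, &msg, sizeof msg, 0,
               (struct sockaddr *)&neighbour, sizeof neighbour);
        close(fd);
        return 0;
    }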

The production compiler for iWarp was a C and Fortran compiler based on the AT&T pcc compiler for UNIX, ported under contract for Intel by the Canadian firm HCR Corporation and then extensively modified and extended by Intel. [5] [6]

Notes

  1. Padua, David (ed.). Encyclopedia of Parallel Computing. Springer, 2011. ISBN 978-0-387-09765-7.
  2. Gross, Thomas; O'Hallaron, David R. iWarp: Anatomy of a Parallel Computing System. MIT Press, Cambridge, MA, 1998.
  3. Borkar, Shekhar; Cohn, Robert; Cox, George; Gleason, Sha; Gross, Thomas. "iWarp: an integrated solution to high-speed parallel computing". Proceedings of the 1988 ACM/IEEE Conference on Supercomputing, pp. 330–339, November 1988.
  4. Intel Corp. iWarp Microprocessor (Part Number 318153). Technical Information, Order Number 281006. Hillsboro, Oregon, 1991.
  5. Reinders, James R. "Warp and iWarp". In Padua, David (ed.). Encyclopedia of Parallel Computing. New York: Springer, 2011. p. 2158.
  6. Adl-Tabatabai, Ali-Reza; Gross, Thomas; Lueh, Guei-Yuan; Reinders, James. "Modeling Instruction-Level Parallelism for Software Pipelining". Proceedings of the IFIP WG10.3 Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, Orlando, FL, pp. 321–330.

Related Research Articles

A supercomputer is a computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of million instructions per second (MIPS). Since 2017, supercomputers have existed which can perform over 10^17 FLOPS (a hundred quadrillion FLOPS, 100 petaFLOPS or 100 PFLOPS). For comparison, a desktop computer has performance in the range of hundreds of gigaFLOPS (10^11) to tens of teraFLOPS (10^13). Since November 2017, all of the world's fastest 500 supercomputers run on Linux-based operating systems. Additional research is being conducted in the United States, the European Union, Taiwan, Japan, and China to build faster, more powerful and technologically superior exascale supercomputers.

Very long instruction word (VLIW) refers to instruction set architectures designed to exploit instruction level parallelism (ILP). Whereas conventional central processing units mostly allow programs to specify instructions to execute in sequence only, a VLIW processor allows programs to explicitly specify instructions to execute in parallel. This design is intended to allow higher performance without the complexity inherent in some other designs.

The transputer is a series of pioneering microprocessors from the 1980s, intended for parallel computing. To support this, each transputer had its own integrated memory and serial communication links to exchange data with other transputers. They were designed and produced by Inmos, a semiconductor company based in Bristol, United Kingdom.

nCUBE was a series of parallel computers from the company of the same name. Early generations of the hardware used a custom microprocessor. With its final generations of servers, nCUBE no longer designed custom microprocessors for its machines, but used server-class chips manufactured by a third party in massively parallel hardware deployments, primarily for on-demand video.

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.

Reconfigurable computing is a computer architecture combining some of the flexibility of software with the high performance of hardware by processing with flexible hardware platforms like field-programmable gate arrays (FPGAs). The principal difference when compared to using ordinary microprocessors is the ability to add custom computational blocks using FPGAs. On the other hand, the main difference from custom hardware, i.e. application-specific integrated circuits (ASICs), is the ability to adapt the hardware during runtime by "loading" a new circuit on the reconfigurable fabric, thus providing new computational blocks without the need to manufacture and add new chips to the existing system.

Quadrics was a supercomputer company formed in 1996 as a joint venture between Alenia Spazio and the technical team from Meiko Scientific. They produced hardware and software for clustering commodity computer systems into massively parallel systems. Their high point was in June 2003, when six of the ten fastest supercomputers in the world were based on Quadrics' interconnect. They officially closed on June 29, 2009.

In parallel computer architectures, a systolic array is a homogeneous network of tightly coupled data processing units (DPUs) called cells or nodes. Each node or DPU independently computes a partial result as a function of the data received from its upstream neighbours, stores the result within itself and passes it downstream. Systolic arrays were first used in Colossus, an early computer used to break German Lorenz ciphers during World War II. Due to the classified nature of Colossus, they were independently invented or rediscovered by H. T. Kung and Charles Leiserson, who described arrays for many dense linear algebra computations for banded matrices. Early applications include computing greatest common divisors of integers and polynomials. They are sometimes classified as multiple-instruction single-data (MISD) architectures under Flynn's taxonomy, but this classification is questionable, since a strong argument can be made to distinguish systolic arrays from any of Flynn's four categories: SISD, SIMD, MISD, and MIMD.
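
As a toy illustration of that dataflow (sequential C standing in for real hardware; the matrix, vector, and cell layout are invented for the example), the sketch below pumps a vector through a short chain of cells. Each cell keeps a stationary partial sum, multiplies the value it has just received by the matrix element scheduled for that beat, and passes the value downstream, yielding the matrix-vector product y = A·x.

    #include <stdio.h>

    #define N 4   /* number of cells = rows of A */
    #define M 4   /* columns of A = length of x  */

    int main(void)
    {
        /* Example data, invented for the illustration. */
        double A[N][M] = { {1, 2, 3, 4}, {5, 6, 7, 8},
                           {9, 10, 11, 12}, {13, 14, 15, 16} };
        double x[M]    = { 1.0, 0.5, -1.0, 2.0 };

        double cell_x[N] = { 0 };   /* value currently held by each cell      */
        double y[N]      = { 0 };   /* stationary partial result in each cell */

        /* One "beat" per step: the x values pulse one cell downstream, then
           every cell folds the element scheduled for that beat into its sum. */
        for (int t = 0; t < M + N - 1; t++) {
            for (int i = N - 1; i > 0; i--)        /* shift downstream */
                cell_x[i] = cell_x[i - 1];
            cell_x[0] = (t < M) ? x[t] : 0.0;

            for (int i = 0; i < N; i++) {
                int j = t - i;                      /* column arriving at cell i */
                if (j >= 0 && j < M)
                    y[i] += A[i][j] * cell_x[i];
            }
        }

        for (int i = 0; i < N; i++)
            printf("y[%d] = %g\n", i, y[i]);       /* matches y = A * x */
        return 0;
    }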

SIMD within a register (SWAR), also known as "packed SIMD", is a technique for performing parallel operations on data contained in a processor register. SIMD stands for single instruction, multiple data. Flynn's 1972 taxonomy categorises SWAR as "pipelined processing".
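
A minimal sketch of the technique in C (the function name and constants are chosen for this example, not taken from any particular library): four unsigned bytes packed into a 32-bit word are added lane-by-lane, masking the high bit of each byte first so that a carry cannot spill into the neighbouring lane.

    #include <stdint.h>
    #include <stdio.h>

    /* Lane-wise addition of four packed unsigned bytes in one 32-bit register.
       Clearing the top bit of every byte before the add keeps carries from
       crossing lane boundaries; the true top bits are patched back in by XOR. */
    static uint32_t add_packed_u8(uint32_t x, uint32_t y)
    {
        const uint32_t H = 0x80808080u;   /* high bit of each byte lane */
        return ((x & ~H) + (y & ~H)) ^ ((x ^ y) & H);
    }

    int main(void)
    {
        uint32_t a = 0x01FF7F10u;   /* lanes 0x01, 0xFF, 0x7F, 0x10 */
        uint32_t b = 0x02017F20u;   /* lanes 0x02, 0x01, 0x7F, 0x20 */

        /* Expected lanes: 0x03, 0x00 (0xFF + 0x01 wraps), 0xFE, 0x30. */
        printf("0x%08X\n", add_packed_u8(a, b));   /* prints 0x0300FE30 */
        return 0;
    }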

The Pittsburgh Supercomputing Center (PSC) is a high performance computing and networking center founded in 1986 and one of the original five NSF Supercomputing Centers. PSC is a joint effort of Carnegie Mellon University and the University of Pittsburgh in Pittsburgh, Pennsylvania, United States.

The Intel Paragon is a discontinued series of massively parallel supercomputers that was produced by Intel in the 1990s. The Paragon XP/S, launched in 1992, is a productized version of the experimental Touchstone Delta system that was built at Caltech. The Paragon superseded Intel's earlier iPSC/860 system, to which it is closely related.

The Intel Personal SuperComputer (iPSC) was a product line of parallel computers in the 1980s and 1990s. The iPSC/1 was superseded by the Intel iPSC/2, and then the Intel iPSC/860.

Task parallelism is a form of parallelization of computer code across multiple processors in parallel computing environments. Task parallelism focuses on distributing tasks—concurrently performed by processes or threads—across different processors. In contrast to data parallelism which involves running the same task on different components of data, task parallelism is distinguished by running many different tasks at the same time on the same data. A common type of task parallelism is pipelining, which consists of moving a single set of data through a series of separate tasks where each task can execute independently of the others.

The history of general-purpose CPUs is a continuation of the earlier history of computing hardware.

The Finite Element Machine (FEM) was a late 1970s-early 1980s NASA project to build and evaluate the performance of a parallel computer for structural analysis. The FEM was completed and successfully tested at the NASA Langley Research Center in Hampton, Virginia. The motivation for FEM arose from the merger of two concepts: the finite element method of structural analysis and the introduction of relatively low-cost microprocessors.

Intel oneAPI Math Kernel Library is a library of optimized math routines for science, engineering, and financial applications. Core math functions include BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier transforms, and vector math.

The Warp machines were a series of increasingly general-purpose systolic array processors, created by Carnegie Mellon University (CMU), in conjunction with industrial partners G.E., Honeywell and Intel, and funded by the U.S. Defense Advanced Research Projects Agency (DARPA).

Xeon Phi was a series of x86 manycore processors designed and made by Intel. It was intended for use in supercomputers, servers, and high-end workstations. Its architecture allowed use of standard programming languages and application programming interfaces (APIs) such as OpenMP.

Approaches to supercomputer architecture have taken dramatic turns since the earliest systems were introduced in the 1960s. Early supercomputer architectures pioneered by Seymour Cray relied on compact innovative designs and local parallelism to achieve superior computational peak performance. However, in time the demand for increased computational power ushered in the age of massively parallel systems.

Isra Vision Parsytec AG is a subsidiary of Isra Vision; it was founded in 1985 as Parsytec in Aachen, Germany.