WARP (systolic array)

The Warp machines were a series of increasingly general-purpose systolic array processors, created by Carnegie Mellon University (CMU), in conjunction with industrial partners G.E., Honeywell and Intel, and funded by the U.S. Defense Advanced Research Projects Agency (DARPA). [1]

In parallel computer architectures, a systolic array is a homogeneous network of tightly coupled data processing units (DPUs) called cells or nodes. Each cell independently computes a partial result as a function of the data received from its upstream neighbors, stores the result within itself and passes it downstream. Systolic arrays were invented by H. T. Kung and Charles Leiserson, who described arrays for many dense linear algebra computations on banded matrices.

The Warp projects were started in 1984 by H. T. Kung at Carnegie Mellon University. They yielded research results, publications and advances in general-purpose systolic hardware design, compiler design and systolic software algorithms. There were three distinct machine designs, known as the WW-Warp (Wire Wrap Warp), PC-Warp (Printed Circuit Warp), and iWarp (integrated circuit Warp, conveniently also a play on the "i" for Intel). [2]

Each successive generation became more general-purpose, with larger memory capacity and looser coupling between processors. Only the original WW-Warp enforced truly lockstep sequencing of stages, which severely restricted its programmability but made it, in a sense, the purest "systolic-array" design.
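
The lockstep discipline is easiest to see in a small simulation. The sketch below is illustrative Python, not actual Warp code; the matrix-vector computation and the name systolic_matvec are invented for the example. It models a linear array in which every cell fires once per tick, consuming the partial sum from its upstream neighbor and passing a new one downstream:

    def systolic_matvec(A, x):
        """Simulate y = A * x on a linear systolic array of len(x) cells.

        Cell j permanently holds x[j]. Row i of A enters cell 0 at tick i
        and moves one cell rightward per tick, accumulating A[i][j] * x[j]
        at each cell, so y[i] leaves the last cell at tick i + len(x) - 1.
        """
        n, m = len(x), len(A)
        sums = [0.0] * n              # partial sums held by each cell
        y = [0.0] * m
        for t in range(m + n - 1):    # ticks until the pipeline drains
            new_sums = [0.0] * n
            for j in range(n):        # every cell fires in the same tick
                i = t - j             # row currently visiting cell j
                if 0 <= i < m:
                    upstream = sums[j - 1] if j > 0 else 0.0
                    new_sums[j] = upstream + A[i][j] * x[j]
                    if j == n - 1:    # finished result exits the array
                        y[i] = new_sums[j]
            sums = new_sums
        return y

    print(systolic_matvec([[1, 2], [3, 4]], [10, 20]))   # [50.0, 110.0]

No cell ever sees the whole problem; each applies only its own multiply-accumulate to whatever arrives that tick, which is what made such arrays attractive to implement as simple, replicated hardware.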

Warp machines were attached to UNIX-based Sun workstations, which also hosted all software development for every model of Warp machine.

A research compiler for a language known as "W2" targeted all three machines; it was the only compiler for the WW-Warp and PC-Warp, and it served as an early compiler during development of the iWarp. [3] The production compiler for iWarp was a C and Fortran compiler based on the AT&T pcc compiler for UNIX, ported under contract for Intel and then extensively modified and extended by Intel. [4]

The WW-Warp and PC-Warp machines were systolic array computers with a linear array of ten or more cells, each of which was a programmable processor capable of performing 10 million single-precision floating-point operations per second (10 MFLOPS). A 10-cell machine thus had a peak performance of 100 MFLOPS. The iWarp machines doubled the per-cell performance, delivering 20 MFLOPS single precision and supporting double-precision floating point at half that rate. [5]
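
The quoted peak figures are just the per-cell rate multiplied by the cell count; a short calculation (Python, using only the numbers given above) spells this out:

    # Peak performance = number of cells * per-cell rate (MFLOPS)
    cells = 10
    warp_rate = 10                  # WW-Warp / PC-Warp: 10 MFLOPS per cell
    iwarp_rate = 20                 # iWarp: 20 MFLOPS per cell

    print(cells * warp_rate)        # 100 -> 10-cell WW-Warp / PC-Warp peak
    print(cells * iwarp_rate)       # 200 -> 10-cell iWarp peak, single precision
    print(cells * iwarp_rate // 2)  # 100 -> iWarp double precision at half rate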

A two-cell prototype of WW-Warp was completed at Carnegie Mellon in June 1985. Two essentially identical ten-cell WW-Warp machines were produced in 1986, one by Honeywell and one by G.E., for use at Carnegie Mellon University. The system from G.E. was delivered in February 1986; the system from Honeywell followed in June 1986. The first unit of the significantly redesigned production model, the PC-Warp, was delivered by G.E. in April 1987. About twenty production PC-Warp machines were produced and sold by G.E. from 1987 to 1989.

The iWarp machines were based on a custom single-chip 700,000-transistor microprocessor, designed specifically for the Warp project, that used long-instruction-word (LIW) instructions and tightly integrated the communication links with the computational processor. The standard iWarp machine configuration arranged iWarp nodes in a 2m × 2n torus. All iWarp machines included the "backedges" and were therefore tori. [6]
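
A torus is a rectangular grid whose boundary links wrap around to the opposite edge; those wraparound links are the "backedges" mentioned above. The hypothetical helper below (Python; the function name and the 8 × 8 example configuration, i.e. m = n = 4 in the 2m × 2n scheme, are assumptions for illustration) makes the neighbor relation concrete:

    def torus_neighbors(r, c, rows, cols):
        """Four mesh neighbors of node (r, c) on a rows-by-cols torus.

        The modulo arithmetic supplies the wraparound backedges: row 0
        connects to row rows - 1 and column 0 to column cols - 1, so
        every node has exactly four neighbors and none sits on a boundary.
        """
        return [((r - 1) % rows, c),    # north
                ((r + 1) % rows, c),    # south
                (r, (c - 1) % cols),    # west
                (r, (c + 1) % cols)]    # east

    print(torus_neighbors(0, 0, 8, 8))  # [(7, 0), (1, 0), (0, 7), (0, 1)]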

In 1986, Intel was selected, as a result of competitive bidding, to be the industrial partner for the integrated-circuit implementation of Warp. The first iWarp system, a twelve-node machine, became operational in March 1990. After a number of steppings of the part, about 39 machines, each with ten or more C-Step iWarp chips running at 20 MHz, were produced and sold by Intel in 1992 and 1993 to universities, government agencies and industrial research laboratories. [7]

Notes

  1. Thomas Gross and Monica Lam. "A Retrospective on the Warp Machines." In 25 Years of the International Symposia on Computer Architecture (Selected Papers) (ISCA '98), Gurindar S. Sohi (ed.), ACM, New York, NY, 1998, pp. 45–47.
  2. Thomas Gross and David R. O'Hallaron. iWarp: Anatomy of a Parallel Computing System. MIT Press, Cambridge, MA, 1998.
  3. Monica S. Lam. A Systolic Array Optimizing Compiler. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1989.
  4. Ali-Reza Adl-Tabatabai, Thomas Gross, Guei-Yuan Lueh and James Reinders. "Modeling Instruction-Level Parallelism for Software Pipelining." In Proceedings of the IFIP WG10.3 Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, Orlando, FL, pp. 321–330.
  5. Intel Corp. iWarp Microprocessor (Part Number 318153). Technical Information, Order Number 281006. Hillsboro, Oregon, 1991.
  6. Shekhar Borkar, Robert Cohn, George Cox, Sha Gleason, and Thomas Gross. "iWarp: An Integrated Solution to High-Speed Parallel Computing." In Proceedings of the 1988 ACM/IEEE Conference on Supercomputing, November 12–17, 1988, pp. 330–339.
  7. David Padua (ed.). Encyclopedia of Parallel Computing. Springer, 2011. ISBN 978-0-387-09765-7.
