Intel iPSC

The Intel Personal SuperComputer (Intel iPSC) was a product line of parallel computers in the 1980s and 1990s. The iPSC/1 was superseded by the Intel iPSC/2, and then the Intel iPSC/860.

iPSC/1

In 1984, Justin Rattner became manager of the Intel Scientific Computers group in Beaverton, Oregon. He hired a team that included mathematician Cleve Moler. [1] Internally, the iPSC connected its processors in a hypercube topology, inspired by the Caltech Cosmic Cube research project. For that reason it was configured with a power-of-two number of nodes, corresponding to the corners of hypercubes of increasing dimension. [2]
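The power-of-two node counts follow directly from the topology: in a d-dimensional hypercube, each of the 2^d nodes is labeled with a d-bit number and is linked to the d nodes whose labels differ in exactly one bit. A minimal illustration (Python written for this article, not iPSC software):

```python
def hypercube_neighbors(node: int, dim: int) -> list[int]:
    """Neighbors of `node` in a dim-dimensional hypercube: flip one bit of its label."""
    return [node ^ (1 << k) for k in range(dim)]

# A five-dimensional cube (the iPSC/d5) has 2**5 = 32 nodes, each with 5 links.
print(hypercube_neighbors(0, 5))   # node 0 links to nodes 1, 2, 4, 8, 16
```

This is why each iPSC/1 node needed seven Ethernet ports for the interconnect: the largest model, the d7, placed every node at a corner of a seven-dimensional cube with seven neighbors.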

Intel iPSC-1 (1985) at the Computer History Museum

Intel announced the iPSC/1 in 1985, with 32 to 128 nodes connected with Ethernet into a hypercube. The system was managed by a personal computer of the PC/AT era running Xenix, the "cube manager". [3] Each node had an 80286 CPU with an 80287 math coprocessor, 512K of RAM, and eight Ethernet ports (seven for the hypercube interconnect, and one to talk to the cube manager). [1]

A message passing interface called NX, developed by Paul Pierce, evolved throughout the life of the iPSC line. [4] Because only the cube manager had connections to the outside world, developing and debugging applications was difficult. [5]
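The cited sources do not describe how NX moved messages between nodes internally, but dimension-ordered ("e-cube") routing is the classic scheme for hypercube interconnects like the iPSC's: a message corrects the differing bits of the source and destination addresses one dimension at a time. The sketch below (illustrative Python with a hypothetical `ecube_route` helper, not NX code) shows the hop sequence such a scheme produces:

```python
def ecube_route(src: int, dst: int) -> list[int]:
    """Dimension-ordered route between hypercube nodes: fix differing
    address bits lowest dimension first, one hop per corrected bit."""
    route = [src]
    node = src
    diff = src ^ dst          # bits where the two addresses disagree
    k = 0
    while diff:
        if diff & 1:
            node ^= 1 << k    # traverse the link in dimension k
            route.append(node)
        diff >>= 1
        k += 1
    return route

# Route from node 0 to node 5 (binary 101) in an iPSC/d5-sized cube:
print(ecube_route(0, 5))      # [0, 1, 5] — two hops, one per differing bit
```

The number of hops equals the number of differing address bits, so any two of the 128 nodes in a d7 system are at most seven hops apart.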

The basic models were the iPSC/d5 (five-dimensional hypercube with 32 nodes), iPSC/d6 (six dimensions with 64 nodes), and iPSC/d7 (seven dimensions with 128 nodes). Each cabinet held 32 nodes, and prices ranged up to about half a million dollars for the four-cabinet iPSC/d7 model. [1] Extra-memory (iPSC-MX) and vector-processor (iPSC-VX) models were also available in all three sizes. A four-dimensional hypercube, the iPSC/d4 with 16 nodes, was also available. [6]

The iPSC/1 has been called the first parallel computer built from commercial off-the-shelf parts. [7] This allowed it to reach the market at about the same time as its competitor from nCUBE, even though the nCUBE project had started earlier. Each iPSC cabinet measured 127 cm × 41 cm × 43 cm overall. Total computer performance was estimated at 2 MFLOPS, and the memory bus was 16 bits wide.

Serial #1, an iPSC/1 with 32 nodes, was delivered to Oak Ridge National Laboratory in 1985. [8] [9]

iPSC/2

Intel iPSC/2 16-node parallel computer, August 22, 1995

The Intel iPSC/2 was announced in 1987. It was available in several configurations, the base setup being one cabinet with 16 Intel 80386 processors at 16 MHz, each with 4 MB of memory and an 80387 coprocessor on the same module. [10] The operating system and user programs were loaded from a management PC, typically an Intel 301 with a special interface card. Instead of Ethernet, a custom Direct-Connect Module with eight channels of about 2.8 Mbyte/s data rate each was used for the hypercube interconnection. [10] The custom interconnect hardware made the system more expensive but reduced communication delays. [11] The software on the management processor was called the System Resource Manager instead of the "cube manager". The system allowed expansion up to 128 nodes, each with a processor and coprocessor. [12]

The base modules could be upgraded to the SX (Scalar eXtension) version by adding a Weitek 1167 floating point unit. [13] Another configuration allowed each processor module to be paired with a VX (Vector eXtension) module containing dedicated multiplication and addition units. This had the downside of halving the number of available interface card slots. Multiple cabinets were necessary to run the maximum number of nodes while connecting them to VX modules. [14]

The nodes of the iPSC/2 ran the proprietary NX/2 operating system, while the host machine ran System V or Xenix. [15] Nodes could be configured, as on the iPSC/1, without any local disk storage, or they could use one of the Direct-Connect Module connections with a clustered file system (called a concurrent file system at the time). [14] [16] The faster compute nodes and the improved interconnect together raised application performance over the iPSC/1. [17] [18] An estimated 140 iPSC/2 systems were built. [19]

iPSC/860

Intel iPSC/860 32-node parallel computer front panel, while running the Tachyon parallel ray tracing engine, August 22, 1995

Intel announced the iPSC/860 in 1990. The iPSC/860 consisted of up to 128 processing elements connected in a hypercube, each element consisting of an Intel i860 at 40–50 MHz or an Intel 80386 microprocessor. [20] Memory per node was increased to 8 MB, and a similar Direct-Connect Module was used, which limited the size to 128 nodes. [21]

Intel iPSC/860 32-node parallel computer with front door open, showing compute nodes, I/O nodes, and system management boards, August 22, 1995

One customer was the Oak Ridge National Laboratory. [20] The performance of the iPSC/860 was analyzed in several research projects. [22] [23] The iPSC/860 was also the original development platform for the Tachyon parallel ray tracing engine, [24] [25] which became part of the SPEC MPI 2007 benchmark and is still widely used today. [26] The iPSC line was superseded by the Touchstone Delta, a research project at the California Institute of Technology, which evolved into the Intel Paragon.

References

  1. Moler, Cleve (October 28, 2013). "The Intel Hypercube, part 1". Retrieved November 4, 2013.
  2. "The Personal SuperComputer". Computer History Museum. Retrieved November 4, 2013.
  3. Pierce, Paul R. "Intel iPSC/1". Archived from the original on June 3, 2013. Retrieved November 4, 2013.
  4. Pierce, Paul (April 1994). "The NX message passing interface". Parallel Computing. 20 (4): 1285–1302. doi:10.1016/0167-8191(94)90023-X.
  5. Schedlbauer, Martin J. (1989). "An I/O management system for the iPSC/1 hypercube". Proceedings of the 17th Conference on ACM Annual Computer Science Conference. p. 400. doi:10.1145/75427. ISBN   978-0-89791-299-0.
  6. http://delivery.acm.org/10.1145/70000/63074/p1207-orcutt.pdf
  7. Pierce, Paul R. "Other Artifacts in the Collection". Archived from the original on June 3, 2013. Retrieved November 4, 2013.
  8. Riley, Betsy A. "ORNL HPCC history (timeline details)".
  9. "History of Supercomputing".
  10. "Intel iPSC/2 (Rubik)". Computer Museum. Katholieke Universiteit Leuven. Retrieved November 4, 2013.
  11. Hatcher, Philip J.; Quinn, Michael Jay (1991). Data-parallel Programming on MIMD Computers. MIT Press. p. 7. ISBN   9780262082051.
  12. Chauddhuri, P. Pal (2008). Computer Organization and Design. PHI Learning. p. 826. ISBN   9788120335110.
  13. Ravikumār, Si. Pi (1996). Parallel Methods for VLSI Layout Design. Greenwood Publishing Group. p. 183. ISBN   9780893918286.
  14. Dongarra, Jack; Duff, Iain S. (1991). "Advanced Architecture Computers". In Adeli, Hojjat (ed.). Supercomputing in Engineering Analysis. CRC Press. pp. 51–54. ISBN 9780893918286.
  15. Pierce, Paul (1988). "The NX/2 operating system". Proceedings of the third conference on Hypercube concurrent computers and applications Architecture, software, computer systems, and general issues -. C3P. Vol. 1. ACM. pp. 384–390. doi:10.1145/62297.62341. ISBN   978-0-89791-278-5. S2CID   45688408.
  16. French, James C.; Pratt, Terrence W.; Das, Mriganka (May 1991). "Performance measurement of a parallel Input/Output system for the Intel iPSC/2 Hypercube". Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems - SIGMETRICS '91. ACM. pp. 178–187. doi:10.1145/107971.107990. ISBN   978-0-89791-392-8. S2CID   13899933.
  17. Arshi, S.; Asbury, R.; Brandenburg, J.; Scott, D. (1988). "Application performance improvement on the iPSC/2 computer". Proceedings of the third conference on Hypercube concurrent computers and applications Architecture, software, computer systems, and general issues -. Vol. 1. ACM. pp. 149–154. doi:10.1145/62297.62316. ISBN   978-0-89791-278-5. S2CID   46148117.
  18. Bomans, Luc; Roose, Dirk (September 1989). "Benchmarking the iPSC/2 hypercube multiprocessor". Concurrency: Practice and Experience. 1 (1): 3–18. doi:10.1002/cpe.4330010103.
  19. Gilbert Kalb; Robert Moxley, eds. (1992). "Commercially Available Systems". Massively Parallel, Optical, and Neural Computing in the United States. IOS Press. pp. 17–18. ISBN   9781611971507.
  20. Ramachandramurthi, Siddharthan (1996). "iPSC/860 Guide". Computational Science Education Project at Oak Ridge National Laboratory. Retrieved November 4, 2013.
  21. Venkatakrishnan, V. (1991). "Parallel Implicit Methods for Aerodynamic Applications on Unstructured Grids". In Keyes, David E.; Saad, Y.; Truhlar, Donald G. (eds.). Domain-based Parallelism and Problem Decomposition Methods in Computational Science and Engineering. SIAM. p. 66. ISBN   9781611971507.
  22. Berrendorf, Rudolf; Helin, Jukka (May 1992). "Evaluating the basic performance of the Intel iPSC/860 parallel computer". Concurrency: Practice and Experience. 4 (3): 223–240. doi:10.1002/cpe.4330040303.
  23. Dunigan, T.H. (December 1991). "Performance of the Intel iPSC/860 and Ncube 6400 hypercubes". Parallel Computing. 17 (10–11): 1285–1302. doi:10.1016/S0167-8191(05)80039-0.
  24. Stone, J.; Underwood, M. (1996-07-01). "Rendering of numerical flow simulations using MPI". Proceedings. Second MPI Developer's Conference. pp. 138–141. CiteSeerX   10.1.1.27.4822 . doi:10.1109/MPIDC.1996.534105. ISBN   978-0-8186-7533-1. S2CID   16846313.
  25. Stone, John E. (January 1998). An Efficient Library for Parallel Ray Tracing and Animation (Masters). Computer Science Department, University of Missouri-Rolla, April 1998.
  26. Stone, J.E.; Isralewitz, B.; Schulten, K. (2013-08-01). "Early experiences scaling VMD molecular visualization and analysis jobs on blue waters". 2013 Extreme Scaling Workshop (XSW 2013). pp. 43–50. CiteSeerX   10.1.1.396.3545 . doi:10.1109/XSW.2013.10. ISBN   978-1-4799-3691-5. S2CID   16329833.