The Intel Personal SuperComputer (Intel iPSC) was a product line of parallel computers in the 1980s and 1990s. The iPSC/1 was superseded by the Intel iPSC/2, which was in turn superseded by the Intel iPSC/860.
In 1984, Justin Rattner became manager of the Intel Scientific Computers group in Beaverton, Oregon. He hired a team that included mathematician Cleve Moler. [1] Internally, the iPSC connected its processors in a hypercube network topology, an approach inspired by the Caltech Cosmic Cube research project. For that reason, systems were configured with a power-of-two number of nodes, corresponding to the corners of hypercubes of increasing dimension. [2]
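In a d-dimensional hypercube, each of the 2^d nodes carries a d-bit address, and two nodes are directly linked exactly when their addresses differ in a single bit. A minimal sketch of this addressing scheme (not tied to any Intel software; the node and dimension values are illustrative):

```c
#include <stdio.h>

/* In a d-dimensional hypercube, node addresses are d-bit numbers and two
 * nodes are neighbors exactly when their addresses differ in one bit, so
 * flipping each of the d bits of an address enumerates its d neighbors. */
static void print_neighbors(unsigned node, unsigned dim)
{
    for (unsigned bit = 0; bit < dim; bit++)
        printf("node %u <-> node %u (addresses differ in bit %u)\n",
               node, node ^ (1u << bit), bit);
}

int main(void)
{
    /* iPSC/d5-style example: a 5-dimensional cube has 2^5 = 32 nodes. */
    print_neighbors(0, 5);   /* neighbors of node 0 are 1, 2, 4, 8, 16 */
    return 0;
}
```

This bit-flipping rule is why each node in the 128-node (seven-dimensional) configuration needed seven interconnect channels, one per hypercube dimension.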
Intel announced the iPSC/1 in 1985, with 32 to 128 nodes connected with Ethernet into a hypercube. The system was managed by a personal computer of the PC/AT era running Xenix, the "cube manager". [3] Each node had an Intel 80286 CPU with an 80287 math coprocessor, 512 KB of RAM, and eight Ethernet ports (seven for the hypercube interconnect and one to communicate with the cube manager). [1]
A message-passing interface called NX, developed by Paul Pierce, evolved throughout the life of the iPSC line. [4] Because only the cube manager had connections to the outside world, developing and debugging applications was difficult. [5]
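NX exposed a point-to-point send/receive model between numbered nodes. The sketch below illustrates that programming style using the later, standardized MPI library rather than the original NX calls (whose exact names and signatures are not reproduced here); the ranks, tag, and exchanged value are illustrative.

```c
#include <mpi.h>
#include <stdio.h>

/* Illustrative point-to-point exchange in the style used on hypercube
 * machines, written with standard MPI rather than the original NX calls.
 * Node 0 sends one integer to node 1, which prints it. */
int main(int argc, char **argv)
{
    int rank, value = 42;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("node %d received %d from node 0\n", rank, value);
    }

    MPI_Finalize();
    return 0;
}
```

Such a program would typically be built with an MPI compiler wrapper such as mpicc and launched on two processes, for example with mpirun -np 2.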
The basic models were the iPSC/d5 (a five-dimensional hypercube with 32 nodes), the iPSC/d6 (six dimensions, 64 nodes), and the iPSC/d7 (seven dimensions, 128 nodes). Each cabinet held 32 nodes, and prices ranged up to about half a million dollars for the four-cabinet iPSC/d7 model. [1] Extra-memory (iPSC-MX) and vector-processor (iPSC-VX) models were also available in all three sizes. A four-dimensional model (iPSC/d4) with 16 nodes was also available. [6]
The iPSC/1 was called the first parallel computer built from commercial off-the-shelf parts. [7] This allowed it to reach the market at about the same time as its competitor from nCUBE, even though the nCUBE project had started earlier. Each iPSC cabinet measured 127 cm × 41 cm × 43 cm overall. Total performance was estimated at 2 MFLOPS, and the memory width was 16 bits.
The first iPSC/1 (serial #1), with 32 nodes, was delivered to Oak Ridge National Laboratory in 1985. [8] [9]
The Intel iPSC/2 was announced in 1987. It was available in several configurations, the base setup being one cabinet with 16 Intel 80386 processors at 16 MHz, each with 4 MB of memory and an 80387 coprocessor on the same module. [10] The operating system and user programs were loaded from a management PC, typically an Intel 301 with a special interface card. Instead of Ethernet, a custom Direct-Connect Module with eight channels of about 2.8 Mbyte/s data rate each was used for the hypercube interconnect. [10] The custom interconnect hardware resulted in higher cost but reduced communication delays. [11] The software in the management processor was called the System Resource Manager instead of the "cube manager". The system allowed expansion to up to 128 nodes, each with a processor and coprocessor. [12]
The base modules could be upgraded to the SX (Scalar eXtension) version by adding a Weitek 1167 floating-point unit. [13] Another configuration allowed each processor module to be paired with a VX (Vector eXtension) module containing dedicated multiplication and addition units. This had the downside of halving the number of available interface card slots. Running the maximum number of nodes while pairing them with VX modules required multiple cabinets as part of the same iPSC/2 system. [14]
The nodes of the iPSC/2 ran the proprietary NX/2 operating system, while the host machine ran System V or Xenix. [15] Nodes could be configured as in the iPSC/1, without any local disk storage, or could use one of the Direct-Connect Module connections to reach a clustered file system (called the concurrent file system at the time). [14] [16] The combination of faster node computing elements and the improved interconnect raised application performance over the iPSC/1. [17] [18] An estimated 140 iPSC/2 systems were built. [19]
Intel announced the iPSC/860 in 1990. The iPSC/860 consisted of up to 128 processing elements connected in a hypercube, each element containing either an Intel i860 at 40–50 MHz or an Intel 80386 microprocessor. [20] Memory per node was increased to 8 MB, and a similar Direct-Connect Module was used, which limited the size to 128 nodes. [21]
One customer was the Oak Ridge National Laboratory. [20] The performance of the iPSC/860 was analyzed in several research projects. [22] [23] The iPSC/860 was also the original development platform for the Tachyon parallel ray tracing engine, [24] [25] which became part of the SPEC MPI 2007 benchmark and is still widely used today. [26] The iPSC line was superseded by the Touchstone Delta, a research project at the California Institute of Technology that evolved into the Intel Paragon.
In computing, a virtual machine (VM) is the virtualization or emulation of a computer system. Virtual machines are based on computer architectures and provide the functionality of a physical computer. Their implementations may involve specialized hardware, software, or a combination of the two. Virtual machines differ and are organized by their function.
A Connection Machine (CM) is a member of a series of massively parallel supercomputers that grew out of doctoral research on alternatives to the traditional von Neumann architecture of computers by Danny Hillis at Massachusetts Institute of Technology (MIT) in the early 1980s. Starting with CM-1, the machines were intended originally for applications in artificial intelligence (AI) and symbolic processing, but later versions found greater success in the field of computational science.
nCUBE was a series of parallel computers from the company of the same name. Early generations of the hardware used a custom microprocessor. With its final generations of servers, nCUBE no longer designed custom microprocessors for its machines, but used server-class chips manufactured by a third party in massively parallel hardware deployments, primarily for the purposes of on-demand video.
iWarp was an experimental parallel supercomputer architecture developed as a joint project by Intel and Carnegie Mellon University. The project started in 1988, as a follow-up to CMU's previous WARP research project, in order to explore building an entire parallel-computing "node" in a single microprocessor, complete with memory and communications links. In this respect the iWarp is very similar to the INMOS transputer and nCUBE.
Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.
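As a concrete illustration of data parallelism, the sketch below splits one large loop across threads using OpenMP in C; the array size and contents are illustrative, and the code assumes an OpenMP-capable compiler (e.g. gcc -fopenmp).

```c
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N], b[N];
    double sum = 0.0;

    /* Data parallelism: each thread handles a disjoint chunk of the
     * iteration space; the reduction combines per-thread partial sums. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < N; i++) {
        a[i] = i * 0.5;
        b[i] = a[i] * a[i];
        sum += b[i];
    }

    printf("sum = %f\n", sum);
    return 0;
}
```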
In computing, multiple instruction, multiple data (MIMD) is a technique employed to achieve parallelism. Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data.
The Message Passing Interface (MPI) is a standardized and portable message-passing system designed to function on parallel computing architectures. The MPI standard defines the syntax and semantics of library routines useful to a wide range of users writing portable message-passing programs in C, C++, and Fortran. There are several open-source MPI implementations, which fostered the development of a parallel software industry and encouraged the development of portable and scalable large-scale parallel applications.
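A minimal sketch of the kind of library routine the standard defines, here a collective reduction that sums one value contributed by each process; the contributed values are illustrative.

```c
#include <mpi.h>
#include <stdio.h>

/* Each process contributes its rank; MPI_Allreduce sums the contributions
 * and distributes the total back to every process. */
int main(int argc, char **argv)
{
    int rank, size, total;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Allreduce(&rank, &total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d processes, sum of ranks = %d\n", size, total);

    MPI_Finalize();
    return 0;
}
```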
The fat tree network is a universal network for provably efficient communication. It was invented by Charles E. Leiserson of the Massachusetts Institute of Technology in 1985. k-ary n-trees, the type of fat-trees commonly used in most high-performance networks, were initially formalized in 1997.
In computer science, concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the outcome. This allows for parallel execution of the concurrent units, which can significantly improve overall speed of the execution in multi-processor and multi-core systems. In more technical terms, concurrency refers to the decomposability of a program, algorithm, or problem into order-independent or partially-ordered components or units of computation.
Dataparallel C is a programming language based on C with parallel extensions by Philip Hatcher and Michael Quinn of the University of New Hampshire. Dataparallel C was based on an early version of C* and ran on the Intel iPSC/2 and nCUBE.
MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.
The Caltech Cosmic Cube was a parallel computer, developed by Charles Seitz and Geoffrey C. Fox from 1981 onward. It was the first working hypercube built.
The Intel Paragon is a discontinued series of massively parallel supercomputers that was produced by Intel in the 1990s. The Paragon XP/S, launched in 1992, is a productized version of the experimental Touchstone Delta system that was built at Caltech. The Paragon superseded Intel's earlier iPSC/860 system, to which it is closely related.
Matthew Flatt is an American computer scientist and professor at the University of Utah School of Computing in Salt Lake City. He is also the leader of the core development team for the Racket programming language.
In graph theory, the cube-connected cycles is an undirected cubic graph, formed by replacing each vertex of a hypercube graph by a cycle. It was introduced by Preparata & Vuillemin (1981) for use as a network topology in parallel computing.
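A minimal sketch of that construction (the dimension and output format are illustrative): each hypercube vertex x is replaced by a cycle of d vertices (x, 0), …, (x, d−1) joined by cycle edges, and position i of each cycle gets one "cube" edge to the copy whose address differs in bit i.

```c
#include <stdio.h>

/* Print the edges of the cube-connected cycles network CCC(d):
 * vertices are pairs (x, i) with x a d-bit hypercube address and
 * i a position on the cycle replacing that hypercube vertex. */
static void print_ccc_edges(unsigned d)
{
    for (unsigned x = 0; x < (1u << d); x++) {
        for (unsigned i = 0; i < d; i++) {
            /* Cycle edge within the copy replacing hypercube vertex x. */
            printf("(%u,%u) -- (%u,%u)\n", x, i, x, (i + 1) % d);
            /* Hypercube edge to the copy whose address differs in bit i
             * (printed once, from the smaller address). */
            if (!(x & (1u << i)))
                printf("(%u,%u) -- (%u,%u)\n", x, i, x ^ (1u << i), i);
        }
    }
}

int main(void)
{
    print_ccc_edges(3);   /* CCC(3): 8 * 3 = 24 vertices, each of degree 3 */
    return 0;
}
```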
HPX, short for High Performance ParalleX, is a runtime system for high-performance computing. It is currently under active development by the STE||AR group at Louisiana State University. Focused on scientific computing, it provides an alternative execution model to conventional approaches such as MPI. HPX aims to overcome the challenges MPI faces on increasingly large supercomputers by using asynchronous communication between nodes and lightweight control objects instead of global barriers, allowing application developers to exploit fine-grained parallelism.
In computer science, the all nearest smaller values problem is the following task: for each position in a sequence of numbers, search among the previous positions for the last position that contains a smaller value. This problem can be solved efficiently both by parallel and non-parallel algorithms: Berkman, Schieber & Vishkin (1993), who first identified the procedure as a useful subroutine for other parallel programs, developed efficient algorithms to solve it in the Parallel Random Access Machine model; it may also be solved in linear time on a non-parallel computer using a stack-based algorithm. Later researchers have studied algorithms to solve it in other models of parallel computation.
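A minimal sketch of the sequential stack-based approach mentioned above (the array contents and output format are illustrative): each index is pushed and popped at most once, giving linear running time.

```c
#include <stdio.h>

/* All nearest smaller values: for each position i, find the closest j < i
 * with x[j] < x[i], or -1 if no such position exists. */
static void nearest_smaller(const int *x, int n, int *prev_smaller)
{
    int stack[64];   /* indices of candidate positions; sized for this demo */
    int top = -1;

    for (int i = 0; i < n; i++) {
        while (top >= 0 && x[stack[top]] >= x[i])
            top--;                                   /* discard non-smaller values */
        prev_smaller[i] = (top >= 0) ? stack[top] : -1;
        stack[++top] = i;
    }
}

int main(void)
{
    int x[] = {3, 1, 4, 1, 5, 9, 2, 6};
    int n = sizeof x / sizeof x[0];
    int ans[8];

    nearest_smaller(x, n, ans);
    for (int i = 0; i < n; i++)
        printf("x[%d]=%d -> index of nearest smaller to the left: %d\n",
               i, x[i], ans[i]);
    return 0;
}
```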
A lightweight kernel (LWK) operating system is one used in a large computer with many processor cores, termed a parallel computer.
In computer science, region-based memory management is a type of memory management in which each allocated object is assigned to a region. A region, also called a zone, arena, area, or memory context, is a collection of allocated objects that can be efficiently reallocated or deallocated all at once. Like stack allocation, regions facilitate allocation and deallocation of memory with low overhead; but they are more flexible, allowing objects to live longer than the stack frame in which they were allocated. In typical implementations, all objects in a region are allocated in a single contiguous range of memory addresses, similarly to how stack frames are typically allocated.
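A minimal sketch of the idea, assuming a fixed-size backing buffer (the names and sizes are illustrative): objects are carved out of one contiguous block by advancing a pointer, and the whole region is released in a single operation.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* A region ("arena"): allocations bump a pointer within one contiguous
 * buffer, and destroying the region releases every object at once. */
typedef struct {
    char  *base;
    size_t used;
    size_t capacity;
} Region;

static Region region_create(size_t capacity)
{
    Region r = { malloc(capacity), 0, capacity };
    return r;
}

static void *region_alloc(Region *r, size_t size)
{
    size = (size + 7) & ~(size_t)7;           /* keep allocations 8-byte aligned */
    if (r->base == NULL || r->used + size > r->capacity)
        return NULL;                           /* region exhausted */
    void *p = r->base + r->used;
    r->used += size;
    return p;
}

static void region_destroy(Region *r)          /* deallocate everything at once */
{
    free(r->base);
    r->base = NULL;
    r->used = r->capacity = 0;
}

int main(void)
{
    Region r = region_create(1024);
    int    *a = region_alloc(&r, 100 * sizeof(int));
    double *b = region_alloc(&r, 10 * sizeof(double));

    if (a && b) {
        a[0] = 7;
        b[0] = 3.14;
        printf("a[0]=%d b[0]=%.2f, region used %zu bytes\n", a[0], b[0], r.used);
    }
    region_destroy(&r);   /* both allocations are released together */
    return 0;
}
```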
Tachyon is parallel/multiprocessor ray tracing software: a parallel ray tracing library for use on distributed-memory parallel computers, shared-memory computers, and clusters of workstations. Tachyon implements rendering features such as ambient occlusion lighting, depth-of-field focal blur, shadows, reflections, and others. It was originally developed for the Intel iPSC/860 by John Stone for his M.S. thesis at the University of Missouri-Rolla. Tachyon subsequently became a more functional and complete ray tracing engine, and it is now incorporated into a number of other open source software packages such as VMD and SageMath. Tachyon is released under a permissive license.