MasPar

MasPar Computer Corporation
Industry	Computers
Founded	1987;37 years ago in Sunnyvale, California, United States
Founder	Jeff Kalb
Defunct	1999
Fate	Acquired by Accrue Software

Last updated November 01, 2024

MasPar Computer Corporation was a minisupercomputer vendor that was founded in 1987 by Jeff Kalb. The company was based in Sunnyvale, California.

History

While Kalb was the vice-president of the division of Digital Equipment Corporation (DEC) that built integrated circuits, some researchers in that division were building a supercomputer based on the Goodyear MPP (massively parallel processor) supercomputer. The DEC researchers enhanced the architecture by:

making the processor elements to be 4-bit instead of 1-bit ^[1]
increasing the connectivity of each processor element to 8 neighbors from 4.
adding a global interconnect for all of the processing elements, which was a triple-redundant switch which was easier to implement than a full crossbar switch.

After Digital decided not to commercialize the research project, Kalb decided to start a company to sell this minisupercomputer. In 1990, the first generation product MP-1 was delivered. In 1992, the follow-on MP-2 was shipped. The company shipped more than 200 systems.

MasPar along with nCUBE criticized the open government support, by DARPA, of competitors Intel for their hypercube Personal SuperComputers (iPSC) and the Thinking Machines Connection Machine on the pages of Datamation .

Samples of MasPar MPs, from the NASA Goddard Space Flight Center, are in storage at the Computer History Museum.

MasPar offered a family of SIMD machines, second sourced by DEC. The processor units are proprietary.

There was no MP-3. MasPar exited the computer hardware business in June 1996, halting all hardware development and transforming itself into a new data mining software company called NeoVista Software. NeoVista was acquired by Accrue Software in 1999, which in turn sold the division to JDA Software in 2001.^[2]^[3]

Hardware

MasPar is unique in being a manufacturer of SIMD supercomputers (as opposed to vector machines). In this approach, a collection of ALU's listen to a program broadcast from a central source. The ALUs can do their own data fetch, but are all under control of a central Array Control Unit. There is a central clock. The emphasis is on communications efficiency, and low latency. The MasPar architecture is designed to scale, and balance processing, memory, and communication.

The Maspar MP-1 PE and the later binary-compatible Maspar MP-2 PE are full custom CMOS chips, designed in-house,^[1] and fabricated by various vendors such as HP or TI.

The Array Control Unit (ACU) handles instruction fetch. It is a load-store architecture. The MasPar architecture is Harvard in a broad sense. The ACU implements a microcoded instruction fetch, but achieves a RISC-like 1 instruction per clock. The Arithmetic units, ALUs with data fetch capability, are implemented 32 to a chip. Each ALU is connected in a nearest neighbor fashion to 8 others. The edge connections are brought off-chip. In this scheme, the perimeters can be toroid-wrapped. Up to 16,384 units can be connected within the confines of a cabinet. A global router, essentially a cross-bar switch, provides external I/O to the processor array.

The MP-2 PE chip contains 32 processor elements, each a full 32-bit ALU with floating point, registers, and a barrel shifter. Only the instruction fetch feature is removed, and placed in the ACU. The PE design is literally replicated 32 times on the chip. The chip is designed to interface to DRAM, to other processor array chips, and to communication router chips.

Each ALU, called a PE slice, contains 64 × 32 bit registers that are used for both integer and floating point. The registers are both bit and byte addressable. The floating point unit handles single precision and double precision arithmetic on IEEE format numbers. Each PE slice contains two registers for data memory address, and the data. Each PE also has two one-bit serial ports, one for inbound and one for outbound communication to its nearest neighbor. The direction of communication is controlled globally. The PEs also have inbound and outbound paths to a global router for I/O. A broadcast port allows a single instance of data to be "promoted" to parallel data. Alternately, global data can be 'or-ed' to a scalar result.

The serial links support 1 Mbyte/s bit-serial communication that allows coordinated register-register communication between processors. Each processor has its own local memory, implemented in DRAM. No internal memory is included on the processors. Microcoded instruction decode is used.

The 32 PEs on a chip are clustered into two groups sharing a common memory interface, or M-machine, for access. A global scoreboard keeps track of memory and register usage. The path to memory is 16 bits wide. Both big and little endian formats are supported. Each processor has its own 64 Kbyte of memory. Both direct and indirect data memory addressing are supported.

The chip is implemented in 1.0-micrometre, two-level, metal CMOS, dissipates 0.8 watt, and is packaged in a 208-pin PQFP. A relatively low clock rate of 12.5 MHz is used.

The Maspar machines are front ended by a host machine, usually a VAX. They are accessed by extensions to Fortran and C. Full IEEE single- and double-precision floating point are supported.

There is no cache for the ALUs. Cache is not required, due to the memory interface operating at commensurate speed with the ALU data accesses.

The ALUs do not implement memory management for data memory. The ACU uses demand paged virtual memory for the instruction memory.

As a gimmick MasPar handed out business cards with a MP-2 PE chip laminated to it.

Related Research Articles

A central processing unit (CPU), also called a central processor, main processor, or just processor, is the most important processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs).

Kendall Square Research (KSR) was a supercomputer company headquartered originally in Kendall Square in Cambridge, Massachusetts in 1986, near Massachusetts Institute of Technology (MIT). It was co-founded by Steven Frank and Henry Burkhardt III, who had formerly helped found Data General and Encore Computer and was one of the original team that designed the PDP-8. KSR produced two models of supercomputer, the KSR1 and KSR2. It went bankrupt in 1994.

In processor design, microcode serves as an intermediary layer situated between the central processing unit (CPU) hardware and the programmer-visible instruction set architecture of a computer, also known as its machine code. It consists of a set of hardware-level instructions that implement the higher-level machine code instructions or control internal finite-state machine sequencing in many digital processing components. While microcode is utilized in Intel and AMD general-purpose CPUs in contemporary desktops and laptops, it functions only as a fallback path for scenarios that the faster hardwired control unit is unable to manage.

In computer science, an instruction set architecture (ISA) is an abstract model that generally defines how software controls the CPU in a computer or a family of computers. A device or program that executes instructions described by that ISA, such as a central processing unit (CPU), is called an implementation of that ISA.

In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its instructions are designed to operate efficiently and effectively on large one-dimensional arrays of data called vectors. This is in contrast to scalar processors, whose instructions operate on single data items only, and in contrast to some of those same scalar processors having additional single instruction, multiple data (SIMD) or SIMD within a register (SWAR) Arithmetic Units. Vector processors can greatly improve performance on certain workloads, notably numerical simulation and similar tasks. Vector processing techniques also operate in video-game console hardware and in graphics accelerators.

The Intel i860 is a RISC microprocessor design introduced by Intel in 1989. It is one of Intel's first attempts at an entirely new, high-end instruction set architecture since the failed Intel iAPX 432 from the beginning of the 1980s. It was the world's first million-transistor chip. It was released with considerable fanfare, slightly obscuring the earlier Intel i960, which was successful in some niches of embedded systems. The i860 never achieved commercial success and the project was terminated in the mid-1990s.

In computer architecture, 64-bit integers, memory addresses, or other data units are those that are 64 bits wide. Also, 64-bit central processing units (CPU) and arithmetic logic units (ALU) are those that are based on processor registers, address buses, or data buses of that size. A computer that uses such a processor is a 64-bit computer.

In the history of computer hardware, some early reduced instruction set computer central processing units used a very similar architectural solution, now called a classic RISC pipeline. Those CPUs were: MIPS, SPARC, Motorola 88000, and later the notional CPU DLX invented for education.

Cell is a 64-bit multi-core microprocessor microarchitecture that combines a general-purpose PowerPC core of modest performance with streamlined coprocessing elements which greatly accelerate multimedia and vector processing applications, as well as many other forms of dedicated computation.

The Emotion Engine is a central processing unit developed and manufactured by Sony Computer Entertainment and Toshiba for use in the PlayStation 2 video game console. It was also used in early PlayStation 3 models sold in Japan and North America to provide PlayStation 2 game support. Mass production of the Emotion Engine began in 1999 and ended in late 2012 with the discontinuation of the PlayStation 2.

The POWER1 is a multi-chip CPU developed and fabricated by IBM that implemented the POWER instruction set architecture (ISA). It was originally known as the RISC System/6000 CPU or, when in an abbreviated form, the RS/6000 CPU, before introduction of successors required the original name to be replaced with one that used the same naming scheme (POWERn) as its successors in order to differentiate it from the newer designs.

Multiflow Computer, Inc., founded in April, 1984 near New Haven, Connecticut, USA, was a manufacturer and seller of minisupercomputer hardware and software embodying the VLIW design style. Multiflow, incorporated in Delaware, ended operations in March, 1990, after selling about 125 VLIW minisupercomputers in the United States, Europe, and Japan.

In computer science, stream processing is a programming paradigm which views streams, or sequences of events in time, as the central input and output objects of computation. Stream processing encompasses dataflow programming, reactive programming, and distributed data processing. Stream processing systems aim to expose parallel processing for data streams and rely on streaming algorithms for efficient implementation. The software stack for these systems includes components such as programming models and query languages, for expressing computation; stream management systems, for distribution and scheduling; and hardware components for acceleration including floating-point units, graphics processing units, and field-programmable gate arrays.

Am2900 is a family of integrated circuits (ICs) created in 1975 by Advanced Micro Devices (AMD). They were constructed with bipolar devices, in a bit-slice topology, and were designed to be used as modular components each representing a different aspect of a computer control unit (CCU). By using the bit slicing technique, the Am2900 family was able to implement a CCU with data, addresses, and instructions to be any multiple of 4 bits by multiplying the number of ICs. One major problem with this modular technique was that it required a larger number of ICs to implement what could be done on a single CPU IC. The Am2901 chip included an arithmetic logic unit (ALU) and 16 4-bit processor register slices, and was the "core" of the series. It could count using 4 bits and implement binary operations as well as various bit-shifting operations. The Am2909 was a 4-bit-slice address sequencer that could generate 4-bit addresses on a single chip, and by using n of them, it was able to generate 4n-bit addresses. It had a stack that could store a microprogram counter up to 4 nest levels, as well as a stack pointer.

The Goodyear Massively Parallel Processor (MPP) was a massively parallel processing supercomputer built by Goodyear Aerospace for the NASA Goddard Space Flight Center. It was designed to deliver enormous computational power at lower cost than other existing supercomputer architectures, by using thousands of simple processing elements, rather than one or a few highly complex CPUs. Development of the MPP began circa 1979; it was delivered in May 1983, and was in general use from 1985 until 1991.

The CVAX is a microprocessor chipset developed and fabricated by Digital Equipment Corporation (DEC) that implemented the VAX instruction set architecture (ISA). The chipset consisted of the CVAX 78034 CPU, CFPA floating-point accelerator, CVAX clock chip, and the associated support chips, the CVAX System Support Chip (CSSC), CVAX Memory Controller (CMCTL), and CVAX Q-Bus Interface Chip (CQBIC).

The R4000 is a microprocessor developed by MIPS Computer Systems that implements the MIPS III instruction set architecture (ISA). Officially announced on 1 October 1991, it was one of the first 64-bit microprocessors and the first MIPS III implementation. In the early 1990s, when RISC microprocessors were expected to replace CISC microprocessors such as the Intel i486, the R4000 was selected to be the microprocessor of the Advanced Computing Environment (ACE), an industry standard that intended to define a common RISC platform. ACE ultimately failed for a number of reasons, but the R4000 found success in the workstation and server markets.

Zero ASIC Corporation, formerly Adapteva, Inc., is a fabless semiconductor company focusing on low power many core microprocessor design. The company was the second company to announce a design with 1,000 specialized processing cores on a single integrated circuit.

IBM POWER is a reduced instruction set computer (RISC) instruction set architecture (ISA) developed by IBM. The name is an acronym for Performance Optimization With Enhanced RISC.

The Pixel Visual Core (PVC) is a series of ARM-based system in package (SiP) image processors designed by Google. The PVC is a fully programmable image, vision and AI multi-core domain-specific architecture (DSA) for mobile devices and in future for IoT. It first appeared in the Google Pixel 2 and 2 XL which were introduced on October 19, 2017. It has also appeared in the Google Pixel 3 and 3 XL. Starting with the Pixel 4, this chip was replaced with the Pixel Neural Core.

References

1 2 John Culver. "MasPar: Massively Parallel Computers – 32 cores on a chip".
↑ Bloomberg Businessweek, Company Overview of Neovista Software, Inc.
↑ DSstar Vol. 5 No. 27 (July 3, 2001), JDA Software Buys Accrue Software's NeoVista DM Products Archived 2014-03-16 at the Wayback Machine

External links

Ian Kaplan's history of venture capital

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[culver-1] 1 2 John Culver. "MasPar: Massively Parallel Computers – 32 cores on a chip".

[bloomberg-2] Bloomberg Businessweek, Company Overview of Neovista Software, Inc.

[dsstar-3] DSstar Vol. 5 No. 27 (July 3, 2001), JDA Software Buys Accrue Software's NeoVista DM Products Archived 2014-03-16 at the Wayback Machine

[1]

[2]

[3]