Heterogeneous Element Processor

Last updated
Nameplate from the Denelcor HEP H1000 HEP nameplate.jpg
Nameplate from the Denelcor HEP H1000

The Heterogeneous Element Processor (HEP) was introduced by Denelcor, Inc. in 1982. The HEP's architect was Burton Smith. The machine was designed to solve fluid dynamics problems for the Ballistic Research Laboratory. [1] A HEP system, as the name implies, was pieced together from many heterogeneous components -- processors, data memory modules, and I/O modules. The components were connected via a switched network.

A single processor, called a PEM, in a HEP system (up to sixteen PEMs could be connected) was rather unconventional; via a "program status word (PSW) queue," up to fifty processes could be maintained in hardware at once. The largest system ever delivered had 4 PEMs. The eight-stage instruction pipeline allowed instructions from eight different processes to proceed at once. In fact, only one instruction from a given process was allowed to be present in the pipeline at any point in time. Therefore, the full processor throughput of 10 MIPS could only be achieved when eight or more processes were active; no single process could achieve throughput greater than 1.25 MIPS. This type of multithreading processing classifies the HEP as a barrel processor. The hardware implementation of the HEP PEM was emitter-coupled logic.

Processes were classified as either user-level or supervisor-level. User-level processes could create supervisor-level processes, which were used to manage user-level processes and perform I/O. Processes of the same class were required to be grouped into one of seven user tasks and seven supervisor tasks.

Each processor, in addition to the PSW queue and instruction pipeline, contained instruction memory, 2,048 64-bit general purpose registers and 4,096 constant registers. Constant registers were differentiated by the fact that only supervisor processes could modify their contents. The processors themselves contained no data memory; instead, data memory modules could be separately attached to the switched network.

The HEP memory consisted of completely separate instruction memory (up to 128 MBs) and data memory (up to 1 GB). Users saw 64-bit words, but in reality, data memory words were 72-bit with the extra bits used for state, see next paragraph, parity, tagging, and other uses.

The HEP implemented a type of mutual exclusion in which all registers and locations in data memory had associated "empty" and "full" states. Reading from a location set the state to "empty," while writing to it set the state to "full." A programmer could allow processes to halt after trying to read from an empty location or write to a full location, enforcing critical sections.

The switched network between elements resembled, in many ways, a modern computer network. On the network were sets of nodes, each of which had three links. When a packet arrived at a node, it consulted a routing table and attempted to forward the packet closer to its destination. If a node became congested, any incoming packets were passed on without routing. Packets treated in such a manner had their priority level increased; when several packets vied for a single node, a packet with a higher priority level would be routed before ones with lower priority levels.

Another component of the switched network was the sO System, with its own memory and many individual DEC UNIBUS buses attached for disks and other peripherals. The system also had the ability to save the full/empty bits not normally visible directly. The initial IO System performance was shown to be woefully inadequate due to the high latency in starting the IO operations. Ron Natalie (from BRL) and Burton Smith designed a new system out of spare parts on napkins at a local steakhouse and put it into operation in the course of the ensuing week.

The HEP's primary application programming language was a unique Fortran variant. In time C, Pascal, and SISAL were added. The syntax of data variables using full-empty bits prepended '$' before their name. So 'A' would name a local variable, but $A would be a locking full-empty variable. Application dead-lock was thus possible. Problematic, failure to '$' could introduce unintended numerical inaccuracy.

The first HEP operating system was HEPOS. Mike Muuss was involved in a Unix port for the Ballistic Research Laboratory. HEPOS was not a Unix-like operating system.

Although it was known to have poor cost-performance, the HEP received attention due to what were, at the time, several revolutionary features. The HEP had the performance of a CDC 7600-class computer in the Cray-1 era. HEP systems were acquired by the Ballistic Research Laboratory (four PEM system), Los Alamos, the Argonne National Laboratory (single PEM), the National Security Agency, and Germany's Messerschmitt (three PEMS system. Denelcor also delivered a two PEM system to the University of Georgia in exchange for them providing software assistance (the system had also been offered to the University of Maryland). [2] Messerschmitt was the only client to put the HEP into use for "real" applications; the other clients used it for experimenting with parallel algorithms. The BRL system was used to prepare a movie using the BRL-CAD software as its only real application. Faster and larger designs for HEP-2 and HEP-3 were started but never completed. The architectural concept would later be embodied with the code-name Horizon.

See also

Related Research Articles

<span class="mw-page-title-main">Central processing unit</span> Central computer component which executes instructions

A central processing unit (CPU)—also called a central processor or main processor—is the most important processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs).

<span class="mw-page-title-main">Cray-1</span> Supercomputer manufactured by Cray Research

The Cray-1 was a supercomputer designed, manufactured and marketed by Cray Research. Announced in 1975, the first Cray-1 system was installed at Los Alamos National Laboratory in 1976. Eventually, eighty Cray-1s were sold, making it one of the most successful supercomputers in history. It is perhaps best known for its unique shape, a relatively small C-shaped cabinet with a ring of benches around the outside covering the power supplies and the cooling system.

<span class="mw-page-title-main">CDC 6600</span>

The CDC 6600 was the flagship of the 6000 series of mainframe computer systems manufactured by Control Data Corporation. Generally considered to be the first successful supercomputer, it outperformed the industry's prior recordholder, the IBM 7030 Stretch, by a factor of three. With performance of up to three megaFLOPS, the CDC 6600 was the world's fastest computer from 1964 to 1969, when it relinquished that status to its successor, the CDC 7600.

<span class="mw-page-title-main">Intel 8051</span> Single chip microcontroller series by Intel

The Intel MCS-51 is a single chip microcontroller (MCU) series developed by Intel in 1980 for use in embedded systems. The architect of the Intel MCS-51 instruction set was John H. Wharton. Intel's original versions were popular in the 1980s and early 1990s, and enhanced binary compatible derivatives remain popular today. It is a complex instruction set computer, but also has some of the features of RISC architectures, such as a large register set and register windows, and has separate memory spaces for program instructions and data.

<span class="mw-page-title-main">Parallel computing</span> Programming paradigm in which many processes are executed simultaneously

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.

General Comprehensive Operating System is a family of operating systems oriented toward the 36-bit GE/Honeywell mainframe computers.

The Cyclone, was a vacuum tube computer, built by Iowa State College at Ames, Iowa. The computer was commissioned in July 1959. It was based on the IAS architecture developed by John von Neumann. The prototype was ILLIAC, the University of Illinois Automatic Computer. The Cyclone used 40-bit words, used two 20-bit instructions per word, and each instruction had an eight-bit op-code and a 12-bit operand or address field. In general IAS-based computers were not code compatible with each other, although originally math routines which ran on the ILLIAC would also run on the Cyclone.

<span class="mw-page-title-main">ILLIAC IV</span> First massively parallel computer

The ILLIAC IV was the first massively parallel computer. The system was originally designed to have 256 64-bit floating point units (FPUs) and four central processing units (CPUs) able to process 1 billion operations per second. Due to budget constraints, only a single "quadrant" with 64 FPUs and a single CPU was built. Since the FPUs all had to process the same instruction – ADD, SUB etc. – in modern terminology the design would be considered to be single instruction, multiple data, or SIMD.

Flynn's taxonomy is a classification of computer architectures, proposed by Michael J. Flynn in 1966 and extended in 1972. The classification system has stuck, and it has been used as a tool in the design of modern processors and their functionalities. Since the rise of multiprocessing central processing units (CPUs), a multiprogramming context has evolved as an extension of the classification system. Vector processing, covered by Duncan's taxonomy, is missing from Flynn's work because the Cray-1 was released in 1977: Flynn's second paper was published in 1972.

A wait state is a delay experienced by a computer processor when accessing external memory or another device that is slow to respond.

<span class="mw-page-title-main">Contiki</span> Real-time operating system

Contiki is an operating system for networked, memory-constrained systems with a focus on low-power wireless Internet of Things (IoT) devices. Contiki is used for systems for street lighting, sound monitoring for smart cities, radiation monitoring and alarms. It is open-source software released under the BSD-3-Clause license.

<span class="mw-page-title-main">Network processor</span>

A network processor is an integrated circuit which has a feature set specifically targeted at the networking application domain.

<span class="mw-page-title-main">Clipper architecture</span>

The Clipper architecture is a 32-bit RISC-like instruction set architecture designed by Fairchild Semiconductor. The architecture never enjoyed much market success, and the only computer manufacturers to create major product lines using Clipper processors were Intergraph and High Level Hardware, although Opus Systems offered a product based on the Clipper as part of its Personal Mainframe range. The first processors using the Clipper architecture were designed and sold by Fairchild, but the division responsible for them was subsequently sold to Intergraph in 1987; Intergraph continued work on Clipper processors for use in its own systems.

<span class="mw-page-title-main">Cray XMT</span>

Cray XMT is a scalable multithreaded shared memory supercomputer architecture by Cray, based on the third generation of the Tera MTA architecture, targeted at large graph problems. Presented in 2005, it supersedes the earlier unsuccessful Cray MTA-2. It uses the Threadstorm3 CPUs inside Cray XT3 blades. Designed to make use of commodity parts and existing subsystems for other commercial systems, it alleviated the shortcomings of Cray MTA-2's high cost of fully custom manufacture and support. It brought various substantial improvements over Cray MTA-2, most notably nearly tripling the peak performance, and vastly increased maximum CPU count to 8,192 and maximum memory to 128 TB, with a data TLB of maximal 512 TB.

<span class="mw-page-title-main">GEC 4000 series</span> Series of 16/32-bit minicomputers

The GEC 4000 was a series of 16/32-bit minicomputers produced by GEC Computers Ltd in the United Kingdom during the 1970s, 1980s and early 1990s.

<span class="mw-page-title-main">Multithreading (computer architecture)</span> Ability of a CPU to provide multiple threads of execution concurrently

In computer architecture, multithreading is the ability of a central processing unit (CPU) to provide multiple threads of execution concurrently, supported by the operating system. This approach differs from multiprocessing. In a multithreaded application, the threads share the resources of a single or multiple cores, which include the computing units, the CPU caches, and the translation lookaside buffer (TLB).

The Berkeley Packet Filter (BPF) is a technology used in certain computer operating systems for programs that need to, among other things, analyze network traffic. It provides a raw interface to data link layers, permitting raw link-layer packets to be sent and received. In addition, if the driver for the network interface supports promiscuous mode, it allows the interface to be put into that mode so that all packets on the network can be received, even those destined to other hosts.

The PDP-11 architecture is a CISC instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC). It is implemented by central processing units (CPUs) and microprocessors used in PDP-11 minicomputers. It was in wide use during the 1970s, but was eventually overshadowed by the more powerful VAX architecture in the 1980s.

The IBM System/360 architecture is the model independent architecture for the entire S/360 line of mainframe computers, including but not limited to the instruction set architecture. The elements of the architecture are documented in the IBM System/360 Principles of Operation and the IBM System/360 I/O Interface Channel to Control Unit Original Equipment Manufacturers' Information manuals.

The Bellmac 32 is a microprocessor developed by Bell Labs' processor division in 1980, implemented using CMOS technology and was the first microprocessor that could move 32 bits in one clock cycle. The microprocessor contains 150,000 transistors and improved on the speed of CMOS design by using "domino circuits". It was designed with the C programming language in mind. After its creation, an improved version was produced called the Bellmac 32A, then cancelled along with its successor, the "Hobbit" C-language Reduced Instruction Set Processor (CRISP).

References

  1. "The History of Computing at BRL".
  2. Padua, David (2011). Encyclopedia of Parallel Computing, Volume 4. New York, NY: Springer Verlag.