CDC STAR-100

Two CDC STAR-100 systems: the 8 MB version (foreground) and the 4 MB version (background)
Design
Manufacturer: Control Data Corporation
Designer: Jim Thornton
Release date: 1974 [1]
Casing
Dimensions: Full computer, approx.:
Height: 212 cm (83 in)
Length: 745 cm (293 in)
Internal sections: [2]
Height: 76 in (190 cm)
Width: 28.5 in (72 cm)
Depth: 30 in (76 cm)
Weight: 2,200 pounds (1,000 kg)
Power: 250 kW @ 208 V 400 Hz [2]
System
Operating system: HELIOS [2]
CPU: 64-bit processor @ 25 MHz [1]
Memory: Up to 8 megabytes (4 × 4 × 64K × 64 bits) [3]
Storage: -
MIPS: 1 MIPS (scalar) [4] [2]
FLOPS: 100 MFLOPS (vector) [1]
Predecessor: -
Successor: CDC Cyber 200

The CDC STAR-100 is a vector supercomputer that was designed, manufactured, and marketed by Control Data Corporation (CDC). It was one of the first machines to use a vector processor to improve performance on appropriate scientific applications. It was also the first supercomputer to use integrated circuits and the first to be equipped with one million words of computer memory. [5]


STAR is a blend of STrings (of binary digits) and ARrays. [6] The 100 alludes to the nominal peak processing speed of 100 million floating point operations per second (MFLOPS); [5] the earlier CDC 7600 provided a peak performance of 36 MFLOPS but more typically ran at around 10 MFLOPS.

The design was part of a bid made to Lawrence Livermore National Laboratory (LLNL) in the mid-1960s. [5] Livermore was looking for a partner who would build a much faster machine at its own expense and then lease the resulting machine to the lab. The STAR was announced publicly in the early 1970s, and on 17 August 1971, CDC announced that General Motors had placed the first commercial order for it.

A number of basic design features of the machine meant that its real-world performance was much lower than expected when it was first used commercially in 1974. This shortfall was one of the primary reasons CDC was pushed from its former dominance in the supercomputer market after the Cray-1 was announced in 1975. Only three STAR-100 systems were delivered, two to LLNL and one to NASA Langley Research Center.

Description

The STAR had a 64-bit architecture with 195 instructions. [7] Its main innovation was the inclusion of 65 vector instructions for vector processing. The operations performed by these instructions were strongly influenced by concepts and operators from the APL programming language; in particular, the concept of "control vectors" (vector masks in modern terminology), and several instructions for vector permutation with control vectors, were carried over directly from APL. [8] [9]
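As a rough illustration of what a control vector does, the minimal Python sketch below models one as a list of booleans that selects which element positions an operation affects, together with APL-style compress and expand. The function names are hypothetical and do not correspond to actual STAR-100 mnemonics, and the sketch ignores the machine's bit-level encoding of control vectors.

```python
from typing import List

def masked_add(a: List[float], b: List[float], control: List[bool],
               dest: List[float]) -> List[float]:
    """Compute a[i] + b[i] where control[i] is set; keep dest[i] elsewhere."""
    return [x + y if c else d for x, y, c, d in zip(a, b, control, dest)]

def compress(v: List[float], control: List[bool]) -> List[float]:
    """APL-style compress: keep only the elements whose control bit is set."""
    return [x for x, c in zip(v, control) if c]

def expand(v: List[float], control: List[bool], fill: float = 0.0) -> List[float]:
    """APL-style expand: place v into the set positions, fill the others.

    Assumes len(v) equals the number of set bits in control.
    """
    it = iter(v)
    return [next(it) if c else fill for c in control]

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
mask = [True, False, True, False]
print(masked_add(a, b, mask, [0.0] * 4))  # [11.0, 0.0, 33.0, 0.0]
print(compress(a, mask))                  # [1.0, 3.0]
print(expand([7.0, 8.0], mask))           # [7.0, 0.0, 8.0, 0.0]
```

Compress and expand are inverses over the selected positions, which is what makes masks useful for gathering the "active" elements of a vector, operating on them densely, and scattering the results back.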

The vector instructions operated on vectors that were stored in consecutive locations in main memory; memory addressing was virtual. The vector instructions fed an arithmetic pipeline, so a single instruction, fetched once, could add two variable-length vectors of up to 65,535 elements. The STAR also fetched vector operands in 512-bit units (superwords), reducing average memory latency.
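The memory-to-memory style can be sketched by treating main memory as a flat list of 64-bit words: the instruction names the start addresses of its two operands and its result plus a length, and no vector registers are involved. The sketch also counts how many 512-bit superwords (eight 64-bit words each) a contiguous operand spans. The function names, the flat-list memory model, and the length check are illustrative assumptions rather than the STAR-100's actual instruction format, and virtual-address translation is ignored.

```python
WORDS_PER_SUPERWORD = 512 // 64  # eight 64-bit words per 512-bit superword
MAX_VECTOR_LENGTH = 65_535       # longest vector a single instruction handles

def vector_add(mem: list, a: int, b: int, c: int, n: int) -> None:
    """Set mem[c + i] = mem[a + i] + mem[b + i] for i in range(n), in place."""
    if not 0 <= n <= MAX_VECTOR_LENGTH:
        raise ValueError("vector length out of range")
    for i in range(n):  # sequential here; the hardware overlaps these steps
        mem[c + i] = mem[a + i] + mem[b + i]

def superwords_spanned(start: int, n: int) -> int:
    """Number of superwords covering words start .. start + n - 1."""
    if n == 0:
        return 0
    first = start // WORDS_PER_SUPERWORD
    last = (start + n - 1) // WORDS_PER_SUPERWORD
    return last - first + 1

mem = [float(i) for i in range(48)]  # toy memory: mem[i] == i
vector_add(mem, a=0, b=16, c=32, n=16)
print(mem[32:36])                  # [16.0, 18.0, 20.0, 22.0]
print(superwords_spanned(0, 16))   # 2
print(superwords_spanned(4, 16))   # 3
```

The last two calls show that a vector which does not start on a superword boundary touches one extra superword.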

Because the memory location of the "next" operand was known in advance, the CPU could fetch the next operands while it was still operating on the previous ones. As with instruction pipelines in general, the time needed to complete any single operation was no shorter than before, but because the CPU was working on a number of data points at once, overall throughput improved dramatically.
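The throughput argument can be made concrete with the textbook model of an ideal k-stage pipeline that accepts one new operand pair per cycle: a single result takes k cycles either way, but n results take k + n - 1 cycles instead of k·n. The sketch below assumes that idealized model (no stalls, one element per cycle), and the stage count of 8 is a hypothetical value, not the STAR's real pipeline depth.

```python
def unpipelined_cycles(k: int, n: int) -> int:
    """Each element passes through all k stages before the next one starts."""
    return k * n

def pipelined_cycles(k: int, n: int) -> int:
    """First result after k cycles, then one further result every cycle."""
    return 0 if n == 0 else k + n - 1

for n in (1, 10, 1000):
    print(n, unpipelined_cycles(8, n), pipelined_cycles(8, n))
# 1 8 8
# 10 80 17
# 1000 8000 1007
```

For n = 1 the two are identical, which is the "no better than before" latency; the gain comes entirely from long vectors.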

Many of the STAR's instructions were complex, especially the vector macro instructions, which performed operations that would normally have required long sequences of instructions. These instructions, along with the STAR's generally complex architecture, were implemented with microcode. [10]

Main memory had a capacity of 65,536 512-bit words, called superwords (SWORDs). [11] Main memory was 32-way interleaved to pipeline memory accesses. It was constructed from core memory with an access time of 1.28 μs. The main memory was accessed via a 512-bit bus, controlled by the storage access controller (SAC), which handled requests from the stream unit. The stream unit accesses the main memory through the SAC via three 128-bit data buses, two for reads and one for writes. There is also a 128-bit data bus for instruction fetch, I/O, and control vector access. The stream unit serves as the control unit, fetching and decoding instructions, initiating memory accesses on behalf of the pipelined functional units, and controlling instruction execution, among other tasks. It also contains two read buffers and one write buffer for streaming data to the execution units. [11]
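These figures fit together neatly: dividing the 1.28 μs core access time by the STAR's 40 ns cycle time gives 1280 / 40 = 32 cycles, the same number as the degree of interleaving. The minimal sketch below assumes simple low-order interleaving (superword address modulo 32 selects the bank) and an idealized bank that stays busy for exactly 32 cycles per access; the actual SAC bank-selection and arbitration logic may differ. Under those assumptions a sequential stream can start one superword access every cycle without stalling.

```python
from typing import Iterable

ACCESS_TIME_NS = 1280  # core memory access time (1.28 μs)
CYCLE_NS = 40          # STAR-100 cycle time
BANKS = 32             # degree of interleaving

BUSY_CYCLES = ACCESS_TIME_NS // CYCLE_NS  # cycles one access ties up a bank: 32

def bank_of(superword_addr: int) -> int:
    """Assumed low-order interleaving: consecutive superwords hit consecutive banks."""
    return superword_addr % BANKS

def cycles_to_issue(addresses: Iterable[int]) -> int:
    """Issue at most one access per cycle, stalling while the target bank is busy.

    Returns the cycle count after the last access has been issued.
    """
    free_at = [0] * BANKS  # first cycle at which each bank accepts a new access
    t = 0
    for addr in addresses:
        bank = bank_of(addr)
        t = max(t, free_at[bank])  # stall until the bank is free
        free_at[bank] = t + BUSY_CYCLES
        t += 1                     # the next access may issue one cycle later
    return t

print(BUSY_CYCLES)                            # 32
print(cycles_to_issue(range(1000)))           # 1000: sequential, never stalls
print(cycles_to_issue(range(0, 32_000, 32)))  # 31969: every access hits bank 0
```

The second call shows why access stride matters in an interleaved memory: a pattern that keeps returning to the same bank loses the benefit of interleaving entirely.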

The STAR-100 has two arithmetic pipelines. The first has a floating point adder and multiplier, and the second can execute all scalar instructions; the second pipeline also contains a floating point adder, multiplier, and divider. Both pipelines are 64-bit for floating point operations and are controlled by microcode. The STAR-100 can split its floating point pipelines into four 32-bit pipelines, doubling the peak performance of the system to 100 MFLOPS at the cost of halving the precision. [11]
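The peak figures follow from simple arithmetic if one assumes that each pipeline delivers one result per 40 ns cycle (25 MHz, as given in the infobox): two 64-bit pipelines then give 50 MFLOPS, and splitting them into four 32-bit pipelines gives 100 MFLOPS. The one-result-per-cycle assumption is an idealization that ignores pipeline startup time.

```python
CLOCK_HZ = 25e6  # 25 MHz, i.e. a 40 ns cycle

def peak_mflops(pipelines: int, results_per_cycle: float = 1.0) -> float:
    """Idealized peak: every pipeline emits results_per_cycle results each cycle."""
    return pipelines * results_per_cycle * CLOCK_HZ / 1e6

print(peak_mflops(2))  # 50.0  (two 64-bit pipelines)
print(peak_mflops(4))  # 100.0 (four 32-bit pipelines)
```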

The STAR-100 uses I/O processors to offload I/O from the CPU. Each I/O processor is a 16-bit minicomputer with its own main memory of 65,536 words of 16 bits each, which is implemented with core memory. The I/O processors all share a 128-bit data bus to the SAC.

Real-world performance, users and impact

The STAR-100's real-world performance was a fraction of its theoretical performance for a number of reasons. Firstly, the vector instructions, being "memory-to-memory," had a relatively long startup time, since the pipeline from the memory to the functional units was very long. In contrast to the register-based pipelined functional units in the 7600, the STAR pipelines were much deeper. The problem was compounded by the fact that the STAR had a slower cycle time than the 7600 (40 ns vs 27.5 ns). As a result, the STAR ran faster than the 7600 only on vectors of about 50 elements or more; for loops working on data sets with fewer elements, the time spent setting up the vector pipeline exceeded the time saved by the vector instructions.
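The break-even point can be modelled with a simple linear cost model, assuming a vector operation on n elements costs a fixed startup time S plus a per-element time t_v on the STAR, while the 7600 pays a per-element time t_s with no startup. Vector mode wins once S + n·t_v < n·t_s, that is, for n > S / (t_s - t_v). The numbers in the example are purely illustrative, chosen so the result lands near the 50-element figure above; they are not measured STAR-100 or 7600 timings.

```python
import math

def break_even_length(startup: float, t_vector: float, t_scalar: float) -> int:
    """Smallest n with startup + n * t_vector < n * t_scalar (linear cost model)."""
    gain = t_scalar - t_vector  # time saved per element by vector mode
    if gain <= 0:
        raise ValueError("vector mode never wins: per-element time is not lower")
    return math.floor(startup / gain) + 1

# Hypothetical timings in ns: 1000 ns startup, 20 ns vs 40 ns per element.
print(break_even_length(1000.0, 20.0, 40.0))  # 51
```

The model makes the design trade-off visible: a deeper pipeline raises S, and a slower cycle raises t_v, and both push the break-even length upward.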

When the machine was released in 1974, it quickly became apparent that the general performance was disappointing. Very few programs could be effectively vectorized into a series of single instructions; nearly all calculations relied on the results of some earlier instruction, yet those results had to clear the pipelines before they could be fed back in. This forced most programs to pay the high setup cost of the vector units, and generally the ones that did "work" were extreme examples. Worse, basic scalar performance was sacrificed to improve vector performance. Any time that the program had to run scalar instructions, the overall performance of the machine dropped dramatically. (See Amdahl's Law.)
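Amdahl's law makes the last point quantitative. If a fraction f of a program's original run time is vectorizable and sped up by a factor v, while the remaining scalar part runs at relative speed s (s < 1 meaning slower scalar execution than the baseline machine), the overall speedup is 1 / ((1 - f)/s + f/v). The parameter values below are hypothetical, chosen only to show how a weak scalar unit can cancel the gain from fast vector hardware.

```python
def amdahl_speedup(f: float, v: float, s: float = 1.0) -> float:
    """Overall speedup when fraction f runs v times faster and the rest at speed s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    return 1.0 / ((1.0 - f) / s + f / v)

print(round(amdahl_speedup(0.5, 10.0), 2))         # 1.82
print(round(amdahl_speedup(0.5, 10.0, s=0.5), 2))  # 0.95
```

In the second case, half-speed scalar code makes the whole program slightly slower than on the baseline machine, even though the vectorizable half runs ten times faster.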

Two STAR-100 systems were eventually delivered to the Lawrence Livermore National Laboratory and one to NASA Langley Research Center. [12] In preparation for the STAR deliveries, LLNL programmers developed a library of subroutines, called STACKLIB, on the 7600 to emulate the vector operations of the STAR. In the process of developing STACKLIB, they found that programs converted to use it ran faster than they had before, even on the 7600. This further narrowed the STAR's performance advantage over the machine it was meant to surpass.

The STAR-100 was a disappointment to everyone involved. Jim Thornton, formerly Seymour Cray's close assistant on the CDC 1604 and 6600 projects and the chief designer of STAR, left CDC to form Network Systems Corporation. An updated version of the basic architecture was later released in 1979 as the Cyber 203, [12] followed by the Cyber 205 in 1980, but by this point systems from Cray Research with considerably higher performance were on the market. The failure of the STAR led to CDC being pushed from its former dominance in the supercomputer market, something they tried to address with the formation of ETA Systems in September 1983. [12]

Installations

Five CDC STAR-100s were built. Deliveries began in 1974: [1]


References

  1. Bloch, T. (November 1978). Large Computer Systems and New Architectures. CERN, Geneva, Switzerland.
  2. Baylis, Michael (April 1972). A Proposal to the Atlas Computer Laboratory for a STAR Computer System. Control Data.
  3. STAR-100 Hardware Reference Manual.
  4. Whetstone Benchmark History and Results.
  5. MacKenzie, Donald (1998). Knowing Machines: Essays on Technical Change. MIT Press. ISBN 978-0-262-63188-4.
  6. Purcell, C. J. (1974). "The Control Data STAR-100 - Performance measurements". AFIPS 1974 International Workshop on Managing Requirements Knowledge. p. 385. doi:10.1109/AFIPS.1974.113. S2CID 43509695.
  7. Hwang, Kai; Briggs, Fayé Alayé (1984). Computer Architecture and Parallel Processing. McGraw-Hill. pp. 234–249.
  8. Hockney, R. W.; Jesshope, C. R. (1981). Parallel Computers: Architecture, Programming and Algorithms. Adam Hilger. p. 15.
  9. Ibbett, R. N.; Topham, N. P. (1989). Architecture of High Performance Computers, Volume I: Uniprocessors and vector processors. Springer-Verlag. p. 159.
  10. Schneck, P. B. (1987). Supercomputer Architecture. Kluwer Academic. pp. 99–118.
  11. Kogge, P. M. (1981). The Architecture of Pipelined Computers. Taylor & Francis. pp. 162–164.
  12. Hockney, R. W.; Jesshope, C. R. (1988). Parallel Computers 2: Architecture, Programming and Algorithms. Adam Hilger. p. 21.

Further reading