ILLIAC IV

ILLIAC IV parallel computer's Control Unit (CU)

The ILLIAC IV was the first massively parallel computer. [1] The system was originally designed with 256 64-bit floating point units (FPUs) and four central processing units (CPUs), able to process 1 billion operations per second. [2] Due to budget constraints, only a single "quadrant" with 64 FPUs and a single CPU was built. Since the FPUs all processed the same instruction – ADD, SUB, etc. – the design would, in modern terminology, be considered single instruction, multiple data, or SIMD. [3]


The concept of building a computer using an array of processors came to Daniel Slotnick while he was working as a programmer on the IAS machine in 1952. A formal design did not start until 1960, when Slotnick was working at Westinghouse Electric and arranged development funding under a US Air Force contract. When that funding ended in 1964, Slotnick moved to the University of Illinois at Urbana–Champaign and joined the Illinois Automatic Computer (ILLIAC) team. With funding from the Advanced Research Projects Agency (ARPA), they began the design of a new concept with 256 64-bit processors instead of the original concept's 1,024 1-bit processors.

While the machine was being assembled by Burroughs, the university began building a new facility to house it. Political tension over the funding from the US Department of Defense led to ARPA and the university fearing for the machine's safety. When the first 64-processor quadrant of the machine was completed in 1972, it was sent to the NASA Ames Research Center in California. After three years of thorough modification to fix various flaws, ILLIAC IV was connected to the ARPANET for distributed use in November 1975, becoming the first network-available supercomputer, beating the Cray-1 by nearly 12 months.

Running at half its design speed, the one-quadrant ILLIAC IV delivered 50 MFLOPS peak, [4] making it the fastest computer in the world at that time. It is also credited with being the first large computer to use solid-state memory, as well as the most complex computer built to that date, with over 1 million gates. [5] Generally considered a failure due to massive budget overruns, [5] [6] the design was instrumental in the development of new techniques and systems for programming parallel systems. In the 1980s, several machines based on ILLIAC IV concepts were successfully delivered.

History

Origins

In June 1952, Daniel Slotnick began working on the IAS machine at the Institute for Advanced Study (IAS) at Princeton University. [7] The IAS machine featured a bit-parallel math unit that operated on 40-bit words. [8] Originally equipped with Williams tube memory, the machine was later given a magnetic drum from Engineering Research Associates. This drum had 80 tracks, so two words could be read at a time, and each track stored 1,024 bits. [9]

While contemplating the drum's mechanism, Slotnick began to wonder if that was the correct way to build a computer. If the bits of a word were written serially to a single track, instead of in parallel across 40 tracks, then the data could be fed into a bit-serial computer directly from the drum, bit by bit. The drum would still have multiple tracks and heads, but instead of gathering up a word and sending it to a single ALU, in this concept the data on each track would be read a bit at a time and sent into parallel ALUs. This would be a word-parallel, bit-serial computer. [7]
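
The scheme is easy to state in modern terms. The following sketch is illustrative Python, not anything from the period; the function names are invented, though the 40-bit word matches the IAS machine. It simulates several bit-serial ALUs advancing in lockstep, each consuming one bit of its track per drum step:

```python
# A minimal sketch of the word-parallel, bit-serial idea: one simple ALU per
# drum track, each adding two words one bit at a time, least-significant bit
# first, with all tracks advancing in lockstep.

WORD_BITS = 40  # the IAS machine's word size

def to_bits(value, width=WORD_BITS):
    """Little-endian list of bits, as a bit-serial drum track would feed them."""
    return [(value >> i) & 1 for i in range(width)]

def bit_serial_add_all(a_words, b_words, width=WORD_BITS):
    """Add corresponding words from two lists, one bit position per drum step."""
    carries = [0] * len(a_words)          # one carry flip-flop per ALU/track
    sums = [0] * len(a_words)
    a_tracks = [to_bits(a, width) for a in a_words]
    b_tracks = [to_bits(b, width) for b in b_words]
    for step in range(width):             # the drum delivers bit `step` of every track
        for alu in range(len(a_words)):   # every ALU works in parallel (simulated serially)
            total = a_tracks[alu][step] + b_tracks[alu][step] + carries[alu]
            sums[alu] |= (total & 1) << step
            carries[alu] = total >> 1
    return sums

print(bit_serial_add_all([3, 100, 2**39 - 1], [5, 23, 1]))
# [8, 123, 549755813888]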

Slotnick raised the idea at the IAS, but John von Neumann dismissed it as requiring "too many tubes". [7] Slotnick left the IAS in February 1954 to return to school for his PhD and the matter was forgotten. [7]

SOLOMON

After completing his PhD and some post-doc work, Slotnick ended up at IBM. By this time, for scientific computing at least, tubes and drums had been replaced with transistors and magnetic-core memory. The idea of parallel processors working on different streams of data from a drum no longer had the same obvious appeal. Nevertheless, further consideration showed that parallel machines could still offer significant performance in some applications; Slotnick and a colleague, John Cocke (better known as the inventor of RISC), wrote a paper on the concept in 1958. [10]

After a short time at IBM and then another at Aeronca Aircraft, Slotnick ended up at Westinghouse's Air Arm division, which worked on radar and similar systems. [11] Under a contract from the US Air Force's RADC, Slotnick was able to build a team to design a system with 1,024 bit-serial ALUs, known as "processing elements" or PEs. This design was given the name SOLOMON, after King Solomon, who was both very wise and had 1,000 wives. [12]

The PEs would be fed instructions from a single master central processing unit (CPU), the "control unit" or CU. SOLOMON's CU would read instructions from memory, decode them, and then hand them off to the PEs for processing. Each PE had its own memory for holding operands and results, the PE Memory module, or PEM. The CU could access the entire memory via a dedicated memory bus, whereas the PEs could only access their own PEM. [13] To allow results from one PE to be used as inputs in another, a separate network connected each PE to its eight closest neighbours. [14]
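
A minimal sketch of this arrangement, with invented class names and opcodes rather than anything from SOLOMON itself, shows the key property: one broadcast instruction stream operating on many private memories:

```python
# A schematic sketch of the SOLOMON arrangement (assumed names, not the real
# instruction set): one control unit broadcasts each decoded instruction, and
# every processing element applies it to operands in its own PE memory (PEM).

class PE:
    def __init__(self, pem_size=128):
        self.pem = [0] * pem_size         # private PE memory

    def execute(self, op, dst, src):      # operates only on this PE's own PEM
        if op == "ADD":
            self.pem[dst] += self.pem[src]
        elif op == "SUB":
            self.pem[dst] -= self.pem[src]

class CU:
    def __init__(self, n_pes=9):          # e.g. the 3-by-3 testbed
        self.pes = [PE() for _ in range(n_pes)]

    def broadcast(self, op, dst, src):
        for pe in self.pes:               # same instruction, different data: SIMD
            pe.execute(op, dst, src)

cu = CU()
for i, pe in enumerate(cu.pes):           # load different data into each PEM
    pe.pem[0], pe.pem[1] = i, 10 * i
cu.broadcast("ADD", 0, 1)
print([pe.pem[0] for pe in cu.pes])       # [0, 11, 22, 33, 44, 55, 66, 77, 88]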

Several testbed systems were constructed, including a 3-by-3 (9 PE) system and a 10-by-10 model with simplified PEs. During this period, some consideration was given to more complex PE designs, leading to a 24-bit parallel design that would be organized in a 256-by-32 arrangement. A single PE using this design was built in 1963. As the design work continued, the primary sponsor within the US Department of Defense was killed in an accident and no further funding was forthcoming. [15]

Looking to continue development, Slotnick approached Livermore, which at that time was at the forefront of supercomputer purchases. They were very interested in the design but convinced him to upgrade the current design's fixed-point math units to true floating point, which resulted in the SOLOMON.2 design. [16]

Livermore would not fund development; instead, they offered a contract under which they would lease the machine once it was completed. Westinghouse management considered this too risky and shut down the team. Slotnick left Westinghouse and attempted to find venture capital to continue the project, but failed. Livermore would later select the CDC STAR-100 for this role, as CDC was willing to take on the development costs. [17]

ILLIAC IV

When SOLOMON ended, Slotnick joined the Illinois Automatic Computer (ILLIAC) design team at the University of Illinois at Urbana–Champaign. Illinois had been designing and building large computers for the U.S. Department of Defense and the Advanced Research Projects Agency (ARPA) since 1949. In 1964 the university signed a contract with ARPA to fund the effort, which became known as ILLIAC IV since it was the fourth computer designed and built at the university. Development started in 1965, and a first-pass design was completed in 1966. [18]

In contrast to the bit-serial concept of SOLOMON, in ILLIAC IV the PEs were upgraded to full 64-bit (bit-parallel) processors, using 12,000 gates and 2,048 words of thin-film memory. [19] The PEs had five 64-bit registers, each with a special purpose. One of these, RGR, was used for communicating data to neighbouring PEs, moving one "hop" per clock cycle. Another register, RGD, indicated whether or not that PE was currently active. "Inactive" PEs could not access memory, but they would pass results to neighbouring PEs using the RGR. [14] The PEs were designed to work as a single 64-bit FPU, two 32-bit half-precision FPUs, or eight 8-bit fixed-point processors. [19]

Instead of 1,024 PEs and a single CU, the new design had a total of 256 PEs arranged into four 64-PE "quadrants", each with its own CU. The CUs were also 64-bit designs, with sixty-four 64-bit registers and another four 64-bit accumulators. The system could run as four separate 64-PE machines, two 128-PE machines, or a single 256-PE machine. This allowed the system to work on different problems when the data was too small to demand the entire 256-PE array. [19]

Based on a 25 MHz clock, with all 256 PEs running a single program, the machine was designed to deliver 1 billion floating point operations per second, or in today's terminology, 1 GFLOPS. [20] This made it much faster than any machine in the world; the contemporary CDC 7600 had a clock cycle of 27.5 nanoseconds, or 36 MIPS, [21] although for a variety of reasons it generally offered performance closer to 10 MIPS. [22] [note 1]
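
A back-of-envelope check, using only figures quoted in this article, shows what that target implied per processing element:

```python
# Rough consistency check of the design target, using figures from this article.
pes = 256                      # four quadrants of 64 PEs
target_flops = 1e9             # the 1 GFLOPS design goal
clock_hz = 25e6                # the original 25 MHz design clock

per_pe_flops = target_flops / pes            # ~3.9 MFLOPS per PE
ns_per_result = 1e9 / per_pe_flops           # ~256 ns between results per PE
cycles_per_result = clock_hz / per_pe_flops  # ~6.4 clocks per floating point op

print(f"{per_pe_flops/1e6:.1f} MFLOPS/PE, {ns_per_result:.0f} ns/result, "
      f"{cycles_per_result:.1f} cycles/result")
```

At roughly 256 ns per result, the target is broadly in line with the few-hundred-nanosecond add and multiply times described under Processor details below.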

To support the machine, an extension to the Digital Computer Laboratory buildings was constructed. [23] [24] Sample work at the university was primarily aimed at ways to efficiently fill the PEs with data, in effect conducting the first "stress test" in computer development. To make this as easy as possible, several new computer languages were created; IVTRAN and TRANQUIL were parallelized versions of FORTRAN, and Glypnir was a similar conversion of ALGOL. Generally, these languages provided support for loading arrays of data "across" the PEs to be executed in parallel, and some even supported the unwinding of loops into array operations, as in the sketch below. [25]
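
The kind of rewrite these languages encouraged is easy to show with a modern array library; NumPy is used here purely as a stand-in, and this is not IVTRAN or TRANQUIL syntax:

```python
import numpy as np

# A scalar loop, element by element, as a programmer might write it in
# ordinary FORTRAN of the period...
a = np.arange(64.0)
b = np.arange(64.0) * 2
c = np.empty(64)
for i in range(64):
    c[i] = a[i] + b[i]

# ...and the "unwound" array form the ILLIAC languages encouraged, where one
# operation is applied "across" all 64 PEs (here, all 64 elements) at once.
c_parallel = a + b

assert np.array_equal(c, c_parallel)
```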

Construction, problems

In early 1966, the university sent out a request for proposals looking for industrial partners interested in building the design. Seventeen requests for proposals were sent out; seven responses were received in July, and of these three were selected. [26] Several of the responses, including Control Data's, attempted to interest them in a vector processor design instead, but as these were already being designed the team was not interested in building another. In August 1966, [note 2] eight-month contracts were offered to RCA, Burroughs, and Univac to bid on the construction of the machine. [19]

Burroughs eventually won the contract, having teamed up with Texas Instruments (TI). Both offered new technical advances that made their bid the most interesting. Burroughs was offering to build a new and much faster version of thin-film memory, which would improve performance. TI was offering to build 64-pin emitter-coupled logic (ECL) integrated circuits (ICs) with 20 logic gates each. [note 3] At the time, most ICs used 16-pin packages and had between 4 and 7 gates. Using TI's ICs would make the system much smaller. [19]

Burroughs also supplied the specialized disk drives, which featured a separate stationary head for every track and could offer speeds up to 500 Mbit/s, storing about 80 MB per 36" disk. They would also provide a Burroughs B6500 mainframe to act as a front-end controller, loading data from secondary storage and performing other housekeeping tasks. Connected to the B6500 was a third-party laser optical recording medium, a write-once system that stored up to 1 Tbit on thin metal film coated on a strip of polyester sheet carried by a rotating drum. Construction of the new design began at Burroughs' Great Valley Lab. [13] At the time, it was estimated the machine would be delivered in early 1970. [27]

After a year of working on the ICs, TI announced they were unable to build the 64-pin designs. The more complex internal wiring was causing crosstalk in the circuitry, and they asked for another year to fix the problems. Instead, the ILLIAC team chose to redesign the machine based on available 16-pin ICs. This required the system to run more slowly, using a 16 MHz clock instead of the original 25 MHz. [28] The change from 64-pin to 16-pin cost the project about two years and millions of dollars. TI got the 64-pin design working after just over another year, and began offering them on the market before ILLIAC was complete. [28]

As a result of this change, the individual PC boards grew from about 1 inch (2.5 cm) square to about 6 by 10 inches (15 cm × 25 cm). This doomed Burroughs' efforts to produce a thin-film memory for the machine, because there was no longer enough space for the memory to fit within the design's cabinets. Attempts to increase the size of the cabinets to make room for the memory caused serious problems with signal propagation. [29] Slotnick surveyed the potential replacements and picked a semiconductor memory from Fairchild Semiconductor, a decision so opposed by Burroughs that a full review by ARPA followed. [19]

In 1969, these problems, combined with the resulting cost overruns from the delays, led to the decision to build only a single 64-PE quadrant, [19] thereby limiting the machine's speed to about 200 MFLOPS. [30] Together, these changes cost the project three years and $6 million. [19] By 1969, the project was spending $1 million a month, and had to be spun out of the original ILLIAC team, who were becoming increasingly vocal in their opposition to the project. [31]

Move to Ames

By 1970, the machine was finally being built at a reasonable rate and was being readied for delivery in about a year. On 6 January 1970, The Daily Illini, the student newspaper, claimed that the computer would be used to design nuclear weapons. [32] In May, the Kent State shootings took place, and anti-war violence erupted across university campuses. [31]

Slotnick grew to be opposed to the use of the machine for classified research, and announced that as long as it was on the university grounds, all processing that took place on it would be publicly released. He also grew increasingly concerned that the machine would be subject to attack by the more radical student groups, [31] a position that seemed wise after the local students joined the 9 May 1970 nationwide student strike by declaring a "day of Illiaction", [33] and especially after the 24 August bombing of the mathematics building at the University of Wisconsin–Madison. [34]

With the help of Hans Mark, the director of the NASA Ames Research Center in what was becoming Silicon Valley, the decision was made in January 1971 to deliver the machine to Ames rather than the university. Located on an active US Navy base and protected by the U.S. Marines, the site meant security would no longer be a concern. The machine was finally delivered to Ames in April 1972 and installed in the Central Computer Facility in building N-233. [35] By this point it was several years late and well over budget, at a total price of $31 million, almost four times the original estimate of $8 million for the complete 256-PE machine. [31] [2] [note 4] [note 5]

NASA also decided to replace the B6500 front-end machine with a PDP-10, which was in common use at Ames and would make it much easier to connect to the ARPAnet. [36] This required the development of new software, especially compilers, on the PDP-10, causing further delays in bringing the machine online. [31]

The Illiac IV was contracted to be managed by ACTS Computing Corporation, headquartered in Southfield, Michigan, a timesharing and remote job entry (RJE) company that had recently been acquired by the conglomerate Lear Siegler Corporation. The DoD contracted with ACTS under a cost plus 10% contract. This unusual arrangement was due to the constraint that no government employee could be paid more than a member of Congress, and many Illiac IV personnel made more than that limit. Dr. Mel Pirtle, with a background from the University of California, Berkeley and the Berkeley Computer Corporation (BCC), was engaged as the Illiac IV's director.

Making it work

ILLIAC IV Processing Unit on display at the Computer History Museum

When the machine first arrived, it could not be made to work. It suffered from all sorts of problems, from cracking PCBs to bad resistors to the packaging of the TI ICs being highly sensitive to humidity. These issues were slowly addressed, and by the summer of 1973 the first programs could be run on the system, although the results were highly questionable. Starting in June 1975, a concerted four-month effort began that required, among other changes, replacing 110,000 resistors, rewiring parts to fix propagation delay issues, improving filtering in the power supplies, and a further reduction in clock speed to 13 MHz. At the end of this process, the system was finally working properly. [31] [2]

From then on, the system ran Monday morning to Friday afternoon, providing 60 hours of uptime for users, but requiring 44 hours of scheduled downtime. [2] Nevertheless, it was increasingly used as NASA programmers learned ways to get performance out of the complex system. At first, performance was dismal, with most programs running at about 15 MFLOPS, about three times the average for the CDC 7600. [37] Over time this improved, notably after Ames programmers wrote their own version of FORTRAN, CFD, and learned how to parallelize I/O into the limited PEMs. On problems that could be parallelized the machine was still the fastest in the world, outperforming the CDC 7600 by two to six times, and it is generally credited as the fastest machine in the world until 1981. [31]

On 7 September 1981, after nearly 10 years of operation, the ILLIAC IV was turned off. [38] The machine was officially decommissioned in 1982, and NASA's advanced computing division ended with it. One control unit and one processing element chassis from the machine are now on display at the Computer History Museum in Mountain View, less than a mile from its operational site. [39]

Aftermath

ILLIAC was very late, very expensive, and never met its goal of producing 1 GFLOPS. It was widely considered a failure even by those who worked on it; one stated simply that "any impartial observer has to regard Illiac IV as a failure in a technical sense." [40] In terms of project management it is widely regarded as a failure, running over its cost estimates by four times and requiring years of remedial efforts to make it work. As Slotnick himself later put it:

I'm bitterly disappointed, and very pleased... delighted and dismayed. Delighted that the overall objectives came out well in the end. Dismayed that it cost too much, took too long, doesn't do enough, and not enough people are using it. [41]

However, later analyses note that the project had several long-lasting effects on the computer market as a whole, both intentionally and unintentionally. [42]

Among the indirect effects was the rapid uptake of semiconductor memory after the ILLIAC project. Slotnick received a lot of criticism when he chose Fairchild Semiconductor to produce the memory ICs, as at the time the production line was an empty room and the design existed only on paper. [43] However, after three months of intense effort, Fairchild had a working design being produced en masse. As Slotnick would later comment, "Fairchild did a magnificent job of pulling our chestnuts out of the fire. The Fairchild memories were superb and their reliability to this day is just incredibly good." [29] ILLIAC is considered to have dealt a death blow to magnetic-core memory and related systems like thin-film. [29]

Another indirect effect was caused by the complexity of the printed circuit boards (PCBs), or modules. At the original 25 MHz design speed, impedance in the ground wiring proved to be a serious problem, demanding that the PCBs be as small as possible. As their complexity grew, the PCBs had to add more and more layers in order to avoid growing larger. Eventually they reached 15 layers deep, which proved to be well beyond the capabilities of draftsmen. The design was ultimately completed using new automated design tools provided by a subcontractor, and the complete design required two years of computer time on a Burroughs mainframe. This was a major step forward in computer-aided design, and by the mid-1970s such tools were commonplace. [44]

ILLIAC also led to major research into parallel processing that had wide-ranging effects. During the 1980s, with the price of microprocessors falling according to Moore's Law, a number of companies adopted MIMD (Multiple Instruction, Multiple Data) designs to build even more parallel machines, with compilers that could make better use of the parallelism. The Thinking Machines CM-5 is an excellent example of the MIMD concept. It was the better understanding of parallelism on ILLIAC that led to the improved compilers and programs that could take advantage of these designs. As one ILLIAC programmer put it, "If anybody builds a fast computer out of a lot of microprocessors, Illiac IV will have done its bit in the broad scheme of things." [45]

Most supercomputers of the era took another approach to higher performance, using a single very-high-speed vector processor. Similar to the ILLIAC in some ways, these designs loaded many data elements into a single custom processor rather than spreading them across a large number of specialized ones. The classic example of this design is the Cray-1, which had performance similar to the ILLIAC's. There was more than a little "backlash" against the ILLIAC design as a result, and for some time the supercomputer market looked on massively parallel designs with disdain, even when they were successful. As Seymour Cray famously quipped, "If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?" [46]

Description

Physical arrangement

Each quadrant of the machine was 10 feet (3 m) high, 8 feet (2.4 m) deep and 50 feet (15 m) long. [47] Arranged beside the quadrant was its input/output (I/O) system, whose disk system stored 2.5 GiB and could read and write data at 1 billion bits per second, along with the B6700 computer that connected to the machine through the same 1,024-bit-wide interface as the disk system. [48]

The machine consisted of a series of carrier chassis holding a number of the small modules. The majority of these were the Processing Units (PUs), which contained the modules for a single PE, its PEM, and the Memory Logic Unit that handled address translation and I/O. The PUs were identical, so they could be replaced or reordered as required. [49]

Processor details

Each CU had about 30,000 to 40,000 gates. [50] The CU had sixteen 64-bit registers and a separate sixty-four-slot 64-bit "scratchpad", the LDB. There were four accumulators, AC0 through AC3, a program counter, the ILR, and various control registers. The system had a short instruction pipeline and implemented instruction look-ahead. [51]

Each PE had about 12,000 gates. [50] It included four 64-bit registers: an accumulator A, an operand buffer B, a secondary scratchpad S, and a fourth, R, used to broadcast or receive data from the other PEs. [52] The PEs used a carry-lookahead adder, a leading-one detector for boolean operations, and a barrel shifter. 64-bit additions took about 200 ns and multiplications about 400 ns. The PEs were connected to a private memory bank, the PEM, which held 2,048 64-bit words; access time was on the order of 250 ns. [53] The PEs used a load/store architecture. [54]

The instruction set architecture (ISA) contained two separate sets of instructions, one for the CU (or a unit within it, ADVAST) and another for the PEs. Instructions for the PEs were not decoded by the CU; instead, they were sent directly to the FINST register to be forwarded to the PEs for processing. The ADVAST instructions were decoded and entered the CU's processing pipeline. [55]

Logical arrangement

Each quadrant contained 64 PEs and one CU. The CU had access to the entire I/O bus and could address all of the machine's memory. The PEs could only access their own local store, the PEM, of 2,048 64-bit words. Both the PEs and CU could use load and store operations to access the disk system. [48]

The cabinets were so large that signals required 240 ns to travel from one end to the other. For this reason, the CU could not be used to coordinate actions; instead, the entire system was clock-synchronous, with all operations in the PEs guaranteed to take the same amount of time no matter what the operands were. That way the CU could be sure operations were complete without having to wait for results or status codes. [47]

To improve the performance of operations that required the output of one PE to be used as the input of another, the PEs were connected directly to their neighbours, as well as to those eight steps away - for instance, PE1 was directly connected to PE0 and PE2, as well as to PE9 and PE57. The eight-away connections allowed faster transport when data needed to travel between more distant PEs, as the routing sketch below illustrates. [48] Each shift of data moved 64 words in a single 125 ns clock cycle. [47]
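
A short illustrative Python sketch makes the routing distances concrete, modelling the quadrant as 64 PEs with links at offsets ±1 and ±8 modulo 64:

```python
from collections import deque

# Breadth-first search for the minimum number of 125 ns routing steps a value
# needs to travel between PEs, given links to neighbours at ±1 and ±8 (mod 64).

def hops(src, dst, n=64, steps=(1, -1, 8, -8)):
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        pe, dist = queue.popleft()
        if pe == dst:
            return dist
        for s in steps:
            nxt = (pe + s) % n
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))

print(hops(1, 2))    # 1 hop: direct neighbour
print(hops(1, 57))   # 1 hop: the "eight-away" link (1 - 8 mod 64)
print(hops(0, 36))   # 7 hops: the worst case in this model
```

In this model no pair of PEs is more than seven shifts apart, versus 32 for a plain ring with only nearest-neighbour links.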

The system used a one-address format, in which instructions included the address of one operand, with the other operand in the PE's accumulator (the A register). The address was sent to the PEs over a separate "broadcast" bus. Depending on the instruction, the value on the bus might refer to a memory location in the PE's PEM, a value in one of the PE registers, or a numeric constant. [56]

Since each PE had its own memory, while the instruction format and the CUs saw the entire address space, the system included an index register (X) to offset the base address. This allowed, for example, the same instruction stream to work on data that was not aligned to the same locations in different PEs. The common example would be an array of data loaded into different locations in the PEMs, which could then be accessed uniformly by setting the index register differently in each PE, as sketched below. [56]
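
A sketch of the effect, using invented Python classes rather than the real register model:

```python
# The CU broadcasts one operand address to every PE; each PE adds its private
# index register (X) so the same instruction can reach differently-aligned data.

class PE:
    def __init__(self, index):
        self.pem = [0] * 2048      # private 2,048-word PE memory
        self.acc = 0               # the A register (accumulator)
        self.x = index             # per-PE index register offset

    def load(self, broadcast_addr):
        self.acc = self.pem[broadcast_addr + self.x]

pes = [PE(index=i) for i in range(4)]
for i, pe in enumerate(pes):
    pe.pem[100 + i] = 1000 + i     # the "same" datum sits at a different PEM address

for pe in pes:
    pe.load(100)                   # CU broadcasts base address 100 to all PEs
print([pe.acc for pe in pes])      # [1000, 1001, 1002, 1003]
```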

Branches

In traditional computer designs, instructions are loaded into the CPU one at a time as they are read from memory. Normally, when the CPU completes processing an instruction, the program counter (PC) is incremented by one word and the next instruction is read. This process is interrupted by branches, which cause the PC to jump to one of two locations depending on a test, like whether a given memory address holds a non-zero value. In the ILLIAC design, each PE would be applying this test to different values, and thus have different outcomes. Since those values are private to the PE, the following instructions would need to be loaded based on a value only the PE knew. [57]

To avoid the delays reloading the PE instructions would cause, the ILLIAC loaded the PEMs with the instructions on both sides of the branch. Logical tests did not change the PC, instead, they set "mode bits" that told the PE whether or not to run the next arithmetic instruction. To use this system, the program would be written so that one of the two possible instruction streams followed the test, and ended with an instruction to invert the bits. Code for the second branch would then follow, ending with an instruction to set all the bits to 1. [57]

If the test selected the "first" branch, that PE would continue on as normal. When it reached the end of that code, the mode operator instruction would flip the mode bits, and from then on that PE would ignore further instructions. This would continue until it reached the end of the code for the second branch, where the mode reset instruction would turn the PE back on. If a particular PE's test resulted in the second branch being taken, it would instead set the mode bits to ignore further instructions until it reached the end of the first branch, where the mode operator would flip the bits and cause the second branch to begin processing, once again turning them all on at the end of that branch. [57]

Since the PEs can operate in 64-, 32- and 8-bit modes, the mode flags had multiple bits so the individual words could be turned on or off. For instance, in the case when the PE was operating in 32-bit mode, one "side" of the PE might have the test come out true while the other side was false. [57]
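
Putting the pieces together, a simplified simulation (with invented opcodes; the real machine kept its mode bits in the RGD register and had a far richer instruction set) shows how an if/else becomes two masked code runs executed back to back:

```python
# Every PE receives every instruction; per-PE "mode bits" decide whether the
# arithmetic actually lands, so both sides of a branch are broadcast in turn.

class PE:
    def __init__(self, value):
        self.acc = value
        self.mode = True                     # active?

def run(pes, program):
    for op, arg in program:
        if op == "TEST_GE":                  # set mode from a per-PE test
            for pe in pes:
                pe.mode = pe.acc >= arg
        elif op == "FLIP_MODE":              # end of first branch: invert mode bits
            for pe in pes:
                pe.mode = not pe.mode
        elif op == "SET_MODE":               # end of second branch: everyone back on
            for pe in pes:
                pe.mode = True
        elif op == "ADD":                    # arithmetic lands only in active PEs
            for pe in pes:
                if pe.mode:
                    pe.acc += arg

# Equivalent of: if acc >= 10: acc += 100, else: acc += 1
program = [("TEST_GE", 10), ("ADD", 100),    # "first" branch
           ("FLIP_MODE", None), ("ADD", 1),  # "second" branch
           ("SET_MODE", None)]

pes = [PE(v) for v in (3, 10, 25, 7)]
run(pes, program)
print([pe.acc for pe in pes])                # [4, 110, 125, 8]
```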

Notes

  1. Note that the term "FLOPS" was not widely used at this time; MIPS and FLOPS were treated as synonymous.
  2. Chen says July. [26]
  3. Later known as medium scale integration.
  4. Slotnick, and others, have claimed the original $8 million estimate was an ad hoc number that was the same as the purse in the Clay-Liston fight. [2]
  5. It was being developed during a period of historically high inflation rates, and at least some of the increase in the price is attributable to those increases. [2]

References

Citations

  1. Hord 1982, p. 1.
  2. Hord 1982, p. 14.
  3. Hord 1982, p. 5.
  4. Hockney & Jesshope 1988, p. 24.
  5. Hord 1982, p. 8.
  6. Hockney & Jesshope 1988, p. 25.
  7. Slotnick 1982, p. 20.
  8. Ware, W. H. (10 March 1953). History and Development of the IAS Computer (PDF) (Technical report). Rand.
  9. MacKenzie 1998, p. 295.
  10. Slotnick 1982, p. 21.
  11. Slotnick 1982, pp. 21–22.
  12. MacKenzie 1998, p. 105.
  13. Bouknight et al. 1972, p. 371.
  14. Slotnick 1982, p. 23.
  15. Slotnick 1982, p. 24.
  16. MacKenzie 1998, p. 118.
  17. MacKenzie 1998, p. 119.
  18. Slotnick 1982, p. 25.
  19. Slotnick 1982, p. 26.
  20. Barnes et al. 1968, p. 746.
  21. Levesque, John; Williamson, Joel (2014). A Guidebook to Fortran on Supercomputers. Academic Press. p. 14.
  22. Parkinson, Dennis (17 June 1976). "Computers by the thousand". New Scientist. p. 626.
  23. Hord 1982, p. 9.
  24. Leetaru, Kalev (2010). "Digital Computer Laboratory / DCL". UI Histories, University of Illinois.
  25. Hord 1982, p. 15.
  26. Chen 1967, p. 3.
  27. Barnes et al. 1968, p. 747.
  28. Hord 1982, p. 11.
  29. Falk 1976, p. 67.
  30. Burroughs 1974, p. 3.
  31. Slotnick 1982, p. 27.
  32. Falk 1976, p. 65.
  33. "Byte of History: Computing at the University of Illinois". University of Illinois. March 1997. Archived from the original on 10 June 2007.
  34. "Sterling Hall Bombing of 1970". University of Wisconsin–Madison.
  35. "Scientific Information Bulletin" (PDF). Office of Naval Research Asian Office. December 1993. p. 51. Archived from the original on 24 September 2015. Retrieved 25 March 2024.
  36. Hord 1982, p. 7.
  37. Falk 1976, p. 69.
  38. "This Day in History: September 7". Computer History Museum.
  39. "ILLIAC IV control unit". Computer History Museum.
  40. Falk 1976, p. 68.
  41. Hord 1990, p. 9.
  42. Hord 1990, p. 10.
  43. Hord 1990, p. 12.
  44. Hord 1990, p. 13.
  45. Falk 1976, p. 66.
  46. Robbins, Kay; Robbins, Steven (2003). UNIX Systems Programming: Communication, Concurrency, and Threads. Prentice Hall. p. 582. ISBN 9780130424112.
  47. Burroughs 1974, p. 5.
  48. Burroughs 1974, p. 4.
  49. Burroughs 1974, pp. 11–12.
  50. Chen 1967, p. 9.
  51. Technical 1968, p. 2.10.
  52. Technical 1968, p. 2.7.
  53. Technical 1968, p. 2.8.
  54. Technical 1968, p. 2.11.
  55. Technical 1968, p. 2.12.
  56. Burroughs 1974, p. 7.
  57. Burroughs 1974, p. 6.
