FPS AP-120B

Last updated August 07, 2019

The FPS AP-120B was a 38-bit, pipeline-oriented array processor manufactured by Floating Point Systems. It was designed to be attached to a host computer such as a DEC PDP-11 as a fast number-cruncher. Data transfer was accomplished using direct memory access.

Architecture

The processor was designed around the concept of multiple parallel processing units operating in synchronization. A single 64-bit instruction word was divided into fields, each of which instructed a particular module under the control of the CPU. The modules were as follows:

16-bit arithmetic and logic unit (ALU)
38-bit floating-point adder (FADD) (two stages)
38-bit floating-point multiplier (FMUL) (three stages)
Two data pad registers for receiving data from memory

The processor had access to dual-interleaved core memory in which odd-numbered addresses were stored in one physical bank and even-numbered addresses in the other. This took advantage of the typical sequential fetching of memory words. Fetching sequentially from a single physical bank would result in a latency of two instruction cycles before the data was loaded into the destination data pad. Interleaving allowed a sequential access to begin immediately after the previous one. Each access still took two cycles to complete, but the overlap and the dual destination pads maximized the use of the data channel.
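
The benefit of interleaving can be illustrated with a small timing model. The sketch below is not taken from FPS documentation: the function name `completion_cycles` and the assumption that a bank stays busy for the full two-cycle latency are illustrative simplifications.

```python
def completion_cycles(addresses, banks=2, latency=2):
    """Return the cycle at which each access delivers its data.

    Simplified model (an assumption, not the documented hardware timing):
    at most one access is issued per cycle, and a bank cannot start a new
    access until its previous one has completed.
    """
    bank_free = [0] * banks   # first cycle at which each bank can start again
    issue = 0                 # earliest cycle the next access may be issued
    done = []
    for addr in addresses:
        bank = addr % banks   # odd and even addresses live in different banks
        start = max(issue, bank_free[bank])
        bank_free[bank] = start + latency
        done.append(start + latency)
        issue = start + 1
    return done


sequential = completion_cycles([0, 1, 2, 3])   # alternates between the banks
same_bank = completion_cycles([0, 2, 4, 6])    # every access hits the even bank
```

In this model the sequential accesses complete on consecutive cycles, while the same-bank accesses complete only every second cycle.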

The floating-point arithmetic modules were both multi-stage processors driven by explicit instructions. In the two-stage adder, an assembler instruction such as FADD DX,DY would load values from data pads DX and DY into stage one of the adder. A subsequent FADD instruction was required to present the result at the adder's output. This second FADD could be a dummy with no arguments, or it could be the next calculation in a sequence. In this fashion a stream of FADD operations could be performed in a pipeline, delivering a new result in every instruction cycle even though each addition required two cycles.

Similarly, the three-stage multiplier required one FMUL DX,DY to begin a multiplication, followed by two more FMUL instructions to produce the result. Careful programming of the pipeline allowed one result to be produced per cycle, even though each multiplication took three cycles to complete.
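
The push-driven behaviour of the adder and multiplier can be modelled in a few lines. This is a minimal sketch, not FPS assembler: the class name `PushPipeline` is hypothetical, and the result is computed when the operands enter the pipeline purely for simplicity, whereas the real hardware did part of the work in each stage.

```python
from collections import deque


class PushPipeline:
    """Toy model of a pipeline that only advances when an instruction pushes it."""

    def __init__(self, stages, op):
        self.op = op
        self.slots = deque([None] * stages, maxlen=stages)

    def push(self, operands=None):
        """Advance one stage; return the value now at the output, if any.

        Calling push() with no operands models a dummy FADD or FMUL.
        """
        value = self.op(*operands) if operands is not None else None
        self.slots.appendleft(value)   # the oldest entry falls off the far end
        return self.slots[-1]


fadd = PushPipeline(2, lambda x, y: x + y)
sums = [fadd.push(p) for p in [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]]
sums.append(fadd.push())               # one dummy push drains the adder

fmul = PushPipeline(3, lambda x, y: x * y)
products = [fmul.push((2.0, 3.0)), fmul.push((4.0, 5.0)), fmul.push(), fmul.push()]
```

Three additions need four pushes in total, so a long stream approaches one result per cycle. The two multiplications need two further pushes before both results have appeared.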

For maximum efficiency, all calculations were programmed in the assembly language supplied with the hardware. A high-level language resembling Fortran was provided for coordinating tasks and controlling data transfers to and from the host computer.

Lookup tables

To support typical applications in signal processing, the hardware was delivered with a pre-calculated lookup table of sine and cosine values. Sines and cosines for angles from 0 to π/2 radians were stored in alternate addresses to take advantage of the interleaving described above. Values for all other angles could be obtained by taking one or the other of the values from the lookup table and negating it if necessary, using the standard quadrant symmetry rules.
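
The quadrant rules can be sketched as follows. The table size `N`, the exact address layout and the function name `sin_cos` are assumptions made for illustration. The source states only that sines and cosines for the first quadrant were stored in alternate addresses.

```python
import math

N = 256  # hypothetical number of table steps per quadrant

# Interleaved layout: even addresses hold sines, odd addresses hold cosines.
TABLE = []
for j in range(N):
    theta = j * (math.pi / 2) / N
    TABLE += [math.sin(theta), math.cos(theta)]


def sin_cos(k):
    """Return (sine, cosine) of the angle k * (pi/2) / N for any integer k."""
    quadrant, j = divmod(k % (4 * N), N)
    s, c = TABLE[2 * j], TABLE[2 * j + 1]
    if quadrant == 0:
        return s, c       # angle = theta
    if quadrant == 1:
        return c, -s      # angle = pi/2 + theta
    if quadrant == 2:
        return -s, -c     # angle = pi + theta
    return -c, s          # angle = 3*pi/2 + theta
```

Each lookup reads one adjacent sine/cosine pair and applies at most a swap and a sign change, so no trigonometric function is evaluated at run time.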

Typical programming style

The programming style was unusual, being dictated by the synchronous parallel processing architecture. The basic philosophy can be summarized as follows:

Lay out the shortest sequence of instructions for performing one instance of the desired calculation, allowing for two-cycle memory latency, and the driving of the floating-point modules with explicit FADD and FMUL instructions.
Inspect the sequence to determine the minimum number of instructions forming a loop which will perform the calculation repetitively. This requires attention to resource conflicts. For instance the data bus for moving results around can only move one data word per cycle. Likewise the ALU, used mostly for counting loops and memory addressing, can only be used for one purpose per cycle. This step is typically trial-and-error.
Conceptually "wrap" the full sequence of instructions around the loop, using FADD and FMUL instructions to drive calculations through the pipelines.
Before the loop begins, add parallel process initiations as required.

The final item was accomplished as follows. Assume that the entire calculation requires 15 cycles and the minimum loop size is 5 cycles. The first 5 instruction words begin iteration 1 of the calculation. The second 5 words contain the middle steps of iteration 1 and, in parallel, the beginning of iteration 2, which is usually a copy of the operations that began iteration 1. The next 5 words contain the final steps of iteration 1, the middle of iteration 2, and the beginning of iteration 3. These five words form the body of the loop, which repeats until the desired number of data points has been processed.
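
The overlap described above can be tabulated with a short sketch. The function name `schedule` is hypothetical, the total cycle count is assumed to be a whole multiple of the loop size, and the trailing blocks that drain the last iterations are an assumption of this sketch, since the text describes only the start-up and the loop body.

```python
def schedule(n_iterations, total_cycles=15, loop_cycles=5):
    """For each block of loop_cycles instruction words, list the
    (iteration, phase) pairs that are in flight, both numbered from 1."""
    phases = total_cycles // loop_cycles      # 3 phases of 5 cycles each
    blocks = []
    for t in range(n_iterations + phases - 1):
        active = [(t - p + 1, p + 1)
                  for p in range(phases)
                  if 0 <= t - p < n_iterations]
        blocks.append(active)
    return blocks


blocks = schedule(4)
for number, active in enumerate(blocks, start=1):
    print(number, active)
```

With four iterations the third and fourth blocks hold three iterations at once and correspond to the loop body. In this model the whole run takes 6 blocks of 5 cycles, or 30 cycles, compared with 60 cycles if the four 15-cycle calculations ran one after another.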

Application

As an attached processor, the AP-120B was typically used as a cost-effective adjunct to systems such as diagnostic medical imaging systems. In the early 1980s, a VAX-11/780 or VAX-11/785 with an FPS AP-120B and a Versatec plotter was the workhorse system for seismic data processing in the oil industry. Commercial seismic processing packages were written so that they could call FPS AP-120B routines if the processor was present.

References

Roger W. Hockney and C. R. Jesshope, Parallel Computers Two: Architecture, Programming and Algorithms. CRC Press, 1988, p. 206 ff. ISBN 0-85274-811-6
FPS had a bibliography of papers.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.