Coprocessor

[Image: AM9511-1 arithmetic coprocessor]

A coprocessor is a computer processor used to supplement the functions of the primary processor (the CPU). Operations performed by the coprocessor may be floating-point arithmetic, graphics, signal processing, string processing, cryptography or I/O interfacing with peripheral devices. By offloading processor-intensive tasks from the main processor, coprocessors can accelerate system performance. Coprocessors allow a line of computers to be customized, so that customers who do not need the extra performance do not need to pay for it.

Functionality

Coprocessors vary in their degree of autonomy. Some (such as FPUs) rely on direct control via coprocessor instructions, embedded in the CPU's instruction stream. Others are independent processors in their own right, capable of working asynchronously; they are still not optimized for general-purpose code, or they are incapable of it due to a limited instruction set focused on accelerating specific tasks. It is common for these to be driven by direct memory access (DMA), with the host processor (a CPU) building a command list. The PlayStation 2's Emotion Engine contained an unusual DSP-like SIMD vector unit capable of both modes of operation.
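The asynchronous, command-list style of control can be illustrated with a minimal sketch. It assumes a toy "coprocessor" modeled as a worker thread and a queue standing in for the DMA-fetched command list; the names `OPS`, `coprocessor_worker` and `run_command_list` are hypothetical and do not correspond to any real device interface.

```python
import queue
import threading

# Hypothetical operations this toy coprocessor understands.
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def coprocessor_worker(command_queue, results):
    """Consume commands in order until a None sentinel arrives."""
    while True:
        command = command_queue.get()
        if command is None:  # end of the command list
            break
        op, a, b = command
        results.append(OPS[op](a, b))

def run_command_list(command_list):
    """Host side: hand over a command list, then wait for completion."""
    command_queue = queue.Queue()  # stands in for a DMA-fetched list
    results = []
    worker = threading.Thread(
        target=coprocessor_worker, args=(command_queue, results)
    )
    worker.start()
    for command in command_list:
        command_queue.put(command)
    command_queue.put(None)
    # The host could do unrelated work here while the worker runs.
    worker.join()
    return results

print(run_command_list([("add", 1, 2), ("mul", 3, 4)]))
```

The host only describes the work and later collects the results; it never executes the operations itself, which is the point of the offload model.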

History

To make the best use of mainframe computer processor time, input/output tasks were delegated to separate systems called channel I/O. The mainframe processor did no I/O processing itself; it only set the parameters for an input or output operation and then signaled the channel processor to carry out the whole operation. By dedicating relatively simple sub-processors to handle time-consuming I/O formatting and processing, overall system performance was improved.

Coprocessors for floating-point arithmetic first appeared in desktop computers in the 1970s and became common throughout the 1980s and into the early 1990s. Early 8-bit and 16-bit processors used software to carry out floating-point arithmetic operations. Where a coprocessor was supported, floating-point calculations could be carried out many times faster. Math coprocessors were popular purchases for users of computer-aided design (CAD) software and for scientific and engineering calculations. Some floating-point units, such as the AMD Am9511, Intel 8231/8232 and Weitek FPUs, were treated as peripheral devices, while others, such as the Intel 8087, Motorola 68881 and National 32081, were more closely integrated with the CPU.

Another form of coprocessor was the video display coprocessor, known as a "Video Display Controller", used in the Atari 8-bit family, TI-99/4A, and MSX home computers. The Amiga custom chipset includes such a unit, known as the Copper, as well as a blitter for accelerating bitmap manipulation in memory.

As microprocessors developed, the cost of integrating floating-point arithmetic functions into the processor declined. High processor speeds also made a closely integrated coprocessor on a separate chip difficult to implement. Separately packaged mathematics coprocessors are now uncommon in desktop computers. The demand for a dedicated graphics coprocessor has grown, however, particularly due to the increasing demand for realistic 3D graphics in computer games.

Intel

[Image: i8087 and i80287 microarchitecture]
[Image: i80387 microarchitecture]

The original IBM PC included a socket for the Intel 8087 floating-point coprocessor (also known as an FPU), which was a popular option for people using the PC for computer-aided design or mathematics-intensive calculations. In that architecture, the coprocessor speeds up floating-point arithmetic on the order of fiftyfold. Users who only used the PC for word processing, for example, saved the high cost of the coprocessor, which would not have accelerated text manipulation operations.

The 8087 was tightly integrated with the 8086/8088 and responded to floating-point machine code operation codes inserted in the 8088 instruction stream. An 8088 processor without an 8087 could not interpret these instructions, requiring separate versions of programs for FPU and non-FPU systems, or at least a test at run time to detect the FPU and select appropriate mathematical library functions.
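The run-time test mentioned above can be sketched as a one-time library selection. This is an illustration only: the `fpu_present` flag stands in for whatever detection a real program performed, `math.sqrt` stands in for a hardware instruction, and the function names are hypothetical.

```python
import math

def soft_sqrt(x, iterations=30):
    """Software fallback: Newton's method, adequate for moderate inputs."""
    if x < 0:
        raise ValueError("square root of a negative number")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)
    return guess

def hard_sqrt(x):
    """Stand-in for a routine that issues an FPU instruction."""
    return math.sqrt(x)

def select_math_library(fpu_present):
    """Pick the implementation once, at start-up, based on detection."""
    return {"sqrt": hard_sqrt} if fpu_present else {"sqrt": soft_sqrt}

# Callers use the selected table and never test for the FPU again.
mathlib = select_math_library(fpu_present=False)
print(mathlib["sqrt"](2.0))
```

Selecting the routines once keeps the detection cost out of every call, at the price of shipping both implementations in the same program.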

[Image: Intel 80386DX CPU with 80387DX math coprocessor]

Another coprocessor for the 8086/8088 central processor was the 8089 input/output coprocessor. It used the same programming technique as the 8087 for input/output operations, such as transferring data from memory to a peripheral device, thereby reducing the load on the CPU. However, IBM did not use it in the IBM PC design, and Intel stopped development of this type of coprocessor.

The Intel 80386 microprocessor used an optional "math" coprocessor (the 80387) to perform floating point operations directly in hardware. The Intel 80486DX processor included floating-point hardware on the chip. Intel released a cost-reduced processor, the 80486SX, that had no floating point hardware, and also sold an 80487SX coprocessor that essentially disabled the main processor when installed, since the 80487SX was a complete 80486DX with a different set of pin connections. [1]

Intel processors later than the 80486 integrated floating-point hardware on the main processor chip; the advances in integration eliminated the cost advantage of selling the floating point processor as an optional element. It would be very difficult to adapt circuit-board techniques adequate at 75 MHz processor speed to meet the time-delay, power consumption, and radio-frequency interference standards required at gigahertz-range clock speeds. These on-chip floating point processors are still referred to as coprocessors because they operate in parallel with the main CPU.

During the era of 8- and 16-bit desktop computers, another common source of floating-point coprocessors was Weitek. These coprocessors had a different instruction set from the Intel coprocessors and used a different socket, which not all motherboards supported. Unlike the Intel x87 family, the Weitek processors did not provide transcendental mathematics functions (for example, trigonometric functions), and they required specific software libraries to support their functions. [2]

Motorola

The Motorola 68000 family had the 68881/68882 coprocessors, which provided floating-point acceleration similar to that of the Intel coprocessors. Computers using the 68000 family but not equipped with the hardware floating-point processor could trap and emulate the floating-point instructions in software, which, although slower, allowed one binary version of the program to be distributed for both cases. The 68851 memory-management coprocessor was designed to work with the 68020 processor. [3]
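The trap-and-emulate approach can be sketched with a toy interpreter. The instruction names (`iadd`, `fadd`, `fmul`), the exception class and the handler are all hypothetical; they only model the idea that an unimplemented floating-point instruction raises a trap and a software handler supplies the result.

```python
class UnimplementedInstruction(Exception):
    """Models the trap raised when no floating-point hardware is present."""

def execute(instr, fpu_present):
    """Execute one (op, a, b) instruction on the toy machine."""
    op, a, b = instr
    if op == "iadd":  # integer add is always available
        return a + b
    if op in ("fadd", "fmul"):
        if not fpu_present:
            raise UnimplementedInstruction(instr)
        return a + b if op == "fadd" else a * b
    raise ValueError(f"unknown instruction: {op}")

def emulate(instr):
    """Trap handler: compute the floating-point result in software."""
    op, a, b = instr
    return a + b if op == "fadd" else a * b

def run(program, fpu_present):
    """Run the same program with or without the coprocessor."""
    results = []
    for instr in program:
        try:
            results.append(execute(instr, fpu_present))
        except UnimplementedInstruction as trap:
            results.append(emulate(trap.args[0]))
    return results

program = [("iadd", 1, 2), ("fadd", 1.5, 2.25), ("fmul", 2.0, 4.0)]
print(run(program, fpu_present=True) == run(program, fpu_present=False))
```

The program itself is identical in both runs; only the path taken for the floating-point instructions differs, which is what allows a single binary to serve both kinds of machine.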

Modern coprocessors

As of 2001, dedicated graphics processing units (GPUs) in the form of graphics cards are commonplace. Certain models of sound cards were fitted with dedicated processors providing digital multichannel mixing and real-time DSP effects as early as 1990 to 1994 (the Gravis Ultrasound and Sound Blaster AWE32 being typical examples), while the Sound Blaster Audigy and the Sound Blaster X-Fi are more recent examples.

In 2006, AGEIA announced an add-in card for computers that it called the PhysX PPU. PhysX was designed to perform complex physics computations so that the CPU and GPU would not have to perform these time-consuming calculations. It was designed for video games, although other mathematical uses could theoretically be developed for it. In 2008, Nvidia purchased the company and phased out the PhysX card line; the functionality was added through software, allowing Nvidia GPUs to run PhysX computations on cores normally used for graphics processing, using the Nvidia PhysX engine software.

In 2006, Bigfoot Networks unveiled a PCI add-in card it christened the Killer NIC, which ran its own special Linux kernel on a Freescale PowerQUICC running at 400 MHz. The company called the Freescale chip a Network Processing Unit, or NPU.

The SpursEngine is a media-oriented add-in card with a coprocessor based on the Cell microarchitecture. The SPUs are themselves vector coprocessors.

In 2008, the Khronos Group released OpenCL with the aim of supporting general-purpose CPUs, ATI/AMD and Nvidia GPUs (and other accelerators) with a single common language for compute kernels.

In the 2010s, some mobile computing devices implemented a sensor hub as a coprocessor. Examples of coprocessors used for handling sensor integration in mobile devices include the Apple M7 and M8 motion coprocessors, the Qualcomm Snapdragon Sensor Core and Qualcomm Hexagon, and the Holographic Processing Unit for the Microsoft HoloLens.

In 2012, Intel announced the Intel Xeon Phi coprocessor. [4]

As of 2016, various companies are developing coprocessors aimed at accelerating artificial neural networks for vision and other cognitive tasks (e.g. vision processing units, TrueNorth, and Zeroth), and as of 2018, such AI chips are in smartphones from Apple and several Android phone vendors.

Other coprocessors

Over time, CPUs have tended to grow to absorb the functionality of the most popular coprocessors. FPUs are now considered an integral part of a processor's main pipeline; SIMD units gave multimedia its acceleration, taking over the role of various DSP accelerator cards; and even GPUs have become integrated on CPU dies. Nonetheless, specialized units remain popular away from desktop machines and for additional power, and they allow continued evolution independently of the main processor product lines.

References

  1. Scott Mueller, Upgrading and Repairing PCs, 15th edition, Que Publishing, 2003, ISBN 0-7897-2974-1, pp. 108–110.
  2. Scott Mueller, Upgrading and Repairing PCs, 2nd edition, Que Publishing, 1992, ISBN 0-88022-856-3, pp. 412–413.
  3. William Ford, William R. Topp, Assembly Language and Systems Programming for the M68000 Family, Jones & Bartlett Learning, 1992, ISBN 0-7637-0357-5, pp. 892 ff.
  4. "Intel Delivers New Architecture for Discovery with Intel® Xeon Phi™ Coprocessors". Newsroom.intel.com. 2012-11-12. Archived from the original on 2013-06-03. Retrieved 2013-06-16.
  5. Erin Farquhar, Philip Bunce, The MIPS Programmer's Handbook, Morgan Kaufmann, 1994, ISBN 1-55860-297-6, appendix A3, p. 330.
  6. "China's Tianhe-2A will Use Proprietary Accelerator and Boast 95 Petaflops Peak". hpcwire.com. 25 September 2017. Archived from the original on 1 December 2020. Retrieved 7 April 2018.