Floating-point unit

Last updated May 04, 2024

A floating-point unit (FPU, colloquially a math coprocessor) is a part of a computer system specially designed to carry out operations on floating-point numbers.^[1] Typical operations are addition, subtraction, multiplication, division, and square root. Some FPUs can also perform various transcendental functions such as exponential or trigonometric calculations, but the accuracy can be low,^[2]^[3] so some systems prefer to compute these functions in software.

History

In 1954, the IBM 704 had floating-point arithmetic as a standard feature, one of its major improvements over its predecessor the IBM 701. This was carried forward to its successors the 709, 7090, and 7094.

In 1963, Digital announced the PDP-6, which had floating point as a standard feature.^[4]

In 1963, the GE-235 featured an "Auxiliary Arithmetic Unit" for floating point and double-precision calculations.^[5]

Historically, some systems implemented floating point with a coprocessor rather than as an integrated unit (but now in addition to the CPU, e.g. GPUs –that are coprocessors not always built into the CPU –have FPUs as a rule, while first generations of GPUs did not). This could be a single integrated circuit, an entire circuit board or a cabinet. Where floating-point calculation hardware has not been provided, floating-point calculations are done in software, which takes more processor time, but avoids the cost of the extra hardware. For a particular computer architecture, the floating-point unit instructions may be emulated by a library of software functions; this may permit the same object code to run on systems with or without floating-point hardware. Emulation can be implemented on any of several levels: in the CPU as microcode, as an operating system function, or in user-space code. When only integer functionality is available, the CORDIC methods are most commonly used for transcendental function evaluation.^{[ citation needed ]}

In most modern computer architectures, there is some division of floating-point operations from integer operations. This division varies significantly by architecture; some have dedicated floating-point registers, while some, like Intel x86, go as far as independent clocking schemes.^[6]

CORDIC routines have been implemented in Intel x87 coprocessors (8087,^[7]^[8]^[9]^[10]^[11] 80287,^[11]^[12] 80387^[11]^[12]) up to the 80486 ^[7] microprocessor series, as well as in the Motorola 68881 ^[7]^[8] and 68882 for some kinds of floating-point instructions, mainly as a way to reduce the gate counts (and complexity) of the FPU subsystem.

Floating-point operations are often pipelined. In earlier superscalar architectures without general out-of-order execution, floating-point operations were sometimes pipelined separately from integer operations.

The modular architecture of Bulldozer microarchitecture uses a special FPU named FlexFPU, which uses simultaneous multithreading. Each physical integer core, two per module, is single-threaded, in contrast with Intel's Hyperthreading, where two virtual simultaneous threads share the resources of a single physical core.^[13]^[14]

Floating-point library

Some floating-point hardware only supports the simplest operations: addition, subtraction, and multiplication. But even the most complex floating-point hardware has a finite number of operations it can support –for example, no FPUs directly support arbitrary-precision arithmetic.

When a CPU is executing a program that calls for a floating-point operation that is not directly supported by the hardware, the CPU uses a series of simpler floating-point operations. In systems without any floating-point hardware, the CPU emulates it using a series of simpler fixed-point arithmetic operations that run on the integer arithmetic logic unit.

The software that lists the necessary series of operations to emulate floating-point operations is often packaged in a floating-point library.

Integrated FPUs

In some cases, FPUs may be specialized, and divided between simpler floating-point operations (mainly addition and multiplication) and more complicated operations, like division. In some cases, only the simple operations may be implemented in hardware or microcode, while the more complex operations are implemented as software.

In some current architectures, the FPU functionality is combined with SIMD units to perform SIMD computation; an example of this is the augmentation of the x87 instructions set with SSE instruction set in the x86-64 architecture used in newer Intel and AMD processors.

Add-on FPUs

Several models of the PDP-11, such as the PDP-11/45,^[15] PDP-11/34a,^[16]^{: 184–185} PDP-11/44,^[16]^{: 195, 211} and PDP-11/70,^[16]^{: 277, 286–287} supported an add-on floating-point unit to support floating-point instructions. The PDP-11/60,^[16]^: 261 MicroPDP-11/23^[17] and several VAX models^[18]^[19] could execute floating-point instructions without an add-on FPU (the MicroPDP-11/23 required an add-on microcode option),^[17] and offered add-on accelerators to further speed the execution of those instructions.

In the 1980s, it was common in IBM PC/compatible microcomputers for the FPU to be entirely separate from the CPU, and typically sold as an optional add-on. It would only be purchased if needed to speed up or enable math-intensive programs.

The IBM PC, XT, and most compatibles based on the 8088 or 8086 had a socket for the optional 8087 coprocessor. The AT and 80286-based systems were generally socketed for the 80287, and 80386/80386SX-based machines –for the 80387 and 80387SX respectively, although early ones were socketed for the 80287, since the 80387 did not exist yet. Other companies manufactured co-processors for the Intel x86 series. These included Cyrix and Weitek. Acorn Computers opted for the WE32206 to offer single, double and extended precision ^[20] to its ARM powered Archimedes range.

Coprocessors were available for the Motorola 68000 family, the 68881 and 68882. These were common in Motorola 68020/68030-based workstations, like the Sun-3 series. They were also commonly added to higher-end models of Apple Macintosh and Commodore Amiga series, but unlike IBM PC-compatible systems, sockets for adding the coprocessor were not as common in lower-end systems.

There are also add-on FPU coprocessor units for microcontroller units (MCUs/μCs)/single-board computer (SBCs), which serve to provide floating-point arithmetic capability. These add-on FPUs are host-processor-independent, possess their own programming requirements (operations, instruction sets, etc.) and are often provided with their own integrated development environments (IDEs).

Related Research Articles

A central processing unit (CPU), also called a central processor, main processor, or just processor, is the most important processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs).

The Cyrix 6x86 is a line of sixth-generation, 32-bit x86 microprocessors designed and released by Cyrix in 1995. Cyrix, being a fabless company, had the chips manufactured by IBM and SGS-Thomson. The 6x86 was made as a direct competitor to Intel's Pentium microprocessor line, and was pin compatible. During the 6x86's development, the majority of applications performed almost entirely integer operations. The designers foresaw that future applications would most likely maintain this instruction focus. So, to optimize the chip's performance for what they believed to be the most likely application of the CPU, the integer execution resources received most of the transistor budget. This would later prove to be a strategic mistake, as the popularity of the P5 Pentium caused many software developers to hand-optimize code in assembly language, to take advantage of the P5 Pentium's tightly pipelined and lower latency FPU. For example, the highly anticipated first-person shooter Quake used highly optimized assembly code designed almost entirely around the P5 Pentium's FPU. As a result, the P5 Pentium significantly outperformed other CPUs in the game.

The 8086 is a 16-bit microprocessor chip designed by Intel between early 1976 and June 8, 1978, when it was released. The Intel 8088, released July 1, 1979, is a slightly modified chip with an external 8-bit data bus, and is notable as the processor used in the original IBM PC design.

The Intel 386, originally released as 80386 and later renamed i386, is a 32-bit microprocessor designed by Intel. The first pre-production samples of the 386 were released to select developers in 1985, while mass production commenced in 1986. The processor was a significant evolution in the x86 architecture, extending a long line of processors that stretched back to the Intel 8008. The 386 was the central processing unit (CPU) of many workstations and high-end personal computers of the time. The 386 began to fall out of public use starting with the release of the i486 processor in 1989, while in embedded systems the 386 remained in widespread use until Intel finally discontinued it in 2007.

The Intel 486, officially named i486 and also known as 80486, is a microprocessor. It is a higher-performance follow-up to the Intel 386. The i486 was introduced in 1989. It represents the fourth generation of binary compatible CPUs following the 8086 of 1978, the Intel 80286 of 1982, and 1985's i386.

A microprocessor is a computer processor for which the data processing logic and control is included on a single integrated circuit (IC), or a small number of ICs. The microprocessor contains the arithmetic, logic, and control circuitry required to perform the functions of a computer's central processing unit (CPU). The IC is capable of interpreting and executing program instructions and performing arithmetic operations. The microprocessor is a multipurpose, clock-driven, register-based, digital integrated circuit that accepts binary data as input, processes it according to instructions stored in its memory, and provides results as output. Microprocessors contain both combinational logic and sequential digital logic, and operate on numbers and symbols represented in the binary number system.

In processor design, microcode serves as an intermediary layer situated between the central processing unit (CPU) hardware and the programmer-visible instruction set architecture of a computer, also known as its machine code. It consists of a set of hardware-level instructions that implement the higher-level machine code instructions or control internal finite-state machine sequencing in many digital processing components. While microcode is utilized in general-purpose CPUs in contemporary desktops, it also functions as a fallback path for scenarios that the faster hardwired control unit is unable to manage.

x86 is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introduced in 1978 as a fully 16-bit extension of Intel's 8-bit 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86", including the 80186, 80286, 80386 and 80486 processors. Colloquially, their names were "186", "286", "386" and "486".

In computing, endianness is the order in which bytes within a word of digital data are transmitted over a data communication medium or addressed in computer memory, counting only byte significance compared to earliness. Endianness is primarily expressed as big-endian (BE) or little-endian (LE), terms introduced by Danny Cohen into computer science for data ordering in an Internet Experiment Note published in 1980. The adjective endian has its origin in the writings of 18th century Anglo-Irish writer Jonathan Swift. In the 1726 novel Gulliver's Travels, he portrays the conflict between sects of Lilliputians divided into those breaking the shell of a boiled egg from the big end or from the little end. By analogy, a CPU may read a digital word big end first, or little end first.

<span class="mw-page-title-main">MMX (instruction set)</span> Instruction set designed by Intel

MMX is a single instruction, multiple data (SIMD) instruction set architecture designed by Intel, introduced on January 8, 1997 with its Pentium P5 (microarchitecture) based line of microprocessors, named "Pentium with MMX Technology". It developed out of a similar unit introduced on the Intel i860, and earlier the Intel i750 video pixel processor. MMX is a processor supplementary capability that is supported on IA-32 processors by Intel and other vendors as of 1997. AMD also added MMX instruction set in its K6 processor.

The Motorola 68000 series is a family of 32-bit complex instruction set computer (CISC) microprocessors. During the 1980s and early 1990s, they were popular in personal computers and workstations and were the primary competitors of Intel's x86 microprocessors. They were best known as the processors used in the early Apple Macintosh, the Sharp X68000, the Commodore Amiga, the Sinclair QL, the Atari ST and Falcon, the Atari Jaguar, the Sega Genesis and Sega CD, the Philips CD-i, the Capcom System I (Arcade), the AT&T UNIX PC, the Tandy Model 16/16B/6000, the Sun Microsystems Sun-1, Sun-2 and Sun-3, the NeXT Computer, NeXTcube, NeXTstation, and NeXTcube Turbo, early Silicon Graphics IRIS workstations, the Aesthedes, computers from MASSCOMP, the Texas Instruments TI-89/TI-92 calculators, the Palm Pilot, the Control Data Corporation CDCNET Device Interface, the VTech Precomputer Unlimited and the Space Shuttle. Although no modern desktop computers are based on processors in the 680x0 series, derivative processors are still widely used in embedded systems.

Cyrix Corporation was a microprocessor developer that was founded in 1988 in Richardson, Texas, as a specialist supplier of floating point units for 286 and 386 microprocessors. The company was founded by Tom Brightman and Jerry Rogers.

A coprocessor is a computer processor used to supplement the functions of the primary processor. Operations performed by the coprocessor may be floating-point arithmetic, graphics, signal processing, string processing, cryptography or I/O interfacing with peripheral devices. By offloading processor-intensive tasks from the main processor, coprocessors can accelerate system performance. Coprocessors allow a line of computers to be customized, so that customers who do not need the extra performance do not need to pay for it.

A processor register is a quickly accessible location available to a computer's processor. Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions, and may be read-only or write-only. In computer architecture, registers are typically addressed by mechanisms other than main memory, but may in some cases be assigned a memory address e.g. DEC PDP-10, ICT 1900.

The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable program, often stored as a computer file and executed on the processor.

The Intel 8087, announced in 1980, was the first floating-point coprocessor for the 8086 line of microprocessors. The purpose of the chip was to speed up floating-point arithmetic operations, such as addition, subtraction, multiplication, division, and square root. It also computes transcendental functions such as exponential, logarithmic or trigonometric calculations. The performance enhancements were from approximately 20% to over 500%, depending on the specific application. The 8087 could perform about 50,000 FLOPS using around 2.4 watts.

<span class="mw-page-title-main">Hauppauge Computer Works</span> Company focusing on computer software

Hauppauge Computer Works is a US manufacturer and marketer of electronic video hardware for personal computers. Although it is most widely known for its WinTV line of TV tuner cards for PCs, Hauppauge also produces personal video recorders, digital video editors, digital media players, hybrid video recorders and digital television products for both Windows and Mac. The company is named after the hamlet of Hauppauge, New York, in which it is based.

x87 is a floating-point-related subset of the x86 architecture instruction set. It originated as an extension of the 8086 instruction set in the form of optional floating-point coprocessors that work in tandem with corresponding x86 CPUs. These microchips have names ending in "87". This is also known as the NPX. Like other extensions to the basic instruction set, x87 instructions are not strictly needed to construct working programs, but provide hardware and microcode implementations of common numerical tasks, allowing these tasks to be performed much faster than corresponding machine code routines can. The x87 instruction set includes instructions for basic floating-point operations such as addition, subtraction and comparison, but also for more complex numerical operations, such as the computation of the tangent function and its inverse, for example.

Extended precision refers to floating-point number formats that provide greater precision than the basic floating-point formats. Extended precision formats support a basic format by minimizing roundoff and overflow errors in intermediate values of expressions on the base format. In contrast to extended precision, arbitrary-precision arithmetic refers to implementations of much larger numeric types using special software.

<span class="mw-page-title-main">Intel 8231/8232</span> Early floating-point math coprocessor

The Intel 8231 and 8232 were early designs of floating-point maths coprocessors (FPUs), marketed for use with their i8080 line of primary CPUs. They were licensed versions of AMD's Am9511 and Am9512 FPUs, from 1977 and 1979, themselves claimed by AMD as the world's first single-chip FPU solutions.

References

↑ Anderson, Stanley F.; Earle, John G.; Goldschmidt, Robert Elliott; Powers, Don M. (January 1967). "The IBM System/360 Model 91: Floating-Point Execution Unit". IBM Journal of Research and Development . 11 (1): 34–53. doi:10.1147/rd.111.0034. ISSN 0018-8646.
↑ Dawson, Bruce (2014-10-09). "Intel Underestimates Error Bounds by 1.3 quintillion". randomascii.wordpress.com. Retrieved 2020-01-16.
↑ "FSIN Documentation Improvements in the "Intel® 64 and IA-32 Architectures Software Developer's Manual"". intel.com. 2014-10-09. Archived from the original on 2020-01-16. Retrieved 2020-01-16.
↑ "PDP-6 Handbook" (PDF). www.bitsavers.org. Archived (PDF) from the original on 2022-10-09.
↑ "GE-2xx documents". www.bitsavers.org. CPB-267_GE-235-SystemManual_1963.pdf, p. IV-4.
↑ "Intel 80287 family". www.cpu-world.com. Retrieved 2019-01-15.
1 2 3 Muller, Jean-Michel (2006). Elementary Functions: Algorithms and Implementation (2nd ed.). Boston, Massachusetts: Birkhäuser. p. 134. ISBN 978-0-8176-4372-0. LCCN 2005048094 . Retrieved 2015-12-01.
1 2 Nave, Rafi (March 1983). "Implementation of Transcendental Functions on a Numerics Processor". Microprocessing and Microprogramming. 11 (3–4): 221–225. doi:10.1016/0165-6074(83)90151-5.
↑ Palmer, John F.; Morse, Stephen Paul (1984). The 8087 Primer (1st ed.). John Wiley & Sons Australia, Limited. ISBN 0471875694. 9780471875697. Retrieved 2016-01-02.
↑ Glass, L. Brent (January 1990). "Math Coprocessors: A look at what they do, and how they do it". Byte . 15 (1): 337–348. ISSN 0360-5280.
1 2 3 Jarvis, Pitts (1990-10-01). "Implementing CORDIC algorithms – A single compact routine for computing transcendental functions". Dr. Dobb's Journal : 152–156. Retrieved 2016-01-02.
1 2 Yuen, A. K. (1988). "Intel's Floating-Point Processors". Electro/88 Conference Record: 48/5/1–7.
↑ "Archived copy". cdn3.wccftech.com. Archived from the original on 9 May 2015. Retrieved 14 March 2022.{{cite web}}: CS1 maint: archived copy as title (link)
↑ "AMD unveils Flex FP". bit-tech.net. Retrieved 29 March 2018.
↑ PDP-11/45 Processor Handbook (PDF). Digital Equipment Corporation. 1973. Chapter 7 "Floating Point Processor".
1 2 3 4 PDP-11 Processor Handbook (PDF). Digital Equipment Corporation. 1979.
1 2 MICRO/PDP-11 Handbook (PDF). Digital Equipment Corporation. 1983. p. 33.
↑ VAX – Hardware Handbook Volume I – 1986 (PDF). Digital Equipment Corporation. 1985.
↑ VAX – Hardware Handbook Volume II – 1986 (PDF). Digital Equipment Corporation. 1986.
↑ "Western Electric 32206 co-processor". www.cpu-world.com. Retrieved 2021-11-06.