X87

Last updated

x87 is a floating-point-related subset of the x86 architecture instruction set. It originated as an extension of the 8086 instruction set in the form of optional floating-point coprocessors that work in tandem with corresponding x86 CPUs. These microchips have names ending in "87". This is also known as the NPX (Numeric Processor eXtension). Like other extensions to the basic instruction set, x87 instructions are not strictly needed to construct working programs, but provide hardware and microcode implementations of common numerical tasks, allowing these tasks to be performed much faster than corresponding machine code routines can. The x87 instruction set includes instructions for basic floating-point operations such as addition, subtraction and comparison, but also for more complex numerical operations, such as the computation of the tangent function and its inverse, for example.

Contents

Most x86 processors since the Intel 80486 have had these x87 instructions implemented in the main CPU, but the term is sometimes still used to refer to that part of the instruction set. Before x87 instructions were standard in PCs, compilers or programmers had to use rather slow library calls to perform floating-point operations, a method that is still common in (low-cost) embedded systems.

Description

The x87 registers form an eight-level deep non-strict stack structure ranging from ST(0) to ST(7) with registers that can be directly accessed by either operand, using an offset relative to the top, as well as pushed and popped. (This scheme may be compared to how a stack frame may be both pushed/popped and indexed.)

There are instructions to push, calculate, and pop values on top of this stack; unary operations (FSQRT, FPTAN etc.) then implicitly address the topmost ST(0), while binary operations (FADD, FMUL, FCOM, etc.) implicitly address ST(0) and ST(1). The non-strict stack model also allows binary operations to use ST(0) together with a direct memory operand or with an explicitly specified stack register, ST(x), in a role similar to a traditional accumulator (a combined destination and left operand). This can also be reversed on an instruction-by-instruction basis with ST(0) as the unmodified operand and ST(x) as the destination. Furthermore, the contents in ST(0) can be exchanged with another stack register using an instruction called FXCH ST(x).

These properties make the x87 stack usable as seven freely addressable registers plus a dedicated accumulator (or as seven independent accumulators). This is especially applicable on superscalar x86 processors (such as the Pentium of 1993 and later), where these exchange instructions (codes D9C8..D9CFh) are optimized down to a zero clock penalty by using one of the integer paths for FXCH ST(x) in parallel with the FPU instruction. Despite being natural and convenient for human assembly language programmers, some compiler writers have found it complicated to construct automatic code generators that schedule x87 code effectively. Such a stack-based interface potentially can minimize the need to save scratch variables in function calls compared with a register-based interface [1] (although, historically, design issues in the 8087 implementation limited that potential. [2] [3] )

The x87 provides single-precision, double-precision and 80-bit double-extended precision binary floating-point arithmetic as per the IEEE 754-1985 standard. By default, the x87 processors all use 80-bit double-extended precision internally (to allow sustained precision over many calculations, see IEEE 754 design rationale). A given sequence of arithmetic operations may thus behave slightly differently compared to a strict single-precision or double-precision IEEE 754 FPU. [4] As this may sometimes be problematic for some semi-numerical calculations written to assume double precision for correct operation, to avoid such problems, the x87 can be configured using a special configuration/status register to automatically round to single or double precision after each operation. Since the introduction of SSE2, the x87 instructions are not as essential as they once were, but remain important as a high-precision scalar unit for numerical calculations sensitive to round-off error and requiring the 64-bit mantissa precision and extended range available in the 80-bit format.

Performance

Clock cycle counts for examples of typical x87 FPU instructions (only register-register versions shown here). [5]

The A...B notation (minimum to maximum) covers timing variations dependent on transient pipeline status and the arithmetic precision chosen (32, 64 or 80 bits); it also includes variations due to numerical cases (such as the number of set bits, zero, etc.). The L → H notation depicts values corresponding to the lowest (L) and the highest (H) maximal clock frequencies that were available.

x87 implementationFADDFMULFDIVFXCHFCOMFSQRTFPTANFPATANMax clock
(MHz)
Peak FMUL
(millions/s)
FMUL§
rel. 5 MHz 8087
808770…10090…145193…20310…1540…50180…18630…540250…8005 → 100.034…0.055 → 0.100…0.1111 → 2× as fast
80287 (original)6 → 120.041…0.066 → 0.083…0.1331.2 → 2.4×
80387 (and later 287 models)23…3429…5788…911824122…129191…497314…48716 → 330.280…0.552 → 0.580…1.1~10 → 20×
80486 (or 80487)8…2016734483…87200…273218…30316 → 501.0 → 3.1~18 → 56×
Cyrix 6x86, Cyrix MII 4…74…624…342459…60117…12997…16166 → 30011…16 → 50…75~320 → 1400×
AMD K6 (including K6 II/III)2221…412321…41 ? ?166 → 55083 → 275~1500 → 5000×
Pentium / Pentium MMX1…31…3391 (0*) 1…47017…17319…13460 → 30020…60 → 100…300~1100 → 5400×
Pentium Pro 1…32…516…56128…68 ? ?150 → 20030…75 → 40…100~1400 → 1800×
Pentium II / III1…32…517…38127…50 ? ?233 → 140047…116 → 280…700~2100 → 13000×
Athlon (K7)1…41…413…241…216…35 ? ?500 → 2330125…500 → 580…2330~9000 → 42000×
Athlon 64 (K8)1000 → 3200250…1000 → 800…3200~18000 → 58000×
Pentium 4 1…52…720…43multiple
cycles
120…43 ? ?1300 → 3800186…650 → 543…1900~11000 → 34000×
* An effective zero clock delay is often possible, via superscalar execution.
§ The 5 MHz 8087 was the original x87 processor. Compared to typical software-implemented floating-point routines on an 8086 (without an 8087), the factors would be even larger, perhaps by another factor of 10 (i.e., a correct floating-point addition in assembly language may well consume over 1000 cycles).

Manufacturers

Companies that have designed or manufactured [lower-alpha 1] floating-point units compatible with the Intel 8087 or later models include AMD (287, 387, 486DX, 5x86, K5, K6, K7, K8), Chips and Technologies (the Super MATH coprocessors), Cyrix (the FasMath, Cx87SLC, Cx87DLC, etc., 6x86, Cyrix MII), Fujitsu (early Pentium Mobile etc.), Harris Semiconductor (manufactured 80387 and 486DX processors), IBM (various 387 and 486 designs), IDT (the WinChip, C3, C7, Nano, etc.), IIT (the 2C87, 3C87, etc.), LC Technology (the Green MATH coprocessors), National Semiconductor (the Geode GX1, Geode GXm, etc.), NexGen (the Nx587), Rise Technology (the mP6), ST Microelectronics (manufactured 486DX, 5x86, etc.), Texas Instruments (manufactured 486DX processors etc.), Transmeta (the TM5600 and TM5800), ULSI (the Math·Co coprocessors), VIA (the C3, C7, and Nano, etc.), Weitek (the 1067, 1167, 3167 and 4167), and Xtend (the 83S87SX-25 and other coprocessors).

Architectural generations

8087

The 8087 was the first math coprocessor for 16-bit processors designed by Intel. It was built to be paired with the Intel 8088 or 8086 microprocessors. (Intel's earlier 8231 and 8232 floating-point processors, marketed for use with the i8080 CPU, were in fact licensed versions of AMD's Am9511 and Am9512 FPUs from 1977 and 1979. [6] )

80C187

16 MHz version of the Intel 80C187 KL Intel 80C187.jpg
16 MHz version of the Intel 80C187

Although the original 1982 datasheet for the (NMOS based) 80188 and 80186 seem to mention specific math coprocessors, [7] both chips were actually paired with an 8087.

However, in 1987, to work with the refreshed CMOS based Intel 80C186 CPU, Intel introduced the 80C187 [8] math coprocessor. Although the 80C187 interface to the main processor is the same as that of the 8087, its core is essentially that of an 80387SX and is thus fully IEEE 754-compliant and capable of executing all the 80387's extra instructions. [9]

80287

The 80287 (i287) is the math coprocessor for the Intel 80286 series of microprocessors. Intel's models included variants with specified upper frequency limits ranging from 6 up to 12 MHz. The NMOS version were available 6, 8 and 10 MHz. [10] Later followed the i80287XL with 387 microarchitecture and the i80287XLT, a special version intended for laptops, as well as other variants. The available 10 MHz Intel 80287-10 Numerics Coprocessor version was for 250  USD in quantities of 100. [11] Both 80287XL and 80287XLT offered 50% better performance, 83% less power and additional instructions. [12]

The 80287XL is actually an 80387SX with a 287 pinout. [13] It contains an internal 3/2 multiplier, so that motherboards that ran the coprocessor at 2/3 CPU speed could instead run the FPU at the same speed of the CPU. Other 287 models with 387-like performance are the Intel 80C287, built using CHMOS III, and the AMD 80EC287 manufactured in AMD's CMOS process, using only fully static gates.

The 80287 and 80287XL work with the 80386 microprocessor and were initially the only coprocessors available for the 80386 until the introduction of the 80387 in 1987. Finally, they were able to work with the Cyrix Cx486SLC. However, for both of these chips the 80387 is strongly preferred for its higher performance and the greater capability of its instruction set.

80387

Intel 80387 CPU die image Intel 80387 CPU Die Image.jpg
Intel 80387 CPU die image

The 80387 (387 or i387) is the first Intel coprocessor to be fully compliant with the IEEE 754-1985 standard. Released in 1987, [14] two years after the 386 chip, the i387 includes much improved speed over Intel's previous 8087/80287 coprocessors and improved characteristics of its trigonometric functions. It was made available for USD $500 in quantities of 100. [15] Shortly afterwards, it was made available through Intel's Personal Computer Enhancement Operation for a retail market price of USD $795. [16] The 25 MHz version was available in retail channel for USD $1395. [17] The Intel M387 math coprocessor met under MIL-STD-883 Rev. C standard. This device was tested which includes temperature cycling between -55 and 125 °C, hermeticity sealed and extended burn-in. This military version operates at 16 MHz. This military version was available in 68-lead PGA and quad flatpack. This military version was available for USD $1155 in 100-unit of quantities for the PGA version. [18] The 33 MHz version of 387DX was available and it has the performance of 3.4 megawhetstones per second. [19] The 8087 and 80287's FPTAN and FPATAN instructions are limited to an argument in the range ±π/4 (±45°), and the 8087 and 80287 have no direct instructions for the SIN and COS functions. [20] [ full citation needed ]

Without a coprocessor, the 386 normally performs floating-point arithmetic through (relatively slow) software routines, implemented at runtime through a software exception handler. When a math coprocessor is paired with the 386, the coprocessor performs the floating-point arithmetic in hardware, returning results much faster than an (emulating) software library call.

The i387 is compatible only with the standard i386 chip, which has a 32-bit processor bus. The later cost-reduced i386SX, which has a narrower 16-bit data bus, can not interface with the i387's 32-bit bus. The i386SX requires its own coprocessor, the 80387SX, which is compatible with the SX's narrower 16-bit data bus. Intel released the low power version of 387SX coprocessor. [21]

80487

i487SX KL Intel i487SX.jpg
i487SX

The i487SX (P23N) was marketed as a floating-point unit coprocessor for Intel i486SX machines. It actually contained a full-blown i486DX implementation. When installed into an i486SX system, the i487 disabled the main CPU and took over all CPU operations. The i487 took measures to detect the presence of an i486SX and would not function without the original CPU in place. [22] [23] [ failed verification ]

80587

The Nx587 was the last FPU for x86 to be manufactured separately from the CPU, in this case NexGen's Nx586.

See also

Notes

  1. Fabless companies design a chip and rely on a fabbed company to manufacture it, while fabbed companies can do both the design and the manufacture by themselves.

Related Research Articles

<span class="mw-page-title-main">Intel 80286</span> Microprocessor model

The Intel 80286 is a 16-bit microprocessor that was introduced on February 1, 1982. It was the first 8086-based CPU with separate, non-multiplexed address and data buses and also the first with memory management and wide protection abilities. The 80286 used approximately 134,000 transistors in its original nMOS (HMOS) incarnation and, just like the contemporary 80186, it could correctly execute most software written for the earlier Intel 8086 and 8088 processors.

<span class="mw-page-title-main">Intel 8086</span> 16-bit microprocessor

The 8086 is a 16-bit microprocessor chip designed by Intel between early 1976 and June 8, 1978, when it was released. The Intel 8088, released July 1, 1979, is a slightly modified chip with an external 8-bit data bus, and is notable as the processor used in the original IBM PC design.

i386 32-bit microprocessor by Intel

The Intel 386, originally released as 80386 and later renamed i386, is a 32-bit microprocessor introduced in 1985. The first versions had 275,000 transistors and were the central processing unit (CPU) of many workstations and high-end personal computers of the time.

i486 Successor to the Intel 386

The Intel 486, officially named i486 and also known as 80486, is a microprocessor. It is a higher-performance follow-up to the Intel 386. The i486 was introduced in 1989. It represents the fourth generation of binary compatible CPUs following the 8086 of 1978, the Intel 80286 of 1982, and 1985's i386.

<span class="mw-page-title-main">Intel 80186</span> 16-bit microcontroller

The Intel 80186, also known as the iAPX 186, or just 186, is a microprocessor and microcontroller introduced in 1982. It was based on the Intel 8086 and, like it, had a 16-bit external data bus multiplexed with a 20-bit address bus.

<span class="mw-page-title-main">Microprocessor</span> Computer processor contained on an integrated-circuit chip

A microprocessor is a computer processor for which the data processing logic and control is included on a single integrated circuit (IC), or a small number of ICs. The microprocessor contains the arithmetic, logic, and control circuitry required to perform the functions of a computer's central processing unit (CPU). The IC is capable of interpreting and executing program instructions and performing arithmetic operations. The microprocessor is a multipurpose, clock-driven, register-based, digital integrated circuit that accepts binary data as input, processes it according to instructions stored in its memory, and provides results as output. Microprocessors contain both combinational logic and sequential digital logic, and operate on numbers and symbols represented in the binary number system.

x86 Family of instruction set architectures

x86 is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introduced in 1978 as a fully 16-bit extension of Intel's 8-bit 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86", including the 80186, 80286, 80386 and 80486 processors. Colloquially, their names were "186", "286", "386" and "486".

<span class="mw-page-title-main">Floating-point unit</span> Part of a computer system

A floating-point unit is a part of a computer system specially designed to carry out operations on floating-point numbers. Typical operations are addition, subtraction, multiplication, division, and square root. Some FPUs can also perform various transcendental functions such as exponential or trigonometric calculations, but the accuracy can be low, so some systems prefer to compute these functions in software.

<span class="mw-page-title-main">MMX (instruction set)</span> Instruction set designed by Intel

MMX is a single instruction, multiple data (SIMD) instruction set architecture designed by Intel, introduced on January 8, 1997 with its Pentium P5 (microarchitecture) based line of microprocessors, named "Pentium with MMX Technology". It developed out of a similar unit introduced on the Intel i860, and earlier the Intel i750 video pixel processor. MMX is a processor supplementary capability that is supported on IA-32 processors by Intel and other vendors as of 1997.

<span class="mw-page-title-main">Cyrix</span> American microprocessor developer

Cyrix Corporation was a microprocessor developer that was founded in 1988 in Richardson, Texas, as a specialist supplier of floating point units for 286 and 386 microprocessors. The company was founded by Tom Brightman and Jerry Rogers.

SSE2 is one of the Intel SIMD processor supplementary instruction sets introduced by Intel with the initial version of the Pentium 4 in 2000. It extends the earlier SSE instruction set, and is intended to fully replace MMX. Intel extended SSE2 to create SSE3 in 2004. SSE2 added 144 new instructions to SSE, which has 70 instructions. Competing chip-maker AMD added support for SSE2 with the introduction of their Opteron and Athlon 64 ranges of AMD64 64-bit CPUs in 2003.

<span class="mw-page-title-main">Coprocessor</span> Type of computer processor

A coprocessor is a computer processor used to supplement the functions of the primary processor. Operations performed by the coprocessor may be floating-point arithmetic, graphics, signal processing, string processing, cryptography or I/O interfacing with peripheral devices. By offloading processor-intensive tasks from the main processor, coprocessors can accelerate system performance. Coprocessors allow a line of computers to be customized, so that customers who do not need the extra performance do not need to pay for it.

<span class="mw-page-title-main">Weitek</span>

Weitek Corporation was an American chip-design company that originally focused on floating-point units for a number of commercial CPU designs. During the early to mid-1980s, Weitek designs could be found powering a number of high-end designs and parallel-processing supercomputers.

The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable program, often stored as a computer file and executed on the processor.

<span class="mw-page-title-main">Intel 8087</span> Floating-point microprocessor made by Intel

The Intel 8087, announced in 1980, was the first floating-point coprocessor for the 8086 line of microprocessors. The purpose of the chip was to speed up floating-point arithmetic operations, such as addition, subtraction, multiplication, division, and square root. It also computes transcendental functions such as exponential, logarithmic or trigonometric calculations. The performance enhancements were from approximately 20% to over 500%, depending on the specific application. The 8087 could perform about 50,000 FLOPS using around 2.4 watts.

<span class="mw-page-title-main">Hauppauge Computer Works</span> Company focusing on computer software

Hauppauge Computer Works is a US manufacturer and marketer of electronic video hardware for personal computers. Although it is most widely known for its WinTV line of TV tuner cards for PCs, Hauppauge also produces personal video recorders, digital video editors, digital media players, hybrid video recorders and digital television products for both Windows and Mac. The company is named after the hamlet of Hauppauge, New York, in which it is based.

<span class="mw-page-title-main">Motorola 68881</span> Computer floating-point unit

The Motorola 68881 and Motorola 68882 are floating-point units (FPUs) used in some computer systems in conjunction with Motorola's 32-bit 68020 or 68030 microprocessors. These coprocessors are external chips, designed before floating point math became standard on CPUs. The Motorola 68881 was introduced in 1984. The 68882 is a higher performance version produced later.

<span class="mw-page-title-main">Intel 80387SX</span> Floating-point unit for the Intel 80386SX series of microprocessors

The Intel 80387SX is the math coprocessor, also called an FPU, for the Intel 80386SX microprocessor. Introduced in 1987, it was used to perform floating-point arithmetic operations directly in hardware. The coprocessor was designed only to work with the 386SX, rather than the standard 386DX. This was because the original 80387 could not communicate with the altered 16 bit data bus of the 386SX, which was modified from the original 386DX's 32 bit data bus. The 387SX uses a 68-pin PLCC socket, just like some variants of the 80286 and the less common 80186 CPU, and was made in speeds ranging from 16 MHz to 33 MHz, matching the clock speed range of the Intel manufactured 386SX. Some chips like the IIT 3C87SX could get up to 40 MHz, matching the clock speeds of the fastest 386SX CPUs.

Extended precision refers to floating-point number formats that provide greater precision than the basic floating-point formats. Extended precision formats support a basic format by minimizing roundoff and overflow errors in intermediate values of expressions on the base format. In contrast to extended precision, arbitrary-precision arithmetic refers to implementations of much larger numeric types using special software.

The Intel 8231 and 8232 were early designs of floating-point maths coprocessors (FPUs), marketed for use with their i8080 line of primary CPUs. They were licensed versions of AMD's Am9511 and Am9512 FPUs, from 1977 and 1979, themselves claimed by AMD as the world's first single-chip FPU solutions.

References

  1. William Kahan (2 November 1990). "On the advantages of 8087's stack" (PDF). Unpublished course notes, Computer Science Division, University of California at Berkeley. Archived from the original (PDF) on 18 January 2017.
  2. William Kahan (8 July 1989). "How Intel 8087 stack overflow/underflow should have been handled" (PDF). Archived from the original (PDF) on 12 June 2013.
  3. Jack Woehr (1 November 1997). "A conversation with William Kahan".
  4. David Monniaux (May 2008). "The pitfalls of verifying floating-point computations". ACM Transactions on Programming Languages and Systems . 30 (3): 1–41. arXiv: cs/0701192 . doi:10.1145/1353445.1353446. S2CID   218578808.
  5. Numbers are taken from respective processors' data sheets, programming manuals, and optimization manuals.
  6. "Arithmetic Processors: Then and Now". www.cpushack.com. 23 September 2010. Retrieved 3 May 2023.
  7. Intel (1983). Intel Microprocessor & Peripherals Handbook. pp. 3-25 (iAPX 186/20) and 3-106 (iAPX 188/20).
  8. "CPU Collection – Model 80187". cpu-info.com. Archived from the original on 23 July 2011. Retrieved 14 April 2018.
  9. "80C187 80-BIT MATH COPROCESSOR" (PDF). November 1992. Retrieved 3 May 2023.
  10. Yoshida, Stacy, "Math Coprocessors: Keeping Your Computer Up for the Count", Intel Corporation, Microcomputer Solutions, September/October 1990, page 16
  11. Intel Corporation, "New Product Focus Component: A 32-Bit Microprocessor With A Little Help From Some Friends", Special 32-Bit Issue Solutions, November/December 1985, page 13.
  12. Yoshida, Stacy, "Math Coprocessors: Keeping Your Computer Up for the Count", Intel Corporation, Microcomputer Solutions, September/October 1990, page 16
  13. Intel Corporation, "New Product Focus: Systems: SnapIn 386 Module Upgrades PS/2 PCs", Microcomputer Solutions, September/October 1991, page 12
  14. Moran, Tom (1987-02-16). "Chips to Improve Performance Of 386 Machines, Intel Says". InfoWorld . Vol. 9, no. 7. p. 5. ISSN   0199-6649.
  15. "New Product Focus Components: The 32-Bit Computing Engine Full Speed Ahead". Solutions. Intel Corporation: 10. May–June 1987.
  16. "NewsBit: Intel 80387 Available Through Retail Channels". Solutions. Intel Corporation: 1. July–August 1987.
  17. Intel Corporation, "NewsBits: 25 MHZ 80387 Available Through Retail Channels", Microcomputer Solutions, September/October 1988, page 1
  18. Intel Corporation, "Focus: Components: Militarized Peripherals Support M386 Microprocessor", Microcomputer Solutions, March/April 1989, page 12
  19. Lewnes, Ann, "The Intel386 Architecture Here to Stay", Intel Corporation, Microcomputer Solutions, July/August 1989, page 2
  20. Borland Turbo Assembler documentation.
  21. Lewnes, Ann, "The Intel386 Architecture Here to Stay", Intel Corporation, Microcomputer Solutions, July/August 1989, page 2
  22. Intel 487SX at the Free On-line Dictionary of Computing
  23. "Intel 80487". www.cpu-world.com. Retrieved 9 June 2021.