Pentium FDIV bug


66 MHz Intel Pentium (sSpec=SX837) with the FDIV bug

The Pentium FDIV bug is a hardware bug affecting the floating-point unit (FPU) of the early Intel Pentium processors. Because of the bug, the processor would return incorrect binary floating-point results when dividing certain pairs of high-precision numbers. The bug was discovered in 1994 by Thomas R. Nicely, a professor of mathematics at Lynchburg College. [1] Missing values in a lookup table used by the FPU's floating-point division algorithm caused calculations to acquire small errors. In most use cases these errors occur only rarely and cause small deviations from the correct results, but in certain circumstances they can occur frequently and lead to more significant deviations. [2]


The severity of the FDIV bug is debated. Though rarely encountered by most users (Byte magazine estimated that 1 in 9 billion floating-point divides with random parameters would produce inaccurate results), [3] both the flaw and Intel's initial handling of the matter were heavily criticized by the tech community.

In December 1994, Intel recalled the defective processors in what was the first full recall of a computer chip. [4] In its 1994 annual report, Intel said it incurred "a $475 million pre-tax charge ... to recover replacement and write-off of these microprocessors." [5]

Description

To make floating-point division on the Pentium faster than on the 486DX, Intel replaced the 486's shift-and-subtract division algorithm with the Sweeney, Robertson, and Tocher (SRT) algorithm. The SRT algorithm can generate two bits of the quotient per clock cycle, whereas the 486's algorithm could generate only one. Its lookup table is implemented as a programmable logic array with 2,048 cells, of which 1,066 should have been populated with one of five values: −2, −1, 0, +1, +2. When the original array for the Pentium was compiled, five values were not correctly sent to the equipment that etches the arrays into the chips; as a result, five of the array cells contained zero when they should have contained +2. [6]

As a result, calculations that rely on these five cells acquire errors; these errors can accumulate repeatedly owing to the recursive nature of the SRT algorithm. In pathological cases the error can reach the fourth significant digit of the result, although this is rare. The error is usually confined to the ninth or tenth significant digit. [3]
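The radix-4 SRT recurrence described above can be sketched as follows. This is a simplified Python model, not the Pentium's circuit: it uses exact rational arithmetic (`fractions.Fraction`) and picks each quotient digit by rounding 4r/d to the nearest value in {−2, −1, 0, +1, +2}, whereas the Pentium read the digit from its programmable logic array using only a few leading bits of the partial remainder and divisor. The function name `srt_radix4_divide` and the prescaling of the dividend by 4 are choices made for this illustration. Each iteration produces one radix-4 digit (about two bits of quotient) and keeps the partial remainder r within ±(2/3)d, the bound that the next digit choice relies on.

```python
from fractions import Fraction


def srt_radix4_divide(n: Fraction, d: Fraction, steps: int) -> tuple[Fraction, list[int]]:
    """Approximate n / d with a radix-4 SRT recurrence using digits -2..+2.

    Illustrative model only. Assumes d > 0 and |n / d| <= 8/3, so the
    prescaled start r0 = n / 4 satisfies |r0| <= (2/3) d.
    """
    r = n / 4                          # prescale so the first remainder is in range
    q = Fraction(0)
    digits = []
    for j in range(1, steps + 1):
        # Digit selection: nearest integer to 4r/d, clamped to {-2, ..., +2}.
        # (The Pentium looked this digit up in a table instead.)
        qj = max(-2, min(2, round(4 * r / d)))
        r = 4 * r - qj * d             # partial-remainder recurrence
        q += Fraction(qj, 4 ** j)      # append one radix-4 digit (about 2 bits)
        digits.append(qj)
    return 4 * q, digits               # undo the prescale by 4


if __name__ == "__main__":
    quotient, digits = srt_radix4_divide(Fraction(4195835), Fraction(3145727), 30)
    print(digits[:10])
    print(float(quotient))
```

Because every digit feeds back into the next partial remainder through r ← 4r − q·d, in this model a wrong digit alters every step that follows rather than staying an isolated error. After k steps the remaining error in the quotient is at most (8/3)·4^−k.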

Only certain combinations of numerator and denominator trigger the bug. One commonly reported example is dividing 4,195,835 by 3,145,727. Performing this calculation in any software that used the processor's floating-point unit, such as Windows Calculator, would allow users to discover whether their Pentium chip was affected. [7]

The correct value of the calculation is:

$$\frac{4{,}195{,}835}{3{,}145{,}727} = 1.333820449136241002\ldots$$

In the hexadecimal form used by the processor, 4,195,835 = 0x4005FB and 3,145,727 = 0x2FFFFF. The "5" in 0x4005FB triggers the access to the "empty" array cells. As a result, the value returned by a flawed Pentium processor is incorrect from the fourth decimal place onward: [8]

$$\text{flawed result} = 1.333739068902037589\ldots$$

which is actually the value of

$$\frac{4{,}195{,}579}{3{,}145{,}727} = \frac{4{,}195{,}835 - 256}{3{,}145{,}727}.$$
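The arithmetic above can be checked with a few lines of code. The sketch below is in Python (chosen for illustration; the article names no language) and uses a hypothetical helper, `residual`, that is not taken from any published test program. It computes x − (x / y) · y in IEEE 754 double precision, then repeats the computation with the flawed quotient modeled, as stated above, by 4,195,579 / 3,145,727. With the correct quotient the residual is zero or within rounding error; with the modeled flawed quotient it comes out near 256, the amount by which the effective dividend was reduced.

```python
def residual(x: float, q: float, y: float) -> float:
    """Return x - q * y, i.e. how far q * y falls short of x (hypothetical helper)."""
    return x - q * y


X, Y = 4195835.0, 3145727.0

correct_q = X / Y              # what a correct FPU returns (double precision)
flawed_q = 4195579.0 / Y       # model of the flawed Pentium result, per the text

print(f"correct quotient: {correct_q:.15f}  residual: {residual(X, correct_q, Y):g}")
print(f"flawed quotient:  {flawed_q:.15f}  residual: {residual(X, flawed_q, Y):g}")
```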

Discovery and response

Thomas Nicely, a professor of mathematics at Lynchburg College, had written code to enumerate primes, twin primes, prime triplets, and prime quadruplets. Nicely noticed some inconsistencies in the calculations on June 13, 1994, shortly after adding a Pentium system to his group of computers, but was unable to eliminate other factors (such as programming errors, motherboard chipsets, etc.) until October 19, 1994. [1] On October 24, 1994, he reported the issue to Intel. [9] Intel had reportedly become aware of the issue independently by June 1994, and had begun fixing it at this point, but chose not to publicly disclose any details or recall affected CPUs. [10]

On October 30, 1994, Nicely sent an email describing the bug to various academic contacts, requesting reports of testing for the flaw on 486-DX4s, Pentiums and Pentium clones. [9] The bug was soon verified by others, and news of it spread quickly on the Internet. The bug acquired the name "Pentium FDIV bug" from FDIV, the x86 assembly language mnemonic for floating-point division and the most frequently used of the affected instructions. [9]

The story first appeared in the press on November 7, 1994, in an article in Electronic Engineering Times, "Intel fixes a Pentium FPU glitch" by Alexander Wolfe, [11] and was subsequently picked up by CNN in a segment aired on November 22. It was also reported on by the New York Times and the Boston Globe, making the front page in the latter. [10] [12]

At this point, Intel acknowledged the floating-point flaw but claimed that it was not serious and would not affect most users. Intel offered to replace processors for users who could prove that they were affected. Although most independent estimates found that the bug would have a very limited impact on most users, it caused significant negative press for the company. In a 2019 talk reflecting on the development of Quake, John Romero illustrated how frequently and persistently the bug could be reproduced, describing behavior that Michael Abrash spent hours tracking down, in which parts of a game level appeared unexpectedly when viewed from certain camera angles. [13] IBM paused the sale of PCs containing Intel CPUs, and Intel's stock price decreased significantly. [14] Some in the industry questioned IBM's motive: IBM produced PowerPC CPUs at the time and potentially stood to benefit from any reputational damage to the Pentium or to Intel as a company. However, the decision led corporate buyers of PC equipment to demand replacements for existing Pentium CPUs, and soon afterwards other PC manufacturers began offering "no questions asked" replacements of flawed Pentium chips. [4]

The growing dissatisfaction with Intel's response led the company to offer, on December 20, to replace all flawed Pentium processors on request. [15] On January 17, 1995, Intel announced a pre-tax charge of $475 million against earnings, ostensibly the total cost associated with replacement of the flawed processors. [9] This is equivalent to $868 million in 2023. [16] Intel was criticized for barring resellers and OEMs from participating in the recall program, requiring end users to replace chips themselves. Intel's justification for this, posted on its support web page, was that "it is the individual decision of the end user to determine if the flaw is affecting their application accuracy". [14]

A 1995 article in Science describes the value of number theory problems in discovering computer bugs and gives the mathematical background and history of Brun's constant, the problem Nicely was working on when he discovered the bug. [17]

Intel's response to the FDIV bug has been cited as a case in which the public relations impact of a problem eclipsed its practical impact on customers. [18] While most users were unlikely to encounter the flaw in their day-to-day computing, the company's initial refusal to replace chips unless customers could prove they were affected drew pushback from a vocal minority of industry experts. The resulting publicity shook consumer confidence in the CPUs and led to demands for action even from people unlikely to be affected by the issue. Andy Grove, Intel's CEO at the time, was quoted in The Wall Street Journal as saying "I think the kernel of the issue we missed ... was that we presumed to tell somebody what they should or shouldn't worry about, or should or shouldn't do". [4]

In the aftermath of the bug and subsequent recall, there was a marked increase in the use of formal verification of hardware floating point operations across the semiconductor industry. Prompted by the discovery of the bug, a technique applicable to the SRT algorithm called "word-level model checking" was developed in 1996. [19] Intel went on to use formal verification extensively in the development of later CPU architectures. In the development of the Pentium 4, symbolic trajectory evaluation and theorem proving were used to find a number of bugs that could have led to a similar recall incident had they gone undetected. [20] The first Intel microarchitecture to use formal verification as the primary method of validation was Nehalem, developed in 2008. [21]

Affected models

The FDIV bug affects the 60 and 66 MHz Pentium P5 800 in stepping levels prior to D1, and the 75, 90, and 100 MHz Pentium P54C 600 in steppings prior to B5. The 120 MHz P54C and P54CQS CPUs are unaffected. [22] [23]

Software patches

Various software patches were produced by manufacturers to work around the bug. One specific algorithm, outlined in a paper in IEEE Computational Science & Engineering, is to check for divisors that can trigger access to the programmable logic array cells that erroneously contain zero and, if one is found, to multiply both numerator and denominator by 15/16, which takes them out of the "buggy" range. This fix carries a measurable speed penalty. In the worst case, a program doing nothing but FDIV operations with bad divisors would take twice as long to run, since each FDIV would take about 80 instead of 40 clock cycles. With random divisors the average time per FDIV was approximately 50 clock cycles, i.e. about 10 cycles added to check the divisor; the scaling fixup itself added little, because only 5 out of 1,024 random divisors would trigger it. Since FDIV is a rare operation in most programs, the typical slowdown with the fix installed was a percent or less. [8]
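The divisor check and 15/16 scaling described above can be sketched as below. This is a hypothetical Python model, not the published patch: it inspects the IEEE 754 double-precision encoding rather than the x87 registers, and the at-risk bit patterns are an assumption of this sketch drawn from published analyses of the flaw (the article itself only says that 5 of 1,024 random divisors trigger the fixup). The sketch treats a divisor as at risk when the first four fraction bits after the leading 1 are 0001, 0100, 0111, 1010 or 1101 and the next six fraction bits are all ones, which is exactly 5 of the 1,024 possible 10-bit prefixes. The names `RISKY_NIBBLES`, `top_fraction_bits`, `is_risky_divisor` and `fdiv_workaround` are illustrative.

```python
import struct

# Assumption of this sketch: leading 4 fraction bits of an at-risk divisor,
# which must also be followed by six 1 bits.
RISKY_NIBBLES = {0b0001, 0b0100, 0b0111, 0b1010, 0b1101}


def top_fraction_bits(x: float, nbits: int) -> int:
    """Return the nbits most significant fraction bits of x's IEEE 754 double encoding."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    fraction = raw & ((1 << 52) - 1)   # drop the sign and exponent fields
    return fraction >> (52 - nbits)


def is_risky_divisor(d: float) -> bool:
    """True if d's leading fraction bits match one of the assumed at-risk patterns."""
    top10 = top_fraction_bits(d, 10)
    return (top10 >> 6) in RISKY_NIBBLES and (top10 & 0b111111) == 0b111111


def fdiv_workaround(n: float, d: float) -> float:
    """Divide n by d, first scaling both by 15/16 if d is at risk (hypothetical helper)."""
    if is_risky_divisor(d):
        n *= 15 / 16                   # 15/16 = 0.9375 is exactly representable
        d *= 15 / 16
    return n / d


if __name__ == "__main__":
    print(is_risky_divisor(3145727.0))            # True: the example divisor 0x2FFFFF
    print(is_risky_divisor(3145727.0 * 15 / 16))  # False: scaled out of the pattern
    # Count at-risk values among all 1,024 possible 10-bit fraction prefixes.
    print(sum(is_risky_divisor(1 + k / 1024) for k in range(1024)))  # 5
    print(fdiv_workaround(4195835.0, 3145727.0))
```

Although 15/16 itself is exact in binary, multiplying an arbitrary double by it can round, so on a correct FPU the scaled quotient may differ from n / d in the last bit; this sketch ignores that effect.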

The main challenge faced by software companies was implementing the fix in pre-existing software, much of which relied on libraries outside their control. Some companies, such as Wolfram Research, opted to directly patch the machine code of existing executables to replace the FDIV opcode with an illegal instruction. This would then trigger an exception that an exception handler (also patched in) would catch. From here, arbitrary code could be executed to work around the bug. [2]

Microsoft offered operating-system-level workarounds in versions of Windows up to Windows XP. Utilities included with the operating system could check for the presence of the bug and disable the FPU if it was found. [24] [25]


References

  1. Edelman, Alan (January 1, 1997). "The Mathematics of the Pentium Division Bug" (PDF). SIAM Review. 39 (1): 54–67. Bibcode:1997SIAMR..39...54E. doi:10.1137/S0036144595293959. Retrieved April 11, 2021.
  2. "'A Discussion of and Fix for the Pentium FDIV Bug' from the Notebook Archive (2002)". notebookarchive.org. Wolfram Research, Inc. Retrieved April 11, 2021.
  3. Halfhill, Tom R. (March 1995). "An error in a lookup table created the infamous bug in Intel's latest processor". BYTE. Archived from the original on February 9, 2006. Retrieved December 19, 2006.
  4. Carlton, Jim; Yoder, Stephen K. (December 21, 1994). "Computers: Humble Pie: Intel to Replace its Pentium Chips". The Wall Street Journal (Eastern ed.). p. B1.
  5. "1994 - Annual Report". Intel. June 20, 2020. Archived from the original on February 26, 2017. Retrieved June 20, 2020.
  6. Sharangpani, H. P.; Barton, M. L. (November 30, 1994). Statistical Analysis of Floating Point Flaw in the Pentium Processor (1994) (PDF) (Report). Intel Corporation. Archived from the original (PDF) on March 19, 2022. Retrieved April 11, 2021.
  7. "Pentium FDIV bug – a Picture". Kansas University Institute for Policy and Social Research. November 30, 1994. Retrieved November 3, 2010.
  8. Coe, T.; Mathisen, T.; Moler, C.; Pratt, V. (1995). "Computational aspects of the Pentium affair" (PDF). IEEE Computational Science and Engineering. 2 (1): 18–30. doi:10.1109/99.372929. Retrieved April 13, 2021.
  9. Nicely, Thomas (August 19, 2011). "Pentium FDIV flaw FAQ". trnicely.net. Archived from the original on June 18, 2019. Retrieved June 18, 2019.
  10. Markoff, John (November 24, 1994). "COMPANY NEWS; Flaw Undermines Accuracy of Pentium Chips". The New York Times. Retrieved April 11, 2021.
  11. Wolfe, Alexander (November 9, 1994). "Intel fixes a Pentium FPU glitch". Electronic Engineering Times.
  12. Moler, Cleve (Winter 1995). "A Tale of Two Numbers" (PDF). MATLAB News and Notes. MathWorks. Retrieved April 21, 2021.
  13. "BTD12: The Programming Principles of Id Software". TNG Technology Consulting GmbH. August 6, 2019. Retrieved July 17, 2023.
  14. Yeraswork, Zewde (March 30, 2011). "Lessons Learned: Pentium Flaws Aid Intel In Sandy Bridge Chipset Recall". CRN. Retrieved April 11, 2021.
  15. "Intel adopts upon-request replacement policy on Pentium processors with floating point flaw; Will take Q4 charge against earnings". Business Wire. December 20, 1994. Archived from the original on July 10, 2012. Retrieved December 24, 2006.
  16. Johnston, Louis; Williamson, Samuel H. (2023). "What Was the U.S. GDP Then?". MeasuringWorth. Retrieved November 30, 2023. United States Gross Domestic Product deflator figures follow the MeasuringWorth series.
  17. Cipra, Barry Arthur (January 13, 1995). "How number theory got the best of the Pentium chip". Science. 267 (5195): 175. Bibcode:1995Sci...267..175C. doi:10.1126/science.267.5195.175. PMID 17791336. S2CID 19898103.
  18. Price, D. (April 1995). "Pentium FDIV flaw-lessons learned". IEEE Micro. 15 (2): 86–88. doi:10.1109/40.372360.
  19. Clarke, E. M.; Khaira, M.; Zhao, X. (1996). "Word level model checking---avoiding the Pentium FDIV error". Proceedings of the 33rd annual conference on Design automation conference - DAC '96. pp. 645–648. doi:10.1145/240518.240640. ISBN 0897917790. S2CID 2500033. Retrieved April 29, 2021.
  20. O'Leary, J. (2004). "Formal verification in intel cpu design". Proceedings. Second ACM and IEEE International Conference on Formal Methods and Models for Co-Design, 2004. MEMOCODE '04. p. 152. doi:10.1109/MEMCOD.2004.1459841. ISBN 0-7803-8509-8. Retrieved April 29, 2021.
  21. Kaivola, Roope; Ghughal, Rajnish; Narasimhan, Naren; Telfer, Amber; Whittemore, Jesse; Pandav, Sudhindra; Slobodová, Anna; Taylor, Christopher; Frolov, Vladimir; Reeber, Erik; Naik, Armaghan (2009). "Replacing Testing with Formal Verification in Intel® Core™ i7 Processor Execution Engine Validation". Computer Aided Verification. 5643: 414–429. doi:10.1007/978-3-642-02658-4_32.
  22. "P5 (586) Fifth-Generation Processors | Microprocessor Types and Specifications | InformIT". www.informit.com. June 8, 2001. Retrieved April 13, 2021.
  23. "FDIV Replacement Program: Frequently asked questions". Intel. March 20, 2009. Solution ID CS-012748. Archived from the original on May 11, 2009. Retrieved November 10, 2009.
  24. Slob, Arie. "Windows 95 Troubleshooting: How to Check for a Faulty Math Coprocessor". www.helpwithwindows.com. Retrieved April 23, 2019.
  25. "Pentnt". Microsoft TechNet. Microsoft. September 11, 2009. Retrieved April 23, 2019.