This section needs additional citations for verification .(March 2014)
|Launched||November 1, 1995|
|Max. CPU clock rate||150 MHz to 200 MHz|
|FSB speeds||60 MHz to 66 MHz|
|Architecture and classification|
|Min. feature size||0.35 μm to 0.50 μm|
|Successor||Pentium II Xeon|
The Pentium Pro is a sixth-generation x86 microprocessor developed and manufactured by Intel and introduced on November 1, 1995. million transistors, respectively, the Pentium Pro contained 5.5 million transistors. Later, it was reduced to a more narrow role as a server and high-end desktop processor and was used in supercomputers like ASCI Red, the first computer to reach the trillion floating point operations per second (teraFLOPS) performance mark. The Pentium Pro was capable of both dual- and quad-processor configurations. It only came in one form factor, the relatively large rectangular Socket 8. The Pentium Pro was succeeded by the Pentium II Xeon in 1998.It introduced the P6 microarchitecture (sometimes termed i686) and was originally intended to replace the original Pentium in a full range of applications. While the Pentium and Pentium MMX had 3.1 and 4.5
The lead architect of Pentium Pro was Fred Pollack who was specialized in superscalarity and had also worked as the lead engineer of the Intel iAPX 432.
The Pentium Pro incorporated a new microarchitecture, different from the Pentium's P5 microarchitecture. It has a decoupled, 14-stage superpipelined architecture which used an instruction pool. The Pentium Pro (P6) featured many advanced concepts not found in the Pentium, although it wasn't the first or only x86 processor to implement them (see NexGen Nx586 or Cyrix 6x86). The Pentium Pro pipeline had extra decode stages to dynamically translate IA-32 instructions into buffered micro-operation sequences which could then be analysed, reordered, and renamed in order to detect parallelizable operations that may be issued to more than one execution unit at once. The Pentium Pro thus featured out of order execution, including speculative execution via register renaming. It also had a wider 36-bit address bus, usable by Physical Address Extension (PAE), allowing it to access up to 64 GB of memory.
The Pentium Pro has an 8 KB instruction cache, from which up to 16 bytes are fetched on each cycle and sent to the instruction decoders. There are three instruction decoders. The decoders are unequal in ability: only one can decode any x86 instruction, while the other two can only decode simple x86 instructions. This restricts the Pentium Pro's ability to decode multiple instructions simultaneously, limiting superscalar execution. x86 instructions are decoded into 118-bit micro-operations (micro-ops). The micro-ops are reduced instruction set computer (RISC)-like; that is, they encode an operation, two sources, and a destination. The general decoder can generate up to four micro-ops per cycle, whereas the simple decoders can generate one micro-op each per cycle. Thus, x86 instructions that operate on the memory (e.g., add this register to this location in the memory) can only be processed by the general decoder, as this operation requires a minimum of three micro-ops. Likewise, the simple decoders are limited to instructions that can be translated into one micro-op. Instructions that require more micro-ops than four are translated with the assistance of a sequencer, which generates the required micro-ops over multiple clock cycles. The Pentium Pro was the first processor in the x86-family to support upgradeable microcode under BIOS and/or operating system (OS) control.
Micro-ops exit the re-order buffer (ROB) and enter a reserve station (RS), where they await dispatch to the execution units. In each clock cycle, up to five micro-ops can be dispatched to five execution units. The Pentium Pro has a total of six execution units: two integer units, one floating-point unit (FPU), a load unit, store address unit, and a store data unit.One of the integer units shares the same ports as the FPU, and therefore the Pentium Pro can only dispatch one integer micro-op and one floating-point micro-op, or two integer micro-ops per a cycle, in addition to micro-ops for the other three execution units. Of the two integer units, only the one that shares the path with the FPU on port 0 has the full complement of functions such as a barrel shifter, multiplier, divider, and support for LEA instructions. The second integer unit, which is connected to port 1, does not have these facilities and is limited to simple operations such as add, subtract, and the calculation of branch target addresses.
The FPU executes floating-point operations. Addition and multiplication are pipelined and have a latency of three and five cycles, respectively. Division and square-root are not pipelined and are executed in separate units that share the FPU's ports. Division and square root have a latency of 18-36 and 29-69 cycles, respectively. The smallest number is for single precision (32-bit) floating-point numbers and the largest for extended precision (80-bit) numbers. Division and square root can operate simultaneously with adds and multiplies, preventing them from executing only when the result has to be stored in the ROB.
After the microprocessor was released, a bug was discovered in the floating point unit, commonly called the "Pentium Pro and Pentium II FPU bug" and by Intel as the "flag erratum". The bug occurs under some circumstances during floating point-to-integer conversion when the floating point number won't fit into the smaller integer format, causing the FPU to deviate from its documented behaviour. The bug is considered to be minor and occurs under such special circumstances that very few, if any, software programs are affected.
The Pentium Pro P6 microarchitecture was used in one form or another by Intel for more than a decade. The pipeline would scale from its initial 150 MHz start, all the way up to 1.4 GHz with the "Tualatin" Pentium III. The design's various traits would continue after that in the derivative core called "Banias" in Pentium M and Intel Core (Yonah), which itself would evolve into the Core microarchitecture (Core 2 processor) in 2006 and onward.
The Pentium Pro (P6) introduced new instructions into the Intel range; the CMOVxx (‘conditional move’) instructions can move a value that is either the contents of a register or memory location into another register or not, according to some predicate logical condition xx on the flags register, xx being a flags predicate code as given in the condition for conditional jump instructions. So for example CMOVNE moves a specified value into a register or not depending on whether the NE (not-equal) condition is true in the flags register ie Z flag = 0. This allows the evaluation of if-then-else operations and for example the ? : operation in C. These instructions give a performance boost by allowing the avoidance of costly jump and branch instructions. In eg CMOVxx destreg1, source_operand2 the first operand is the destination register, the second the source register or memory location. The second operand unfortunately can not be an immediate (in-line constant) value and such a constant would have to be placed in a register first. The predicate code xx can take the full range of values as allowed in conditional branches.
A second development was the documentation of the UD2 illegal instruction. This op code is reserved and guaranteed to cause an illegal instruction exception on the P6 and all later processors. This allows developers to easily crash the current program in a future-proof fashion when a bug is detected by software.
Despite being advanced for the time, the Pentium Pro's out-of-order register renaming architecture had trouble running 16-bit code and mixed code (8-bit with 16-bit (8/16), or 16-bit with 32-bit (16/32), as using partial registers cause frequent pipeline flushing.Specific use of partial registers was then a common performance optimization, as it incurred no performance penalty on pre-P6 Intel processors; also, the dominant operating systems at the time of the Pentium Pro's release were 16-bit DOS, and mixed 16/32-bit Windows 3.1x and Windows 95 (although the latter requires a 32-bit 80386 CPU, much of its code is still 16-bit for performance reasons, such as USER.exe). This, with the high cost of Pentium Pro systems, led to tepid sales among PC buyers at the time. To fully use the Pentium Pro's P6 microarchitecture, a fully 32-bit operating system is needed, such as Windows NT, Linux, Unix, or OS/2. The performance issues on legacy code were later partly mitigated by Intel with the Pentium II.
Compared to RISC microprocessors, the Pentium Pro, when introduced, slightly outperformed the fastest RISC microprocessors on integer performance when running the SPECint95 benchmark,but floating-point performance was significantly lower, half that of some RISC microprocessors. The Pentium Pro's integer performance lead disappeared rapidly, first overtaken by the MIPS Technologies R10000 in January 1996, and then by Digital Equipment Corporation's EV56 variant of the Alpha 21164.
Reviewers quickly noted the very slow writes to video memory as the weak spot of the P6 platform, with performance here being as low as 10% of an identically clocked Pentium system in benchmarks such as VIDSPEED. Methods to circumvent this included setting VESA drawing to system memory instead of video memory in games such as Quake ,and later on utilities such as FASTVID emerged, which could double performance in certain games by enabling the write combining features of the CPU. memory type range registers (MTRRs) are set automatically by Windows video drivers starting from ~1997, and there the improved cache/memory subsystem and FPU performance caused it to outclass the Pentium clock-for-clock in the emerging 3D games of the mid–to–late 1990s, particularly when using NT4. However, its lack of MMX implementation reduces performance in multimedia applications that made use of those instructions.
Likely Pentium Pro's most noticeable addition was its on-package L2 cache, which ranged from 256 KB at introduction to 1 MB in 1997. At the time, manufacturing technology did not feasibly allow a large L2 cache to be integrated into the processor core. Intel instead placed the L2 die(s) separately in the package which still allowed it to run at the same clock speed as the CPU core. Additionally, unlike most motherboard-based cache schemes that shared the main system bus with the CPU, the Pentium Pro's cache had its own back-side bus (called dual independent bus by Intel). Because of this, the CPU could read main memory and cache concurrently, greatly reducing a traditional bottleneck. The cache was also "non-blocking", meaning that the processor could issue more than one cache request at a time (up to 4), reducing cache-miss penalties. (This is an example of MLP, Memory Level Parallelism.) These properties combined to produce an L2 cache that was immensely faster than the motherboard-based caches of older processors. This cache alone gave the CPU an advantage in input/output performance over older x86 CPUs. In multiprocessor configurations, Pentium Pro's integrated cache skyrocketed performance in comparison to architectures which had each CPU sharing a central cache.
However, this far faster L2 cache did come with some complications. The Pentium Pro's "on-package cache" arrangement was unique. The processor and the cache were on separate dies in the same package and connected closely by a full-speed bus. The two or three dies had to be bonded together early in the production process, before testing was possible. This meant that a single, tiny flaw in either die made it necessary to discard the entire assembly, which was one of the reasons for the Pentium Pro's relatively low production yield and high cost. All versions of the chip were expensive, those with 1024 KB being particularly so, since it required two 512 KB cache dies as well as the processor die.
Pentium Pro clock speeds were 150, 166, 180 or 200 MHz with a 60 or 66 MHz external bus clock. Some users chose to overclock their Pentium Pro chips, with the 200 MHz version often being run at 233 MHz, the 180 MHz version often being run at 200 MHz, and the 150 MHz version often being run at 166 MHz. The chip was popular in symmetric multiprocessing configurations, with dual and quad SMP server and workstation setups being commonplace.
In Intel's "Family/Model/Stepping" scheme, the Pentium Pro is family 6, model 1, and its Intel Product code is 80521.
The process used to fabricate the Pentium Pro processor die and its separate cache memory die changed, leading to a combination of processes used in the same package:
The Pentium Pro (up to 512 KB cache) is packaged in a ceramic multi-chip module (MCM). The MCM contains two underside cavities in which the microprocessor die and its companion cache die reside. The dies are bonded to a heat slug, whose exposed top helps the heat from the dies to be transferred more directly to cooling apparatus such as a heat sink. The dies are connected to the package using conventional wire bonding. The cavities are capped with a ceramic plate.
The Pentium Pro with 1 MB of cache uses a plastic MCM. Instead of two cavities, there is only one, in which the three dies reside, bonded to the package instead of a heat slug. The cavities are filled in with epoxy.
The MCM has 387 pins, of which approximately half are arranged in a pin grid array (PGA) and half in an interstitial pin grid array (IPGA). The packaging was designed for Socket 8.
In 1998, the 300/333 MHz Pentium II Overdrive processor for Socket 8 was released. Featuring 512 KB of full-speed cache, it was produced by Intel as a drop-in upgrade option for owners of Pentium Pro systems. However, it only supported two-way glueless multiprocessing, not four-way or higher, which did not make it a usable upgrade for quad-processor systems. These specially packaged Pentium II Xeon processors were used to upgrade ASCI Red, which became the first computer to reach the teraFLOPS performance mark with the Pentium Pro processor and then the first to exceed 2 teraFLOPS after the upgrade to Pentium II Xeon processors.
As Slot 1 motherboards became prevalent, several manufacturers released slocket adapters, such as the Tyan M2020, Asus C-P6S1, Tekram P6SL1, and the Abit KP6. The slockets allowed Pentium Pro processors to be used with Slot 1 motherboards. The Intel 440FX chipset explicitly supported both Pentium Pro and Pentium II processors, but the Intel 440BX and later Slot 1 chipsets did not explicitly support the Pentium Pro, so the Socket 8 slockets did not see wide use. Slockets, in the form of Socket 370 to Slot 1 adapters, saw renewed popularity when Intel introduced Socket 370 Celeron and Pentium III processors.
The Pentium Pro used GTL+ signaling in its front-side bus.The Pentium Pro could be used by itself on up to four-way designs. Eight-way Pentium Pro computers were also built, but these used multiple buses.
The design of the Pentium Pro bus was influenced by Futurebus, the Intel iAPX 432 bus, and elements of the Intel i960 bus.Futurebus has been intended as an advanced bus to replace VMEbus used with the Motorola 68000 from the late 1970s, but it stagnated in standardization committee for more than a decade if you count all the twists and turns. Intel's iAPX 432 initiative was also a commercial failure, but in the process they did learn how to build a split-transaction bus to support a cacheless multiprocessor system. The i960 had further developed the split-transaction iAPX 432 bus to include a cache coherency protocol, ending up with a feature set highly reminiscent of the original Futurebus ambitions.
The lead architect of i960 was superscalarity specialist Fred Pollack who was also the lead engineer of the Intel iAPX 432 and the lead architect of the i686 chip, the Pentium Pro. He was no doubt intimately familiar with all this history. The Pentium Pro was designed to include the 4-way SMP split-transaction cache-coherent bus as a mandatory feature of every chip produced.This also served to deny competition access to the socket to produce cloned processors.
While the Pentium Pro was not successful as a machine for the masses, due to poor 16-bit support for Windows 95, it did become highly successful in the file server space due to its advanced, integrated bus design,introducing many advanced features that had formerly only been available in the pricey workstation segment into the commodity marketplace.
Athlon is the brand name applied to a series of x86-compatible microprocessors designed and manufactured by Advanced Micro Devices (AMD). The original Athlon was the first seventh-generation x86 processor and was the first desktop processor to reach speeds of one gigahertz (GHz). It made its debut as AMD's high-end processor brand on June 23, 1999. Over the years AMD has used the Athlon name with the 64-bit Athlon 64 architecture, the Athlon II, and Accelerated Processing Unit (APU) chips targeting the Socket AM1 desktop SoC architecture, and Socket AM4 Zen microarchitecture. The modern Zen-based Athlon with a Radeon Graphics processor was introduced in 2019 as AMD's highest-performance entry-level processor.
The Cyrix 6x86 is a sixth-generation, 32-bit x86 microprocessor designed by Cyrix and manufactured by IBM and SGS-Thomson. It was originally released in 1996.
The Intel 486, officially named i486 and also known as 80486, is a higher-performance follow-up to the Intel 386 microprocessor. The i486 was introduced in 1989 and was the first tightly pipelined x86 design as well as the first x86 chip to use more than a million transistors, due to a large on-chip cache and an integrated floating-point unit. It represents a fourth generation of binary compatible CPUs since the original 8086 of 1978.
The K6 microprocessor was launched by AMD in 1997. The main advantage of this particular microprocessor is that it was designed to fit into existing desktop designs for Pentium-branded CPUs. It was marketed as a product that could perform as well as its Intel Pentium II equivalent but at a significantly lower price. The K6 had a considerable impact on the PC market and presented Intel with serious competition.
The original Pentium microprocessor was introduced by Intel on March 22, 1993. It was instruction set compatible with the 80486 but was a new and very different microarchitecture design. The P5 Pentium was the first superscalar x86 microarchitecture and the world’s first superscalar microprocessor to be in mass production. It included dual integer pipelines, a faster floating-point unit, wider data bus, separate code and data caches, and many other techniques and features to enhance performance and support security, encryption, and multiprocessing, for workstations and servers.
x86 is a family of instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introduced in 1978 as a fully 16-bit extension of Intel's 8-bit 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86", including the 80186, 80286, 80386 and 80486 processors.
The Pentium II brand refers to Intel's sixth-generation microarchitecture ("P6") and x86-compatible microprocessors introduced on May 7, 1997. Containing 7.5 million transistors, the Pentium II featured an improved version of the first P6-generation core of the Pentium Pro, which contained 5.5 million transistors. However, its L2 cache subsystem was a downgrade when compared to the Pentium Pros.
The Pentium III brand refers to Intel's 32-bit x86 desktop and mobile microprocessors based on the sixth-generation P6 microarchitecture introduced on February 26, 1999. The brand's initial processors were very similar to the earlier Pentium II-branded microprocessors. The most notable differences were the addition of the Streaming SIMD Extensions (SSE) instruction set, and the introduction of a controversial serial number embedded in the chip during manufacturing.
Intel's i960 was a RISC-based microprocessor design that became popular during the early 1990s as an embedded microcontroller. It became a best-selling CPU in that segment, along with the competing AMD 29000. In spite of its success, Intel stopped marketing the i960 in the late 1990s, as a result of a settlement with DEC whereby Intel received the rights to produce the StrongARM CPU. The processor continues to be used for a few military applications.
The NetBurst microarchitecture, called P68 inside Intel, was the successor to the P6 microarchitecture in the x86 family of central processing units (CPUs) made by Intel. The first CPU to use this architecture was the Willamette-core Pentium 4, released on November 20, 2000 and the first of the Pentium 4 CPUs; all subsequent Pentium 4 and Pentium D variants have also been based on NetBurst. In mid-2004, Intel released the Foster core, which was also based on NetBurst, thus switching the Xeon CPUs to the new architecture as well. Pentium 4-based Celeron CPUs also use the NetBurst architecture.
The K5 is AMD's first x86 processor to be developed entirely in-house. Introduced in March 1996, its primary competition was Intel's Pentium microprocessor. The K5 was an ambitious design, closer to a Pentium Pro than a Pentium regarding technical solutions and internal architecture. However, the final product was closer to the Pentium regarding performance, although faster clock-for-clock compared to the Pentium.
The P6 microarchitecture is the sixth-generation Intel x86 microarchitecture, implemented by the Pentium Pro microprocessor that was introduced in November 1995. It is frequently referred to as i686. It was succeeded by the NetBurst microarchitecture in 2000, but eventually revived in the Pentium M line of microprocessors. The successor to the Pentium M variant of the P6 microarchitecture is the Core microarchitecture which in turn is also derived from the P6 microarchitecture.
The Intel Core microarchitecture is a multi-core processor microarchitecture unveiled by Intel in Q1 2006. It is based on the Yonah processor design and can be considered an iteration of the P6 microarchitecture introduced in 1995 with Pentium Pro. High power consumption and heat intensity, the resulting inability to effectively increase clock rate, and other shortcomings such as an inefficient pipeline were the primary reasons why Intel abandoned the NetBurst microarchitecture and switched to a different architectural design, delivering high efficiency through a small pipeline rather than high clock rates. The Core microarchitecture initially did not reach the clock rates of the NetBurst microarchitecture, even after moving to 45 nm lithography. However after many generations of successor microarchitectures which used Core as their basis, Intel managed to eventually surpass the clock rates of Netburst with the Devil's Canyon microarchitecture reaching a base frequency of 4 GHz and a maximum tested frequency of 4.4 GHz using 22 nm lithography.
The AMD Bulldozer Family 15h is a microprocessor microarchitecture for the FX and Opteron line of processors, developed by AMD for the desktop and server markets. Bulldozer is the codename for this family of microarchitectures. It was released on October 12, 2011, as the successor to the K10 microarchitecture.
The TurboSPARC is a microprocessor that implements the SPARC V8 instruction set architecture (ISA) developed by Fujitsu Microelectronics, Inc. (FMI), the United States subsidiary of the Japanese multinational information technology equipment and services company Fujitsu Limited located in San Jose, California. It was a low-end microprocessor primarily developed as an upgrade for the Sun Microsystems microSPARC-II-based SPARCstation 5 workstation. It was introduced on 30 September 1996, with a 170 MHz version priced at US$499 in quantities of 1,000. The TurboSPARC was mostly succeeded in the low-end SPARC market by the UltraSPARC IIi in late 1997, but remained available.
Goldmont Plus is a microarchitecture for low-power Atom, Celeron and Pentium Silver branded processors used in systems on a chip (SoCs) made by Intel. The Gemini Lake platform with 14 nm Goldmont Plus core was officially launched on December 11, 2017. Intel launched Gemini Lake Refresh platform on November 4, 2019.