The AMD Am29000, commonly shortened to 29k, is a family of 32-bit RISC microprocessors and microcontrollers developed and fabricated by Advanced Micro Devices (AMD). Based on the seminal Berkeley RISC, the 29k added a number of significant improvements. They were, for a time, the most popular RISC chips on the market,[ citation needed ] widely used in laser printers from a variety of manufacturers.
Developed since 1984–1985, announced in March 1987 and released in May 1988, [1] [2] [3] the initial Am29000 was followed by several versions, ending with the Am29040 in 1995. [4] The 29050 was notable for being early to feature a floating point unit capable of executing one multiply–add operation per cycle.
AMD was designing a superscalar version until late 1995, when AMD dropped the development of the 29k because the design team was transferred to support the PC (x86) side of the business. What remained of AMD's embedded business was realigned towards the embedded 186 family of 80186 derivatives. By then the majority of AMD's resources were concentrated on their high-performance x86 processors for desktop PCs, using many of the ideas and individual parts of the 29k designs to produce the AMD K5.
The 29k evolved from the same Berkeley RISC design that also led to the Sun SPARC, Intel i960, ARM and RISC-V.
One design element used in some of the Berkeley RISC-derived designs is the concept of register windows, a technique used to speed up procedure calls significantly. The idea is to use a large set of registers as a stack, loading local data into a set of registers during a call, and marking them "dead" when the procedure returns. Values being returned from the routines would be placed in the "global page", the top eight registers in the SPARC (for instance). The competing early RISC design from Stanford University, the Stanford MIPS, also looked at this concept but decided that improved compilers could make more efficient use of general purpose registers than a hard-wired window.
In the original Berkeley design, SPARC, and i960, the windows were fixed in size. A routine using only one local variable would still use up eight registers on the SPARC, wasting this expensive resource. It was here that the 29000 differed from these earlier designs, using a variable window size. In this example only two registers would be used, one for the local variable, another for the return address. It also added more registers, including the same 128 registers for the procedure stack, but adding another 64 for global access. In comparison, the SPARC had 128 registers in total, and the global set was a standard window of eight. This change resulted in much better register use in the 29000 under a wide variety of workloads.
The 29000 also extended the register window stack with an in-memory (and in theory, in-cache) stack. When the window filled the calls would be pushed off the end of the register stack into memory, restored as required when the routine returned. Generally, the 29000's register usage was considerably more advanced than competing designs based on the Berkeley concepts.
Another difference with the Berkeley design is that the 29000 included no special-purpose condition code register. Any register could be used for this purpose, allowing the conditions to be easily saved at the expense of complicating some code. A Branch Target Cache (512 bytes on the 29000 and 1024 bytes on the 29050) stored sets of 4 or 2 sequential instructions found at the branch target address, reducing the instruction fetch latency during taken branches—the 29000 did not include any branch prediction system so there was a delay if a branch was taken. It means the 29000 has a single branch delay slot. [5] The buffer mitigated this by storing four or two instructions from the target address of the branch, which could be run instantly while the fetch buffer was re-filled with new instructions from memory. [6]
Support for virtual address translation followed a similar approach to that of the MIPS architecture. A 64-entry translation lookaside buffer (TLB) retained mappings from virtual to physical addresses, and upon an untranslated address being encountered, the resulting TLB "miss" would cause the processor to trap to a software routine responsible for providing any appropriate mapping to physical memory. In contrast to the MIPS approach which employed a random register to select the TLB entry to be replaced upon a TLB miss event, the 29000 provided a dedicated lru (least recently used) register. [7] Some products in the 29000 family provided only 16 TLB entries to be able to dedicate part of the silicon to peripherals. To compensate, the maximum page size employed by a mapping was increased from 8 KB to 16 MB. [8] : 305–306
The first Am29000 was released in 1988, including a built-in MMU but floating point support was offloaded to the Am29027 FPU. Units with failed MMU or Branch Target Cache were sold as the Am29005. [6]
In 1991 the line was extended with the Am29030 and Am29035, which included an 8 KB or 4 KB of instruction cache, respectively. [9] By then [10] the Am29050 had also become available, without on-chip cache but featuring a floating-point unit with fully pipelined multiply–accumulate operations, a larger 1 KB Branch Target Cache with a claimed 80% hit rate, and better-pipelined load operations sped up by a 4-entry TLB-like Physical Address Cache. Though it is not a superscalar processor, it permits a floating-point operation and an integer operation to complete at the same cycle. The integer and floating-point sides each have an own write port to the registers. [11] It contained 428,000 transistors [12] on a 1-micron process [13] with a 0.8-micron effective channel length [11] and was available at 20, 25, 33, and 40 MHz. Later the Am29040 was released at 33, 40, and 50 MHz, being like the Am29030 except for featuring a 4 KB data cache, a multiplication unit, and a few other enhancements. [14] The 119 mm2 Am29040 contained 1.2 million transistors on a 0.7-micron process. [15] [16]
A superscalar version of 29K was being designed, but canceled in favor of x86. It was codenamed Jaguar, [3] and was described in November 1994 and August 1995. [17] [18] It was an advanced design, capable of four-way dispatch into six reservation stations and speculative out-of-order execution of instructions, with four-way retire. The register file permitted four reads and two writes at once. The caches for instructions and data were 8 KB each. Loads from cache could bypass stores. It had no on-chip FPU due to cost reasons and the target market. It was expected to attain 100 MHz frequency on a 0.4-micron process. [17] [19]
AMD used the unreleased 29K microarchitecture as the basis of the K5 series of x86-compatible processors. The ALUs were carried over, as was the re-order buffer with a slight modification. The FPU was taken from the 29050, but extended to 80 bits precision. The K5 translated the x86 instructions into "RISC-OPs" upon decoding, aided by the predecode information held of the cached instructions. AMD claimed that the superscalar 29K would have only a slightly lower performance than the K5, but much lower cost due to the size difference. [20] [17]
The Honeywell 29KII is a CPU based on the AMD 29050, and it was extensively used in real-time avionics.
Positioned as a product for "medium- to high-performance embedded applications" with potential for use in Unix workstations, [7] the 29000 was used in a variety of products such as X terminals, laser printer controller cards, graphics accelerator cards, optical character recognition solutions, and network bridges. [21] The memory architecture of the 29000 was a particular attraction for product designers, allowing them to forego external cache memory and to employ dynamic RAM directly while maintaining acceptable performance, [21] : 1 permitting a degree of flexibility in the choice of memory technologies used to retain program instructions and data. [22]
The 29k saw some use as a computational accelerator or coprocessor, particularly on the Macintosh and IBM PC-compatible platforms. For instance, Yarc Systems Corporation produced 29k-based "RISC coprocessor" cards for Macintosh II and PC AT systems, alongside other "CISC coprocessor" cards featuring Motorola 68020 and 68030 processors, and "parallel coprocessor" cards featuring T800 transputer processors. [23] Its NuSuper (originally named the McCray [24] ) and AT-Super cards, employing the Am29000 CPU and Am29027 floating-point accelerator, [23] were followed by the MacRageous, upgrading the CPU to the Am29050. [25] Such accelerator cards offered performance several times that of the Macintosh II itself and benchmarked competitively with RISC workstations such as the DECstation 3100. Multiple cards could also be fitted to a system. However, the cost of a Macintosh II system combined with such a card approached that of established RISC workstations running Unix. [26] The AT-Super was priced at around $4,600 and was reported as running Unix, competing with similar products employing Intel's i860 processor. [27]
One notable product utilising the 29k was Apple's Macintosh Display Card 8·24 GC for its Macintosh IIfx, featuring a 30 MHz Am29000 processor, 64 KB static RAM cache, and 2 MB of video RAM, with the option of an additional 2 MB of dynamic RAM for use by the QuickDraw graphical toolkit. The inclusion of the 29k differentiated this particular version of the card from other versions sold by Apple, significantly improving performance when handling 24-bits-per-pixel images. [28]
A complex instruction set computer is a computer architecture in which single instructions can execute several low-level operations or are capable of multi-step operations or addressing modes within single instructions. The term was retroactively coined in contrast to reduced instruction set computer (RISC) and has therefore become something of an umbrella term for everything that is not RISC, where the typical differentiating characteristic is that most RISC designs use uniform instruction length for almost all instructions, and employ strictly separate load and store instructions.
The Intel 486, officially named i486 and also known as 80486, is a microprocessor. It is a higher-performance follow-up to the Intel 386. The i486 was introduced in 1989. It represents the fourth generation of binary compatible CPUs following the 8086 of 1978, the Intel 80286 of 1982, and 1985's i386.
The K6 microprocessor was launched by AMD in 1997. The main advantage of this particular microprocessor is that it was designed to fit into existing desktop designs for Pentium-branded CPUs. It was marketed as a product that could perform as well as its Intel Pentium II equivalent but at a significantly lower price. The K6 had a considerable impact on the PC market and presented Intel with serious competition.
The Pentium is a x86 microprocessor introduced by Intel on March 22, 1993. It is the first CPU using the Pentium brand. Considered the fifth generation in the 8086 compatible line of processors, its implementation and microarchitecture was internally called P5.
x86 is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introduced in 1978 as a fully 16-bit extension of Intel's 8-bit 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86", including the 80186, 80286, 80386 and 80486 processors. Colloquially, their names were "186", "286", "386" and "486".
A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows more throughput than would otherwise be possible at a given clock rate. Each execution unit is not a separate processor, but an execution resource within a single CPU such as an arithmetic logic unit.
IA-64 is the instruction set architecture (ISA) of the discontinued Itanium family of 64-bit Intel microprocessors. The basic ISA specification originated at Hewlett-Packard (HP), and was subsequently implemented by Intel in collaboration with HP. The first Itanium processor, codenamed Merced, was released in 2001.
The Pentium Pro is a sixth-generation x86 microprocessor developed and manufactured by Intel and introduced on November 1, 1995. It introduced the P6 microarchitecture and was originally intended to replace the original Pentium in a full range of applications. Later, it was reduced to a more narrow role as a server and high-end desktop processor. The Pentium Pro was also used in supercomputers, most notably ASCI Red, which used two Pentium Pro CPUs on each computing nodes and was the first computer to reach over one teraFLOPS in 1996, holding the number one spot in the TOP500 list from 1997 to 2000.
Intel's i960 was a RISC-based microprocessor design that became popular during the early 1990s as an embedded microcontroller. It became a best-selling CPU in that segment, along with the competing AMD 29000. In spite of its success, Intel stopped marketing the i960 in the late 1990s, as a result of a settlement with DEC whereby Intel received the rights to produce the StrongARM CPU. The processor continues to be used for a few military applications.
The WinChip series was a low-power Socket 7-based x86 processor designed by Centaur Technology and marketed by its parent company IDT.
The K5 is AMD's first x86 processor to be developed entirely in-house. Introduced in March 1996, its primary competition was Intel's Pentium microprocessor. The K5 was an ambitious design, closer to a Pentium Pro than a Pentium regarding technical solutions and internal architecture. However, the final product was closer to the Pentium regarding performance, although faster clock-for-clock compared to the Pentium.
The R3000 is a 32-bit RISC microprocessor chipset developed by MIPS Computer Systems that implemented the MIPS I instruction set architecture (ISA). Introduced in June 1988, it was the second MIPS implementation, succeeding the R2000 as the flagship MIPS microprocessor. It operated at 20, 25 and 33.33 MHz.
The R4000 is a microprocessor developed by MIPS Computer Systems that implements the MIPS III instruction set architecture (ISA). Officially announced on 1 October 1991, it was one of the first 64-bit microprocessors and the first MIPS III implementation. In the early 1990s, when RISC microprocessors were expected to replace CISC microprocessors such as the Intel i486, the R4000 was selected to be the microprocessor of the Advanced Computing Environment (ACE), an industry standard that intended to define a common RISC platform. ACE ultimately failed for a number of reasons, but the R4000 found success in the workstation and server markets.
William Michael Johnson is a technologist, and pioneer in superscalar microprocessor design in the United States.
The Alpha 21264 is a Digital Equipment Corporation RISC microprocessor launched on 19 October 1998. The 21264 implemented the Alpha instruction set architecture (ISA).
The PA-8000 (PCX-U), code-named Onyx, is a microprocessor developed and fabricated by Hewlett-Packard (HP) that implemented the PA-RISC 2.0 instruction set architecture (ISA). It was a completely new design with no circuitry derived from previous PA-RISC microprocessors. The PA-8000 was introduced on 2 November 1995 when shipments began to members of the Precision RISC Organization (PRO). It was used exclusively by PRO members and was not sold on the merchant market. All follow-on PA-8x00 processors are based on the basic PA-8000 processor core.
The PA-7100LC is a microprocessor that implements the PA-RISC 1.1 instruction set architecture (ISA) developed by Hewlett-Packard (HP). It is also known as the PCX-L, and by its code-name, Hummingbird. It was designed as a low-cost microprocessor for low-end systems. The first systems to feature the PA-7100LC were introduced in January 1994. These systems used 60 and 80 MHz clock rates. A 100 MHz part debuted in June 1994. The PA-7100LC was the first PA-RISC microprocessor to implement the MAX-1 multimedia instructions, an early single instruction, multiple data (SIMD) multimedia instruction set extension that provided instructions for improving the performance of MPEG video decoding.
Since 1985, many processors implementing some version of the MIPS architecture have been designed and used widely.
Well, it started in '85. And it took I would say about three years and maybe four revs till it was functional.