Processor design

Last updated

Processor design is a subfield of computer science and computer engineering (fabrication) that deals with creating a processor, a key component of computer hardware.

Contents

The design process involves choosing an instruction set and a certain execution paradigm (e.g. VLIW or RISC) and results in a microarchitecture, which might be described in e.g. VHDL or Verilog. For microprocessor design, this description is then manufactured employing some of the various semiconductor device fabrication processes, resulting in a die which is bonded onto a chip carrier. This chip carrier is then soldered onto, or inserted into a socket on, a printed circuit board (PCB).

The mode of operation of any processor is the execution of lists of instructions. Instructions typically include those to compute or manipulate data values using registers, change or retrieve values in read/write memory, perform relational tests between data values and to control program flow.

Processor designs are often tested and validated on one or several FPGAs before sending the design of the processor to a foundry for semiconductor fabrication. [1]

Details

Basics

CPU design is divided into multiple components. Information is transferred through datapaths (such as ALUs and pipelines). These datapaths are controlled through logic by control units. Memory components include register files and caches to retain information, or certain actions. Clock circuitry maintains internal rhythms and timing through clock drivers, PLLs, and clock distribution networks. Pad transceiver circuitry with allows signals to be received and sent and a logic gate cell library which is used to implement the logic.

CPUs designed for high-performance markets might require custom (optimized or application specific (see below)) designs for each of these items to achieve frequency, power-dissipation, and chip-area goals whereas CPUs designed for lower performance markets might lessen the implementation burden by acquiring some of these items by purchasing them as intellectual property. Control logic implementation techniques (logic synthesis using CAD tools) can be used to implement datapaths, register files, and clocks. Common logic styles used in CPU design include unstructured random logic, finite-state machines, microprogramming (common from 1965 to 1985), and Programmable logic arrays (common in the 1980s, no longer common).

Implementation logic

Device types used to implement the logic include:

A CPU design project generally has these major tasks:

Re-designing a CPU core to a smaller die area helps to shrink everything (a "photomask shrink"), resulting in the same number of transistors on a smaller die. It improves performance (smaller transistors switch faster), reduces power (smaller wires have less parasitic capacitance) and reduces cost (more CPUs fit on the same wafer of silicon). Releasing a CPU on the same size die, but with a smaller CPU core, keeps the cost about the same but allows higher levels of integration within one very-large-scale integration chip (additional cache, multiple CPUs or other components), improving performance and reducing overall system cost.

As with most complex electronic designs, the logic verification effort (proving that the design does not have bugs) now dominates the project schedule of a CPU.

Key CPU architectural innovations include index register, cache, virtual memory, instruction pipelining, superscalar, CISC, RISC, virtual machine, emulators, microprogram, and stack.

Micro-architectural concepts

Research topics

A variety of new CPU design ideas have been proposed, including reconfigurable logic, clockless CPUs, computational RAM, and optical computing.

Performance analysis and benchmarking

Benchmarking is a way of testing CPU speed. Examples include SPECint and SPECfp, developed by Standard Performance Evaluation Corporation, and ConsumerMark developed by the Embedded Microprocessor Benchmark Consortium EEMBC.

Some of the commonly used metrics include:

There may be tradeoffs in optimizing some of these metrics. In particular, many design techniques that make a CPU run faster make the "performance per watt", "performance per dollar", and "deterministic response" much worse, and vice versa.

Markets

There are several different markets in which CPUs are used. Since each of these markets differ in their requirements for CPUs, the devices designed for one market are in most cases inappropriate for the other markets.

General purpose computing

As of 2010 in the general-purpose computing market, that is, desktop, laptop, and server computers commonly used in businesses and homes, the Intel IA-32 and the 64-bit version x86-64 architecture dominate the market, with its rivals PowerPC and SPARC maintaining much smaller customer bases. Yearly, hundreds of millions of IA-32 architecture CPUs are used by this market. A growing percentage of these processors are for mobile implementations such as netbooks and laptops. [4]

Since these devices are used to run countless different types of programs, these CPU designs are not specifically targeted at one type of application or one function. The demands of being able to run a wide range of programs efficiently has made these CPU designs among the more advanced technically, along with some disadvantages of being relatively costly, and having high power consumption.

High-end processor economics

In 1984, most high-performance CPUs required four to five years to develop. [5]

Scientific computing

Scientific computing is a much smaller niche market (in revenue and units shipped). It is used in government research labs and universities. Before 1990, CPU design was often done for this market, but mass market CPUs organized into large clusters have proven to be more affordable. The main remaining area of active hardware design and research for scientific computing is for high-speed data transmission systems to connect mass market CPUs.

Embedded design

As measured by units shipped, most CPUs are embedded in other machinery, such as telephones, clocks, appliances, vehicles, and infrastructure. Embedded processors sell in the volume of many billions of units per year, however, mostly at much lower price points than that of the general purpose processors.

These single-function devices differ from the more familiar general-purpose CPUs in several ways:

Embedded processor economics

The embedded CPU family with the largest number of total units shipped is the 8051, averaging nearly a billion units per year. [6] The 8051 is widely used because it is very inexpensive. The design time is now roughly zero, because it is widely available as commercial intellectual property. It is now often embedded as a small part of a larger system on a chip. The silicon cost of an 8051 is now as low as US$0.001, because some implementations use as few as 2,200 logic gates and take 0.4730 square millimeters of silicon. [7] [8]

As of 2009, more CPUs are produced using the ARM architecture family instruction sets than any other 32-bit instruction set. [9] [10] The ARM architecture and the first ARM chip were designed in about one and a half years and 5 human years of work time. [11]

The 32-bit Parallax Propeller microcontroller architecture and the first chip were designed by two people in about 10 human years of work time. [12]

The 8-bit AVR architecture and first AVR microcontroller was conceived and designed by two students at the Norwegian Institute of Technology.

The 8-bit 6502 architecture and the first MOS Technology 6502 chip were designed in 13 months by a group of about 9 people. [13]

Research and educational CPU design

The 32-bit Berkeley RISC I and RISC II processors were mostly designed by a series of students as part of a four quarter sequence of graduate courses. [14] This design became the basis of the commercial SPARC processor design.

For about a decade, every student taking the 6.004 class at MIT was part of a team—each team had one semester to design and build a simple 8 bit CPU out of 7400 series integrated circuits. One team of 4 students designed and built a simple 32 bit CPU during that semester. [15]

Some undergraduate courses require a team of 2 to 5 students to design, implement, and test a simple CPU in a FPGA in a single 15-week semester. [16]

The MultiTitan CPU was designed with 2.5 man years of effort, which was considered "relatively little design effort" at the time. [17] 24 people contributed to the 3.5 year MultiTitan research project, which included designing and building a prototype CPU. [18]

Soft microprocessor cores

For embedded systems, the highest performance levels are often not needed or desired due to the power consumption requirements. This allows for the use of processors which can be totally implemented by logic synthesis techniques. These synthesized processors can be implemented in a much shorter amount of time, giving quicker time-to-market.

See also

Related Research Articles

<span class="mw-page-title-main">Central processing unit</span> Central computer component which executes instructions

A central processing unit (CPU)—also called a central processor or main processor—is the most important processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs).

<span class="mw-page-title-main">Microprocessor</span> Computer processor contained on an integrated-circuit chip

A microprocessor is a computer processor for which the data processing logic and control is included on a single integrated circuit (IC), or a small number of ICs. The microprocessor contains the arithmetic, logic, and control circuitry required to perform the functions of a computer's central processing unit (CPU). The IC is capable of interpreting and executing program instructions and performing arithmetic operations. The microprocessor is a multipurpose, clock-driven, register-based, digital integrated circuit that accepts binary data as input, processes it according to instructions stored in its memory, and provides results as output. Microprocessors contain both combinational logic and sequential digital logic, and operate on numbers and symbols represented in the binary number system.

<span class="mw-page-title-main">Microcontroller</span> Small computer on a single integrated circuit

A microcontroller or microcontroller unit (MCU) is a small computer on a single integrated circuit. A microcontroller contains one or more CPUs along with memory and programmable input/output peripherals. Program memory in the form of ferroelectric RAM, NOR flash or OTP ROM is also often included on chip, as well as a small amount of RAM. Microcontrollers are designed for embedded applications, in contrast to the microprocessors used in personal computers or other general purpose applications consisting of various discrete chips.

<span class="mw-page-title-main">PowerPC</span> RISC instruction set architecture by AIM alliance

PowerPC is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple–IBM–Motorola alliance, known as AIM. PowerPC, as an evolving instruction set, has been named Power ISA since 2006, while the old name lives on as a trademark for some implementations of Power Architecture–based processors.

<span class="mw-page-title-main">Reduced instruction set computer</span> Processor executing one instruction in minimal clock cycles

In computer science, a reduced instruction set computer (RISC) is a computer architecture designed to simplify the individual instructions given to the computer to accomplish tasks. Compared to the instructions given to a complex instruction set computer (CISC), a RISC computer might require more instructions in order to accomplish a task because the individual instructions are written in simpler code. The goal is to offset the need to process more instructions by increasing the speed of each instruction, in particular by implementing an instruction pipeline, which may be simpler to achieve given simpler instructions.

The NS32000, sometimes known as the 32k, is a series of microprocessors produced by National Semiconductor. The first member of the family came to market in 1982, briefly known as the 16032 before becoming the 32016. It was the first general-purpose microprocessor on the market that used 32-bit data throughout: the Motorola 68000 used 32-bit data but had a 16-bit ALU and thus took twice as long perform many operations. However, the 32016 contained many bugs and often could not be run at its rated speed. These problems, and the presence of the otherwise similar 68000 which had been available since 1980, led to little use in the market.

Very long instruction word (VLIW) refers to instruction set architectures designed to exploit instruction level parallelism (ILP). Whereas conventional central processing units mostly allow programs to specify instructions to execute in sequence only, a VLIW processor allows programs to explicitly specify instructions to execute in parallel. This design is intended to allow higher performance without the complexity inherent in some other designs.

<span class="mw-page-title-main">Transputer</span> Series of pioneering microprocessors from the 1980s

The transputer is a series of pioneering microprocessors from the 1980s, intended for parallel computing. To support this, each transputer had its own integrated memory and serial communication links to exchange data with other transputers. They were designed and produced by Inmos, a semiconductor company based in Bristol, United Kingdom.

<span class="mw-page-title-main">System on a chip</span> Micro-electronic component

A system on a chip or system-on-chip is an integrated circuit that integrates most or all components of a computer or other electronic system. These components almost always include on-chip central processing unit (CPU), memory interfaces, input/output devices and interfaces, and secondary storage interfaces, often alongside other components such as radio modems and a graphics processing unit (GPU) – all on a single substrate or microchip. SoCs may contain digital and also analog, mixed-signal and often radio frequency signal processing functions.

Reconfigurable computing is a computer architecture combining some of the flexibility of software with the high performance of hardware by processing with flexible hardware platforms like field-programmable gate arrays (FPGAs). The principal difference when compared to using ordinary microprocessors is the ability to add custom computational blocks using FPGAs. On the other hand, the main difference from custom hardware, i.e. application-specific integrated circuits (ASICs) is the possibility to adapt the hardware during runtime by "loading" a new circuit on the reconfigurable fabric, thus providing new computational blocks without the need to manufacture and add new chips to the existing system.

<span class="mw-page-title-main">Hardware acceleration</span> Specialized computer hardware

Hardware acceleration is the use of computer hardware designed to perform specific functions more efficiently when compared to software running on a general-purpose central processing unit (CPU). Any transformation of data that can be calculated in software running on a generic CPU can also be calculated in custom-made hardware, or in some mix of both.

<span class="mw-page-title-main">R3000</span> RISC microprocessor

The R3000 is a 32-bit RISC microprocessor chipset developed by MIPS Computer Systems that implemented the MIPS I instruction set architecture (ISA). Introduced in June 1988, it was the second MIPS implementation, succeeding the R2000 as the flagship MIPS microprocessor. It operated at 20, 25 and 33.33 MHz.

The transistor count is the number of transistors in an electronic device. It is the most common measure of integrated circuit complexity. The rate at which MOS transistor counts have increased generally follows Moore's law, which observes that transistor count doubles approximately every two years. However, being directly proportional to the area of a chip, transistor count does not represent how advanced the corresponding manufacturing technology is: a better indication of this is transistor density.

<span class="mw-page-title-main">History of general-purpose CPUs</span> History of processors used in general purpose computers

The history of general-purpose CPUs is a continuation of the earlier history of computing hardware.

The R4200 is a microprocessor designed by MIPS Technologies, Inc. (MTI) that implemented the MIPS III instruction set architecture (ISA). It was also known as the VRX during development. The microprocessor was licensed to NEC, and the company fabricated and marketed it as the VR4200. The first VR4200, an 80 MHz part, was introduced in 1993. A faster 100 MHz part became available in 1994.

<span class="mw-page-title-main">Computer architecture</span> Set of rules describing computer system

In computer science and computer engineering, computer architecture is a description of the structure of a computer system made from component parts. It can sometimes be a high-level description that ignores details of the implementation. At a more detailed level, the description may include the instruction set architecture design, microarchitecture design, logic design, and implementation.

IBM POWER is a reduced instruction set computer (RISC) instruction set architecture (ISA) developed by IBM. The name is an acronym for Performance Optimization With Enhanced RISC.

IBM Power microprocessors are designed and sold by IBM for servers and supercomputers. The name "POWER" was originally presented as an acronym for "Performance Optimization With Enhanced RISC". The Power line of microprocessors has been used in IBM's RS/6000, AS/400, pSeries, iSeries, System p, System i, and Power Systems lines of servers and supercomputers. They have also been used in data storage devices and workstations by IBM and by other server manufacturers like Bull and Hitachi.

Since 1985, many processors implementing some version of the MIPS architecture have been designed and used widely.

References

  1. Cutress, Ian (August 27, 2019). "Xilinx Announces World Largest FPGA: Virtex Ultrascale+ VU19P with 9m Cells". AnandTech .
  2. "EEMBC ConsumerMark". Archived from the original on March 27, 2005.
  3. Stephen Shankland (December 9, 2005). "Power could cost more than servers, Google warns". ZDNet .
  4. Kerr, Justin. "AMD Loses Market Share as Mobile CPU Sales Outsell Desktop for the First Time." Maximum PC. Published 2010-10-26.
  5. "New system manages hundreds of transactions per second" article by Robert Horst and Sandra Metz, of Tandem Computers Inc., "Electronics" magazine, 1984 April 19: "While most high-performance CPUs require four to five years to develop, The NonStop TXP processor took just 2+1/2 years -- six months to develop a complete written specification, one year to construct a working prototype, and another year to reach volume production."
  6. Curtis A. Nelson. "8051 Overview" (PDF). Archived from the original (PDF) on 2011-10-09. Retrieved 2011-07-10.
  7. "T8051 Tiny 8051-compatible Microcontroller" (PDF). Archived from the original (PDF) on 2011-09-29.
  8. To figure dollars per square millimeter, see , and note that an SOC component has no pin or packaging costs.
  9. "ARM Cores Climb Into 3G Territory" by Mark Hachman, 2002.
  10. "The Two Percent Solution" by Jim Turley 2002.
  11. "ARM's way" 1998
  12. "Why the Propeller Works" by Chip Gracey
  13. "Interview with William Mensch". Archived from the original on 2016-03-04. Retrieved 2009-02-01.
  14. C.H. Séquin; D.A. Patterson. "Design and Implementation of RISC I" (PDF). Archived (PDF) from the original on 2006-03-05.
  15. "the VHS". Archived from the original on 2010-02-27.
  16. Jan Gray. "Teaching Computer Design with FPGAs".
  17. Jouppi, N.P.; Tang, J.Y.-F. (October 1989). "A 20-MIPS sustained 32-bit CMOS microprocessor with high ratio of sustained to peak performance". IEEE Journal of Solid-State Circuits. 24 (5): 1348–1359. Bibcode:1989IJSSC..24.1348J. doi:10.1109/JSSC.1989.572612.
  18. "MultiTitan: Four Architecture Papers" (PDF). 1988. pp. 4–5. Archived (PDF) from the original on 2004-08-25.

General references