Complex instruction set computer

Last updated

A complex instruction set computer (CISC /ˈsɪsk/ ) is a computer in which single instructions can execute several low-level operations (such as a load from memory, an arithmetic operation, and a memory store) or are capable of multi-step operations or addressing modes within single instructions. The term was retroactively coined in contrast to reduced instruction set computer (RISC) [1] and has therefore become something of an umbrella term for everything that is not RISC, where the typical differentiating characteristic is that most RISC designs use uniform instruction length for almost all instructions, and employ strictly separate load and store instructions.

Contents

Examples of CISC architectures include complex mainframe computers to simplistic microcontrollers where memory load and store operations are not separated from arithmetic instructions. Specific instruction set architectures that have been retroactively labeled CISC are System/360 through z/Architecture, the PDP-11 and VAX architectures, and many others. Well known microprocessors and microcontrollers that have also been labeled CISC in many academic publications include the Motorola 6800, 6809 and 68000-families; the Intel 8080, iAPX432 and x86-family; the Zilog Z80, Z8 and Z8000-families; the National Semiconductor 32016 and NS320xx-line; the MOS Technology 6502-family; the Intel 8051-family; and others.

Some designs have been regarded as borderline cases by some writers. For instance, the Microchip Technology PIC has been labeled RISC in some circles and CISC in others. The 6502 and 6809 have both been described as RISC-like, although they have complex addressing modes as well as arithmetic instructions that operate on memory, contrary to the RISC principles.

Incitements and benefits

Before the RISC philosophy became prominent, many computer architects tried to bridge the so-called semantic gap, i.e., to design instruction sets that directly support high-level programming constructs such as procedure calls, loop control, and complex addressing modes, allowing data structure and array accesses to be combined into single instructions. Instructions are also typically highly encoded in order to further enhance the code density. The compact nature of such instruction sets results in smaller program sizes and fewer main memory accesses (which were often slow), which at the time (early 1960s and onwards) resulted in a tremendous saving on the cost of computer memory and disc storage, as well as faster execution. It also meant good programming productivity even in assembly language, as high level languages such as Fortran or Algol were not always available or appropriate. Indeed, microprocessors in this category are sometimes still programmed in assembly language for certain types of critical applications[ citation needed ].

New instructions

In the 1970s, analysis of high-level languages indicated compilers produced some complex corresponding machine language. It was determined that new instructions could improve performance. Some instructions were added that were never intended to be used in assembly language but fit well with compiled high-level languages. Compilers were updated to take advantage of these instructions. The benefits of semantically rich instructions with compact encodings can be seen in modern processors as well, particularly in the high-performance segment where caches are a central component (as opposed to most embedded systems). This is because these fast, but complex and expensive, memories are inherently limited in size, making compact code beneficial. Of course, the fundamental reason they are needed is that main memories (i.e., dynamic RAM today) remain slow compared to a (high-performance) CPU core.

Design issues

While many designs achieved the aim of higher throughput at lower cost and also allowed high-level language constructs to be expressed by fewer instructions, it was observed that this was not always the case. For instance, low-end versions of complex architectures (i.e. using less hardware) could lead to situations where it was possible to improve performance by not using a complex instruction (such as a procedure call or enter instruction), but instead using a sequence of simpler instructions.

One reason for this was that architects (microcode writers) sometimes "over-designed" assembly language instructions, including features which could not be implemented efficiently on the basic hardware available. There could, for instance, be "side effects" (above conventional flags), such as the setting of a register or memory location that was perhaps seldom used; if this was done via ordinary (non duplicated) internal buses, or even the external bus, it would demand extra cycles every time, and thus be quite inefficient.

Even in balanced high-performance designs, highly encoded and (relatively) high-level instructions could be complicated to decode and execute efficiently within a limited transistor budget. Such architectures therefore required a great deal of work on the part of the processor designer in cases where a simpler, but (typically) slower, solution based on decode tables and/or microcode sequencing is not appropriate. At a time when transistors and other components were a limited resource, this also left fewer components and less opportunity for other types of performance optimizations.

The RISC idea

The circuitry that performs the actions defined by the microcode in many (but not all) CISC processors is, in itself, a processor which in many ways is reminiscent in structure to very early CPU designs. In the early 1970s, this gave rise to ideas to return to simpler processor designs in order to make it more feasible to cope without (then relatively large and expensive) ROM tables and/or PLA structures for sequencing and/or decoding.

An early (retroactively) RISC-labeled processor (IBM 801   IBM's Watson Research Center, mid-1970s) was a tightly pipelined simple machine originally intended to be used as an internal microcode kernel, or engine, in CISC designs,[ citation needed ] but also became the processor that introduced the RISC idea to a somewhat larger public. Simplicity and regularity also in the visible instruction set would make it easier to implement overlapping processor stages (pipelining) at the machine code level (i.e. the level seen by compilers). However, pipelining at that level was already used in some high performance CISC "supercomputers" in order to reduce the instruction cycle time (despite the complications of implementing within the limited component count and wiring complexity feasible at the time). Internal microcode execution in CISC processors, on the other hand, could be more or less pipelined depending on the particular design, and therefore more or less akin to the basic structure of RISC processors.

The CDC 6600 supercomputer, first delivered in 1965, has also been retroactively described as RISC. It had a load–store architecture which allowed up to five loads and two stores to be in progress simultaneously under programmer control. It also had multiple function units which could operate at the same time.

Superscalar

In a more modern context, the complex variable-length encoding used by some of the typical CISC architectures makes it complicated, but still feasible, to build a superscalar implementation of a CISC programming model directly; the in-order superscalar original Pentium and the out-of-order superscalar Cyrix 6x86 are well known examples of this. The frequent memory accesses for operands of a typical CISC machine may limit the instruction level parallelism that can be extracted from the code, although this is strongly mediated by the fast cache structures used in modern designs, as well as by other measures. Due to inherently compact and semantically rich instructions, the average amount of work performed per machine code unit (i.e. per byte or bit) is higher for a CISC than a RISC processor, which may give it a significant advantage in a modern cache based implementation.

Transistors for logic, PLAs, and microcode are no longer scarce resources; only large high-speed cache memories are limited by the maximum number of transistors today. Although complex, the transistor count of CISC decoders do not grow exponentially like the total number of transistors per processor (the majority typically used for caches). Together with better tools and enhanced technologies, this has led to new implementations of highly encoded and variable length designs without load–store limitations (i.e. non-RISC). This governs re-implementations of older architectures such as the ubiquitous x86 (see below) as well as new designs for microcontrollers for embedded systems, and similar uses. The superscalar complexity in the case of modern x86 was solved by converting instructions into one or more micro-operations and dynamically issuing those micro-operations, i.e. indirect and dynamic superscalar execution; the Pentium Pro and AMD K5 are early examples of this. It allows a fairly simple superscalar design to be located after the (fairly complex) decoders (and buffers), giving, so to speak, the best of both worlds in many respects. This technique is also used in IBM z196 and later z/Architecture microprocessors.

CISC and RISC terms

The terms CISC and RISC have become less meaningful with the continued evolution of both CISC and RISC designs and implementations. The first highly (or tightly) pipelined x86 implementations, the 486 designs from Intel, AMD, Cyrix, and IBM, supported every instruction that their predecessors did, but achieved maximum efficiency only on a fairly simple x86 subset that was only a little more than a typical RISC instruction set (i.e., without typical RISC load–store limits). The Intel P5 Pentium generation was a superscalar version of these principles. However, modern x86 processors also (typically) decode and split instructions into dynamic sequences of internally buffered micro-operations, which helps execute a larger subset of instructions in a pipelined (overlapping) fashion, and facilitates more advanced extraction of parallelism out of the code stream, for even higher performance.

Contrary to popular simplifications (present also in some academic texts,[ which? ]) not all CISCs are microcoded or have "complex" instructions. As CISC became a catch-all term meaning anything that's not a load–store (RISC) architecture, it's not the number of instructions, nor the complexity of the implementation or of the instructions, that define CISC, but that arithmetic instructions also perform memory accesses.[ citation needed ] Compared to a small 8-bit CISC processor, a RISC floating-point instruction is complex. CISC does not even need to have complex addressing modes; 32- or 64-bit RISC processors may well have more complex addressing modes than small 8-bit CISC processors.

A PDP-10, a PDP-8, an Intel 80386, an Intel 4004, a Motorola 68000, a System z mainframe, a Burroughs B5000, a VAX, a Zilog Z80000, and a MOS Technology 6502 all vary widely in the number, sizes, and formats of instructions, the number, types, and sizes of registers, and the available data types. Some have hardware support for operations like scanning for a substring, arbitrary-precision BCD arithmetic, or transcendental functions, while others have only 8-bit addition and subtraction. But they are all in the CISC category because they have "load-operate" instructions that load and/or store memory contents within the same instructions that perform the actual calculations. For instance, the PDP-8, having only 8 fixed-length instructions and no microcode at all, is a CISC because of how the instructions work, PowerPC, which has over 230 instructions (more than some VAXes), and complex internals like register renaming and a reorder buffer, is a RISC, while Minimal CISC has 8 instructions, but is clearly a CISC because it combines memory access and computation in the same instructions.

See also

Related Research Articles

Central processing unit Central component of any computer system which executes input/output, arithmetical, and logical operations

A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions in the program. This contrasts with external components such as main memory and I/O circuitry, and specialized processors such as graphics processing units (GPUs).

The control unit (CU) is a component of a computer's central processing unit (CPU) that directs the operation of the processor. A CU typically uses a binary decoder to convert coded instructions into timing and control signals that direct the operation of the other units.

In processor design, Microcode is a technique that interposes a layer of computer organization between the CPU hardware and the programmer-visible instruction set architecture of the computer. Microcode is a layer of hardware-level instructions that implement higher-level machine code instructions or internal state machine sequencing in many digital processing elements. Microcode is used in general-purpose central processing units, although in current desktop CPUs, it is only a fallback path for cases that the faster hardwired control unit cannot handle.

Pentium (original) Intel microporocessor

The Pentium microprocessor was introduced by Intel on March 22, 1993, as the first CPU in the Pentium brand. It was instruction set compatible with the 80486 but was a new and very different microarchitecture design. The P5 Pentium was the first superscalar x86 microarchitecture and the world's first superscalar microprocessor to be in mass production. It included dual integer pipelines, a faster floating-point unit, wider data bus, separate code and data caches, and many other techniques and features to enhance performance and support security, encryption, and multiprocessing, for workstations and servers.

Reduced instruction set computer Processor executing one instruction in minimal clock cycles

A reduced instruction set computer, or RISC, is a computer with a small, highly optimized set of instructions, rather than the more specialized set often found in other types of architecture, such as in a complex instruction set computer (CISC). The main distinguishing feature of RISC architecture is that the instruction set is optimized with a large number of registers and a highly regular instruction pipeline, allowing a low number of clock cycles per instruction (CPI). Core features of a RISC philosophy are a load/store architecture, in which memory is accessed through specific instructions rather than as a part of most instructions in the set, and requiring only single-cycle instructions.

In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an implementation.

Superscalar processor CPU that implements instruction-level parallelism within a single processor

A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor that can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows for more throughput than would otherwise be possible at a given clock rate. Each execution unit is not a separate processor, but an execution resource within a single CPU such as an arithmetic logic unit.

Very long instruction word (VLIW) refers to instruction set architectures designed to exploit instruction level parallelism (ILP). Whereas conventional central processing units mostly allow programs to specify instructions to execute in sequence only, a VLIW processor allows programs to explicitly specify instructions to execute in parallel. This design is intended to allow higher performance without the complexity inherent in some other designs.

In the history of computer hardware, some early reduced instruction set computer central processing units used a very similar architectural solution, now called a classic RISC pipeline. Those CPUs were: MIPS, SPARC, Motorola 88000, and later the notional CPU DLX invented for education.

Intel iAPX 432

The iAPX 432 is a discontinued computer architecture introduced in 1981. It was Intel's first 32-bit processor design. The main processor of the architecture, the general data processor, is implemented as a set of two separate integrated circuits, due to technical limitations at the time. Although some early 8086, 80186 and 80286-based systems and manuals also used the iAPX prefix for marketing reasons, the iAPX 432 and the 8086 processor lines are completely separate designs with completely different instruction sets.

In computer science, computer engineering and programming language implementations, a stack machine is a computer processor or a virtual machine in which the primary interaction is moving short-lived temporary values to and from a push down stack. In the case of a hardware processor, a hardware stack is used. The use of a stack significantly reduces the required number of processor registers. Stack machines extend push-down automaton with additional load/store operations or multiple stacks and hence are Turing-complete.

Explicitly parallel instruction computing (EPIC) is a term coined in 1997 by the HP–Intel alliance to describe a computing paradigm that researchers had been investigating since the early 1980s. This paradigm is also called Independence architectures. It was the basis for Intel and HP development of the Intel Itanium architecture, and HP later asserted that "EPIC" was merely an old term for the Itanium architecture. EPIC permits microprocessors to execute software instructions in parallel by using the compiler, rather than complex on-die circuitry, to control parallel instruction execution. This was intended to allow simple performance scaling without resorting to higher clock frequencies.

Berkeley RISC is one of two seminal research projects into reduced instruction set computer (RISC) based microprocessor design taking place under the Advanced Research Projects Agency (ARPA) VLSI project. RISC was led by David Patterson at the University of California, Berkeley between 1980 and 1984. The other project took place a short distance away at Stanford University under their MIPS effort starting in 1981 and running until 1984.

Microarchitecture Component of computer engineering

In computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as µarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular processor. A given ISA may be implemented with different microarchitectures; implementations may vary due to different goals of a given design or due to shifts in technology.

Micro-operation

In computer central processing units, micro-operations are detailed low-level instructions used in some designs to implement complex machine instructions.

History of general-purpose CPUs History of processors used in general purpose computers

The history of general-purpose CPUs is a continuation of the earlier history of computing hardware.

Explicit data graph execution, or EDGE, is a type of instruction set architecture (ISA) which intends to improve computing performance compared to common processors like the Intel x86 line. EDGE combines many individual instructions into a larger group known as a "hyperblock". Hyperblocks are designed to be able to easily run in parallel.

Computer architecture Set of rules describing computer system

In computer engineering, computer architecture is a set of rules and methods that describe the functionality, organization, and implementation of computer systems.

The zEC12 microprocessor is a chip made by IBM for their zEnterprise EC12 and zEnterprise BC12 mainframe computers, announced on August 28, 2012. It is manufactured at the East Fishkill, New York fabrication plant. The processor began shipping in the fall of 2012. IBM stated that it was the world's fastest microprocessor and is about 25% faster than its predecessor the z196.

The IBM POWER ISA is a reduced instruction set computer (RISC) instruction set architecture (ISA) developed by IBM. The name is an acronym for Performance Optimization With Enhanced RISC.

References

  1. Patterson, D. A.; Ditzel, D. R. (October 1980). "The case for the reduced instruction set computer". SIGARCH Computer Architecture News. ACM. 8 (6): 25–33. doi:10.1145/641914.641917.

General references

  • Tanenbaum, Andrew S. (2006) Structured Computer Organization, Fifth Edition, Pearson Education, Inc. Upper Saddle River, NJ.

Further reading