Cycles per instruction

Last updated

In computer architecture, cycles per instruction (aka clock cycles per instruction, clocks per instruction, or CPI) is one aspect of a processor's performance: the average number of clock cycles per instruction for a program or program fragment. [1] It is the multiplicative inverse of instructions per cycle.

Contents

Definition

The average of Cycles Per Instruction in a given process (CPI) is defined by the following weighted average:

Where is the number of instructions for a given instruction type , is the clock-cycles for that instruction type and is the total instruction count. The summation sums over all instruction types for a given benchmarking process.

Explanation

Let us assume a classic RISC pipeline, with the following five stages:

  1. Instruction fetch cycle (IF).
  2. Instruction decode/Register fetch cycle (ID).
  3. Execution/Effective address cycle (EX).
  4. Memory access (MEM).
  5. Write-back cycle (WB).

Each stage requires one clock cycle and an instruction passes through the stages sequentially. Without pipelining, in a multi-cycle processor, a new instruction is fetched in stage 1 only after the previous instruction finishes at stage 5, therefore the number of clock cycles it takes to execute an instruction is five (CPI = 5 > 1). In this case, the processor is said to be subscalar. With pipelining, a new instruction is fetched every clock cycle by exploiting instruction-level parallelism, therefore, since one could theoretically have five instructions in the five pipeline stages at once (one instruction per stage), a different instruction would complete stage 5 in every clock cycle and on average the number of clock cycles it takes to execute an instruction is 1 (CPI = 1). In this case, the processor is said to be scalar.

With a single-execution-unit processor, the best CPI attainable is 1. However, with a multiple-execution-unit processor, one may achieve even better CPI values (CPI < 1). In this case, the processor is said to be superscalar . To get better CPI values without pipelining, the number of execution units must be greater than the number of stages. For example, with six executions units, six new instructions are fetched in stage 1 only after the six previous instructions finish at stage 5, therefore on average the number of clock cycles it takes to execute an instruction is 5/6 (CPI = 5/6 < 1). To get better CPI values with pipelining, there must be at least two execution units. For example, with two executions units, two new instructions are fetched every clock cycle by exploiting instruction-level parallelism, therefore two different instructions would complete stage 5 in every clock cycle and on average the number of clock cycles it takes to execute an instruction is 1/2 (CPI = 1/2 < 1).

Examples

Example 1

For the multi-cycle MIPS, there are five types of instructions:

If a program has:

then, the CPI is:

Example 2

[2] A 400MHz processor was used to execute a benchmark program with the following instruction mix and clock cycle count:

Instruction TYPEInstruction countClock cycle count
Integer Arithmetic450001
Data transfer320002
Floating point150002
Control transfer80002

Determine the effective CPI, MIPS (Millions of instructions per second) rate, and execution time for this program.

since: and

Therefore:

See also

Related Research Articles

<span class="mw-page-title-main">Central processing unit</span> Central computer component which executes instructions

A central processing unit (CPU)—also called a central processor or main processor—is the most important processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs).

Conversion of units is the conversion between different units of measurement for the same quantity, typically through multiplicative conversion factors which change the measured quantity value without changing its effects. Unit conversion is often easier within the metric or the SI than in others, due to the regular 10-base in all units and the prefixes that increase or decrease by 3 powers of 10 at a time.

<span class="mw-page-title-main">Frequency</span> Number of occurrences or cycles per unit time

Frequency, measured in hertz, is the number of occurrences of a repeating event per unit of time. It is also occasionally referred to as temporal frequency for clarity and to distinguish it from spatial frequency. Ordinary frequency is related to angular frequency by a scaling factor of 2π. The period is the interval of time between events, so the period is the reciprocal of the frequency, f=1/T.

<span class="mw-page-title-main">Intel 8088</span> Intel microprocessor model

The Intel 8088 microprocessor is a variant of the Intel 8086. Introduced on June 1, 1979, the 8088 has an eight-bit external data bus instead of the 16-bit bus of the 8086. The 16-bit registers and the one megabyte address range are unchanged, however. In fact, according to the Intel documentation, the 8086 and 8088 have the same execution unit (EU)—only the bus interface unit (BIU) is different. The 8088 was used in the original IBM PC and in IBM PC compatible clones.

i486 Successor to the Intel 386

The Intel 486, officially named i486 and also known as 80486, is a microprocessor. It is a higher-performance follow-up to the Intel 386. The i486 was introduced in 1989. It represents the fourth generation of binary compatible CPUs following the 8086 of 1978, the Intel 80286 of 1982, and 1985's i386.

<span class="mw-page-title-main">Instructions per second</span> Measure of a computers processing speed

Instructions per second (IPS) is a measure of a computer's processor speed. For complex instruction set computers (CISCs), different instructions take different amounts of time, so the value measured depends on the instruction mix; even for comparing processors in the same family the IPS measurement can be problematic. Many reported IPS values have represented "peak" execution rates on artificial instruction sequences with few branches and no cache contention, whereas realistic workloads typically lead to significantly lower IPS values. Memory hierarchy also greatly affects processor performance, an issue barely considered in IPS calculations. Because of these problems, synthetic benchmarks such as Dhrystone are now generally used to estimate computer performance in commonly used applications, and raw IPS has fallen into disuse.

In computer engineering, instruction pipelining is a technique for implementing instruction-level parallelism within a single processor. Pipelining attempts to keep every part of the processor busy with some instruction by dividing incoming instructions into a series of sequential steps performed by different processor units with different parts of instructions processed in parallel.

In computing, the clock rate or clock speed typically refers to the frequency at which the clock generator of a processor can generate pulses, which are used to synchronize the operations of its components, and is used as an indicator of the processor's speed. It is measured in the SI unit of frequency hertz (Hz).

In the history of computer hardware, some early reduced instruction set computer central processing units used a very similar architectural solution, now called a classic RISC pipeline. Those CPUs were: MIPS, SPARC, Motorola 88000, and later the notional CPU DLX invented for education.

The DLX is a RISC processor architecture designed by John L. Hennessy and David A. Patterson, the principal designers of the Stanford MIPS and the Berkeley RISC designs (respectively), the two benchmark examples of RISC design.

<span class="mw-page-title-main">Emotion Engine</span> Central processing unit by Sony Computer Entertainment and Toshiba

The Emotion Engine is a central processing unit developed and manufactured by Sony Computer Entertainment and Toshiba for use in the PlayStation 2 video game console. It was also used in early PlayStation 3 models sold in Japan and North America to provide PlayStation 2 game support. Mass production of the Emotion Engine began in 1999 and ended in late 2012 with the discontinuation of the PlayStation 2.

<span class="mw-page-title-main">Microarchitecture</span> Component of computer engineering

In electronics, computer science and computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as µarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular processor. A given ISA may be implemented with different microarchitectures; implementations may vary due to different goals of a given design or due to shifts in technology.

In computer architecture, speedup is a number that measures the relative performance of two systems processing the same problem. More technically, it is the improvement in speed of execution of a task executed on two similar architectures with different resources. The notion of speedup was established by Amdahl's law, which was particularly focused on parallel processing. However, speedup can be used more generally to show the effect on performance after any resource enhancement.

<span class="mw-page-title-main">POWER4</span> 2001 family of microprocessors by IBM

The POWER4 is a microprocessor developed by International Business Machines (IBM) that implemented the 64-bit PowerPC and PowerPC AS instruction set architectures. Released in 2001, the POWER4 succeeded the POWER3 and RS64 microprocessors, enabling RS/6000 and eServer iSeries models of AS/400 computer servers to run on the same processor, as a step toward converging the two lines. The POWER4 was a multicore microprocessor, with two cores on a single die, the first non-embedded microprocessor to do so. POWER4 Chip was first commercially available multiprocessor chip. The original POWER4 had a clock speed of 1.1 and 1.3 GHz, while an enhanced version, the POWER4+, reached a clock speed of 1.9 GHz. The PowerPC 970 is a derivative of the POWER4.

<span class="mw-page-title-main">R10000</span> MIPS microprocessor

The R10000, code-named "T5", is a RISC microprocessor implementation of the MIPS IV instruction set architecture (ISA) developed by MIPS Technologies, Inc. (MTI), then a division of Silicon Graphics, Inc. (SGI). The chief designers are Chris Rowen and Kenneth C. Yeager. The R10000 microarchitecture is known as ANDES, an abbreviation for Architecture with Non-sequential Dynamic Execution Scheduling. The R10000 largely replaces the R8000 in the high-end and the R4400 elsewhere. MTI was a fabless semiconductor company; the R10000 was fabricated by NEC and Toshiba. Previous fabricators of MIPS microprocessors such as Integrated Device Technology (IDT) and three others did not fabricate the R10000 as it was more expensive to do so than the R4000 and R4400.

<span class="mw-page-title-main">R4000</span> MIPS microprocessor

The R4000 is a microprocessor developed by MIPS Computer Systems that implements the MIPS III instruction set architecture (ISA). Officially announced on 1 October 1991, it was one of the first 64-bit microprocessors and the first MIPS III implementation. In the early 1990s, when RISC microprocessors were expected to replace CISC microprocessors such as the Intel i486, the R4000 was selected to be the microprocessor of the Advanced Computing Environment (ACE), an industry standard that intended to define a common RISC platform. ACE ultimately failed for a number of reasons, but the R4000 found success in the workstation and server markets.

The R8000 is a microprocessor chipset developed by MIPS Technologies, Inc. (MTI), Toshiba, and Weitek. It was the first implementation of the MIPS IV instruction set architecture. The R8000 is also known as the TFP, for Tremendous Floating-Point, its name during development.

In computer architecture, frequency scaling is the technique of increasing a processor's frequency so as to enhance the performance of the system containing the processor in question. Frequency ramping was the dominant force in commodity processor performance increases from the mid-1980s until roughly the end of 2004.

In computing, computer performance is the amount of useful work accomplished by a computer system. Outside of specific contexts, computer performance is estimated in terms of accuracy, efficiency and speed of executing computer program instructions. When it comes to high computer performance, one or more of the following factors might be involved:

In computer architecture, the iron law of processor performance describes the performance trade-off between complexity and the number of primitive instructions that processors use to perform calculations. This formulation of the trade-off spurred the development of Reduced Instruction Set Computers (RISC) whose instruction set architectures (ISAs) leverage a smaller set of core instructions to improve performance. The term was coined by Douglas Clark based on research performed by Clark and Joel Emer in the 1980s.

References

  1. Patterson, David A.; Hennessy, John L. (1994). Computer Organization and Design: The Hardware/Software Interface . ISBN   9781558602816.
  2. Advanced Computer Architecture by Kai Hwang, Chapter 1, Exercise Problem 1.1