Cycles per instruction

Last updated October 03, 2024

In computer architecture, cycles per instruction (aka clock cycles per instruction, clocks per instruction, or CPI) is one aspect of a processor's performance: the average number of clock cycles per instruction for a program or program fragment.^[1] It is the multiplicative inverse of instructions per cycle.

Definition

The average of Cycles Per Instruction in a given process ( $CPI$ ) is defined by the following weighted average:

\mathrm {CPI

Where $\mathrm {IC} _{i}$ is the number of instructions for a given instruction type $i$ , $\mathrm {CC} _{i}$ is the clock-cycles for that instruction type and $\mathrm {IC} =\Sigma _{i}(\mathrm {IC} _{i})$ is the total instruction count. The summation sums over all instruction types for a given benchmarking process.

Explanation

Let us assume a classic RISC pipeline, with the following five stages:

Instruction fetch cycle (IF).
Instruction decode/Register fetch cycle (ID).
Execution/Effective address cycle (EX).
Memory access (MEM).
Write-back cycle (WB).

Each stage requires one clock cycle and an instruction passes through the stages sequentially. Without pipelining, in a multi-cycle processor, a new instruction is fetched in stage 1 only after the previous instruction finishes at stage 5, therefore the number of clock cycles it takes to execute an instruction is five (CPI = 5 > 1). In this case, the processor is said to be subscalar. With pipelining, a new instruction is fetched every clock cycle by exploiting instruction-level parallelism, therefore, since one could theoretically have five instructions in the five pipeline stages at once (one instruction per stage), a different instruction would complete stage 5 in every clock cycle and on average the number of clock cycles it takes to execute an instruction is 1 (CPI = 1). In this case, the processor is said to be scalar.

With a single-execution-unit processor, the best CPI attainable is 1. However, with a multiple-execution-unit processor, one may achieve even better CPI values (CPI < 1). In this case, the processor is said to be superscalar . To get better CPI values without pipelining, the number of execution units must be greater than the number of stages. For example, with six executions units, six new instructions are fetched in stage 1 only after the six previous instructions finish at stage 5, therefore on average the number of clock cycles it takes to execute an instruction is 5/6 (CPI = 5/6 < 1). To get better CPI values with pipelining, there must be at least two execution units. For example, with two executions units, two new instructions are fetched every clock cycle by exploiting instruction-level parallelism, therefore two different instructions would complete stage 5 in every clock cycle and on average the number of clock cycles it takes to execute an instruction is 1/2 (CPI = 1/2 < 1).

Examples

Example 1

For the multi-cycle MIPS, there are five types of instructions:

Load (5 cycles)
Store (4 cycles)
R-type (4 cycles)
Branch (3 cycles)
Jump (3 cycles)

If a program has:

50% load instructions
25% store instructions
15% R-type instructions
8% branch instructions
2% jump instructions

then, the CPI is:

${\text{CPI}}={\frac {5\times 50+4\times 25+4\times 15+3\times 8+3\times 2}{100}}=4.4$

Example 2

^[2] A 400MHz processor was used to execute a benchmark program with the following instruction mix and clock cycle count:

Instruction TYPE	Instruction count	Clock cycle count
Integer Arithmetic	45000	1
Data transfer	32000	2
Floating point	15000	2
Control transfer	8000	2

Determine the effective CPI, MIPS (Millions of instructions per second) rate, and execution time for this program.

${\text{CPI}}={\frac {45000\times 1+32000\times 2+15000\times 2+8000\times 2}{100000}}={\frac {155000}{100000}}=1.55$

$400\,{\text{MHz}}=400,000,000\,{\text{Hz}}$

since: ${\text{MIPS}}\propto 1/{\text{CPI}}$ and ${\text{MIPS}}\propto {\text{clock frequency}}$

${\text{Effective processor performance}}={\text{MIPS}}={\frac {\text{clock frequency}}{\text{CPI}}}\times {\frac {1}{\text{1 Million}}}$ $={\frac {400,000,000}{1.55\times 1000000}}={\frac {400}{1.55}}=258\,{\text{MIPS}}$

Therefore:

${\text{Execution time}}(T)={\text{CPI}}\times {\text{Instruction count}}\times {\text{clock time}}={\frac {{\text{CPI}}\times {\text{Instruction Count}}}{\text{frequency}}}$ $={\frac {1.55\times 100000}{400\times 1000000}}={\frac {1.55}{4000}}=0.0003875\,{\text{sec}}=0.3875\,{\text{ms}}$

Related Research Articles

A central processing unit (CPU), also called a central processor, main processor, or just processor, is the most important processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs).

Conversion of units is the conversion of the unit of measurement in which a quantity is expressed, typically through a multiplicative conversion factor that changes the unit without changing the quantity. This is also often loosely taken to include replacement of a quantity with a corresponding quantity that describes the same physical property.

<span class="mw-page-title-main">Frequency</span> Number of occurrences or cycles per unit time

Frequency, most often measured in hertz, is the number of occurrences of a repeating event per unit of time. It is also occasionally referred to as temporal frequency for clarity and to distinguish it from spatial frequency. Ordinary frequency is related to angular frequency by a factor of 2 $π$ . The period is the interval of time between events, so the period is the reciprocal of the frequency: T = 1/f.

The Intel 8088 microprocessor is a variant of the Intel 8086. Introduced on June 1, 1979, the 8088 has an eight-bit external data bus instead of the 16-bit bus of the 8086. The 16-bit registers and the one megabyte address range are unchanged, however. In fact, according to the Intel documentation, the 8086 and 8088 have the same execution unit (EU)—only the bus interface unit (BIU) is different. The 8088 was used in the original IBM PC and in IBM PC compatible clones.

Instructions per second (IPS) is a measure of a computer's processor speed. For complex instruction set computers (CISCs), different instructions take different amounts of time, so the value measured depends on the instruction mix; even for comparing processors in the same family the IPS measurement can be problematic. Many reported IPS values have represented "peak" execution rates on artificial instruction sequences with few branches and no cache contention, whereas realistic workloads typically lead to significantly lower IPS values. Memory hierarchy also greatly affects processor performance, an issue barely considered in IPS calculations. Because of these problems, synthetic benchmarks such as Dhrystone are now generally used to estimate computer performance in commonly used applications, and raw IPS has fallen into disuse.

In computer engineering, instruction pipelining is a technique for implementing instruction-level parallelism within a single processor. Pipelining attempts to keep every part of the processor busy with some instruction by dividing incoming instructions into a series of sequential steps performed by different processor units with different parts of instructions processed in parallel.

In physics and engineering, the quality factor or Q factor is a dimensionless parameter that describes how underdamped an oscillator or resonator is. It is defined as the ratio of the initial energy stored in the resonator to the energy lost in one radian of the cycle of oscillation. Q factor is alternatively defined as the ratio of a resonator's centre frequency to its bandwidth when subject to an oscillating driving force. These two definitions give numerically similar, but not identical, results. Higher Q indicates a lower rate of energy loss and the oscillations die out more slowly. A pendulum suspended from a high-quality bearing, oscillating in air, has a high Q, while a pendulum immersed in oil has a low one. Resonators with high quality factors have low damping, so that they ring or vibrate longer.

In the history of computer hardware, some early reduced instruction set computer central processing units used a very similar architectural solution, now called a classic RISC pipeline. Those CPUs were: MIPS, SPARC, Motorola 88000, and later the notional CPU DLX invented for education.

The DLX is a RISC processor architecture designed by John L. Hennessy and David A. Patterson, the principal designers of the Stanford MIPS and the Berkeley RISC designs (respectively), the two benchmark examples of RISC design.

The Emotion Engine is a central processing unit developed and manufactured by Sony Computer Entertainment and Toshiba for use in the PlayStation 2 video game console. It was also used in early PlayStation 3 models sold in Japan and North America to provide PlayStation 2 game support. Mass production of the Emotion Engine began in 1999 and ended in late 2012 with the discontinuation of the PlayStation 2.

Revolutions per minute is a unit of rotational speed for rotating machines. One revolution per minute is equivalent to ⁠1/60⁠ hertz.

In electronics, computer science and computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as μarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular processor. A given ISA may be implemented with different microarchitectures; implementations may vary due to different goals of a given design or due to shifts in technology.

In computer architecture, speedup is a number that measures the relative performance of two systems processing the same problem. More technically, it is the improvement in speed of execution of a task executed on two similar architectures with different resources. The notion of speedup was established by Amdahl's law, which was particularly focused on parallel processing. However, speedup can be used more generally to show the effect on performance after any resource enhancement.

<span class="mw-page-title-main">POWER4</span> 2001 family of microprocessors by IBM

The POWER4 is a microprocessor developed by International Business Machines (IBM) that implemented the 64-bit PowerPC and PowerPC AS instruction set architectures. Released in 2001, the POWER4 succeeded the POWER3 and RS64 microprocessors, enabling RS/6000 and eServer iSeries models of AS/400 computer servers to run on the same processor, as a step toward converging the two lines. The POWER4 was a multicore microprocessor, with two cores on a single die, the first non-embedded microprocessor to do so. POWER4 Chip was first commercially available multiprocessor chip. The original POWER4 had a clock speed of 1.1 and 1.3 GHz, while an enhanced version, the POWER4+, reached a clock speed of 1.9 GHz. The PowerPC 970 is a derivative of the POWER4.

The R10000, code-named "T5", is a RISC microprocessor implementation of the MIPS IV instruction set architecture (ISA) developed by MIPS Technologies, Inc. (MTI), then a division of Silicon Graphics, Inc. (SGI). The chief designers are Chris Rowen and Kenneth C. Yeager. The R10000 microarchitecture is known as ANDES, an abbreviation for Architecture with Non-sequential Dynamic Execution Scheduling. The R10000 largely replaces the R8000 in the high-end and the R4400 elsewhere. MTI was a fabless semiconductor company; the R10000 was fabricated by NEC and Toshiba. Previous fabricators of MIPS microprocessors such as Integrated Device Technology (IDT) and three others did not fabricate the R10000 as it was more expensive to do so than the R4000 and R4400.

The R4000 is a microprocessor developed by MIPS Computer Systems that implements the MIPS III instruction set architecture (ISA). Officially announced on 1 October 1991, it was one of the first 64-bit microprocessors and the first MIPS III implementation. In the early 1990s, when RISC microprocessors were expected to replace CISC microprocessors such as the Intel i486, the R4000 was selected to be the microprocessor of the Advanced Computing Environment (ACE), an industry standard that intended to define a common RISC platform. ACE ultimately failed for a number of reasons, but the R4000 found success in the workstation and server markets.

The R8000 is a microprocessor chipset developed by MIPS Technologies, Inc. (MTI), Toshiba, and Weitek. It was the first implementation of the MIPS IV instruction set architecture. The R8000 is also known as the TFP, for Tremendous Floating-Point, its name during development.

In computer architecture, frequency scaling is the technique of increasing a processor's frequency so as to enhance the performance of the system containing the processor in question. Frequency ramping was the dominant force in commodity processor performance increases from the mid-1980s until roughly the end of 2004.

In computing, computer performance is the amount of useful work accomplished by a computer system. Outside of specific contexts, computer performance is estimated in terms of accuracy, efficiency and speed of executing computer program instructions. When it comes to high computer performance, one or more of the following factors might be involved:

In computer architecture, the iron law of processor performance describes the performance trade-off between complexity and the number of primitive instructions that processors use to perform calculations. This formulation of the trade-off spurred the development of Reduced Instruction Set Computers (RISC) whose instruction set architectures (ISAs) leverage a smaller set of core instructions to improve performance. The term was coined by Douglas Clark based on research performed by Clark and Joel Emer in the 1980s.

References

↑ Patterson, David A.; Hennessy, John L. (1994). Computer Organization and Design: The Hardware/Software Interface . Morgan Kaufmann. ISBN 9781558602816.
↑ Advanced Computer Architecture by Kai Hwang, Chapter 1, Exercise Problem 1.1

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Patterson, David A.; Hennessy, John L. (1994). Computer Organization and Design: The Hardware/Software Interface . Morgan Kaufmann. ISBN 9781558602816.

[2] Advanced Computer Architecture by Kai Hwang, Chapter 1, Exercise Problem 1.1

[1]

[2]