Instructions per cycle

In computer architecture, instructions per cycle (IPC), commonly called instructions per clock, is one aspect of a processor's performance: the average number of instructions executed for each clock cycle. It is the multiplicative inverse of cycles per instruction.[1][2][3]
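
Expressed as a formula, for a given program or program fragment:

\[
\text{IPC} = \frac{\text{instructions executed}}{\text{clock cycles elapsed}} = \frac{1}{\text{CPI}}
\]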

Explanation

While early generations of CPUs carried out all the steps needed to execute an instruction sequentially, modern CPUs can do many things in parallel. Because the clock speed cannot simply be doubled indefinitely, instruction pipelining and superscalar processor design have evolved so that CPUs can use a variety of execution units in parallel, looking ahead through the incoming instruction stream to keep those units busy. This allows the number of instructions completed per cycle to rise well above 1 and is responsible for much of the speed improvement in successive CPU generations.

Calculation of IPC

IPC is calculated by running a set piece of code, counting the number of machine-level instructions required to complete it, and then using high-performance timers to determine the number of clock cycles the run takes on the actual hardware. The final figure comes from dividing the number of instructions by the number of CPU clock cycles.
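
As an illustration, the Python sketch below measures IPC on a Linux system by running a command under the perf tool and reading the hardware instruction and cycle counters. This is a minimal sketch, not a definitive method: the CSV field layout parsed here is typical of perf stat -x output but can vary between perf versions, counter access may require elevated permissions, and the example command is arbitrary.

```python
import subprocess

def measure_ipc(cmd):
    """Run `cmd` under perf stat and return its instructions per cycle.

    Assumes Linux with the perf tool installed and permission to read
    hardware performance counters.
    """
    result = subprocess.run(
        ["perf", "stat", "-x", ",", "-e", "instructions,cycles"] + cmd,
        capture_output=True,
        text=True,
    )
    counts = {}
    # perf stat -x, emits one CSV line per event on stderr; the first
    # field is the count and the third is the event name (the exact
    # layout can differ slightly between perf versions).
    for line in result.stderr.splitlines():
        fields = line.split(",")
        if len(fields) >= 3 and fields[0].isdigit():
            if fields[2] in ("instructions", "cycles"):
                counts[fields[2]] = int(fields[0])
    return counts["instructions"] / counts["cycles"]

if __name__ == "__main__":
    # Example with an arbitrary command; any workload can be substituted.
    print(f"IPC: {measure_ipc(['/bin/ls', '/usr']):.2f}")
```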

The number of instructions per second and floating point operations per second for a processor can be derived by multiplying the number of instructions per cycle by the clock rate (cycles per second, measured in hertz) of the processor in question. The number of instructions per second is an approximate indicator of the likely performance of the processor.
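
In equation form, with purely illustrative numbers:

\[
\text{IPS} = \text{IPC} \times f_{\text{clock}}
\]

For example, a processor that averages 2 instructions per cycle at a 3 GHz clock rate executes about \(2 \times 3\times10^{9} = 6\times10^{9}\) instructions per second.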

The number of instructions executed per clock is not a constant for a given processor; it depends on how the particular software being run interacts with the processor, and indeed with the entire machine, particularly the memory hierarchy. However, certain processor features tend to produce designs with higher-than-average IPC values, such as the presence of multiple arithmetic logic units (an ALU is a processor subsystem that can perform elementary arithmetic and logical operations) and short pipelines. When comparing different instruction sets, a simpler instruction set may lead to a higher IPC figure than an implementation of a more complex instruction set using the same chip technology; however, the more complex instruction set may be able to achieve more useful work with fewer instructions. As such, comparing IPC figures between different instruction sets (for example x86 vs ARM) is usually meaningless.
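
The classic processor performance equation found in the cited textbooks [1][2] makes this trade-off explicit: execution time depends on the instruction count as well as on the IPC, so a higher IPC is no advantage if the instruction count grows by a larger factor.

\[
T_{\text{CPU}} = \frac{\text{instruction count} \times \text{CPI}}{f_{\text{clock}}}
= \frac{\text{instruction count}}{\text{IPC} \times f_{\text{clock}}}
\]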

Computer speed

The useful work that can be done with any computer depends on many factors besides the processor speed. These factors include the instruction set architecture, the processor's microarchitecture, the computer system organization (such as the design of the disk storage system and the capabilities and performance of other attached devices), the efficiency of the operating system, and the high-level design of the application software.

For computer users and purchasers, application benchmarks, rather than instructions per cycle, are typically a much more useful indication of system performance. However, IPC does provide an example of why clock speed is not the only factor relevant to computer performance.

Related Research Articles

<span class="mw-page-title-main">Central processing unit</span> Central computer component which executes instructions

A central processing unit (CPU) is the most important processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs).

<span class="mw-page-title-main">Instructions per second</span> Measure of a computers processing speed

Instructions per second (IPS) is a measure of a computer's processor speed. For complex instruction set computers (CISCs), different instructions take different amounts of time, so the value measured depends on the instruction mix; even for comparing processors in the same family the IPS measurement can be problematic. Many reported IPS values have represented "peak" execution rates on artificial instruction sequences with few branches and no cache contention, whereas realistic workloads typically lead to significantly lower IPS values. Memory hierarchy also greatly affects processor performance, an issue barely considered in IPS calculations. Because of these problems, synthetic benchmarks such as Dhrystone are now generally used to estimate computer performance in commonly used applications, and raw IPS has fallen into disuse.

In computer science, an instruction set architecture (ISA) is a part of the abstract model of a computer, which generally defines how software controls the CPU. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an implementation.

<span class="mw-page-title-main">Superscalar processor</span> CPU that implements instruction-level parallelism within a single processor

A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows more throughput than would otherwise be possible at a given clock rate. Each execution unit is not a separate processor, but an execution resource within a single CPU such as an arithmetic logic unit.

<span class="mw-page-title-main">Parallel computing</span> Programming paradigm in which many processes are executed simultaneously

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.

<span class="mw-page-title-main">Clock rate</span> Frequency at which CPU chip or core is operating

In computing, the clock rate or clock speed typically refers to the frequency at which the clock generator of a processor can generate pulses, which are used to synchronize the operations of its components, and is used as an indicator of the processor's speed. It is measured in the SI unit of frequency hertz (Hz).

In the history of computer hardware, some early reduced instruction set computer (RISC) central processing units used a very similar architectural solution, now called a classic RISC pipeline. Those CPUs were MIPS, SPARC, the Motorola 88000, and later the notional DLX CPU invented for education.

The DLX is a RISC processor architecture designed by John L. Hennessy and David A. Patterson, the principal designers of the Stanford MIPS and the Berkeley RISC designs (respectively), the two benchmark examples of RISC design.

A translation lookaside buffer (TLB) is a memory cache that stores the recent translations of virtual memory to physical memory. It is used to reduce the time taken to access a user memory location and can be called an address-translation cache. It is part of the chip's memory-management unit (MMU). A TLB may reside between the CPU and the CPU cache, between the CPU cache and main memory, or between the levels of a multi-level cache. The majority of desktop, laptop, and server processors include one or more TLBs in the memory-management hardware, and a TLB is nearly always present in any processor that utilizes paged or segmented virtual memory.

<span class="mw-page-title-main">Microarchitecture</span> Component of computer engineering

In electronics, computer science and computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as µarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular processor. A given ISA may be implemented with different microarchitectures; implementations may vary due to different goals of a given design or due to shifts in technology.

In computer architecture, speedup is a number that measures the relative performance of two systems processing the same problem. More technically, it is the improvement in speed of execution of a task executed on two similar architectures with different resources. The notion of speedup was established by Amdahl's law, which was particularly focused on parallel processing. However, speedup can be used more generally to show the effect on performance after any resource enhancement.
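
In its simplest form, speedup is the ratio of execution times before and after an enhancement, and Amdahl's law bounds the overall speedup when only a fraction \(p\) of the work is accelerated by a factor \(s\):

\[
S = \frac{T_{\text{before}}}{T_{\text{after}}},
\qquad
S_{\text{overall}} = \frac{1}{(1 - p) + p/s}
\]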

<span class="mw-page-title-main">Benchmark (computing)</span> Comparing the relative performance of computers by running the same program on all of them

In computing, a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it.

The megahertz myth, or in more recent cases the gigahertz myth, refers to the misconception of using only clock rate to compare the performance of different microprocessors. While clock rates are a valid way of comparing the performance of different speeds of the same model and type of processor, other factors such as the number of execution units, pipeline depth, cache hierarchy, branch prediction, and instruction sets can greatly affect the performance when considering different processors. For example, one processor may take two clock cycles to add two numbers and another clock cycle to multiply by a third number, whereas another processor may do the same calculation in two clock cycles. Comparisons between different types of processors are difficult because performance varies depending on the type of task. A benchmark is a more thorough way of measuring and comparing computer performance.
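
With purely illustrative numbers: suppose processor A needs 3 cycles for the whole calculation and runs at 4 GHz, while processor B needs only 2 cycles but runs at 3 GHz. Then

\[
t_A = \frac{3}{4\times10^{9}\,\text{Hz}} = 0.75\,\text{ns},
\qquad
t_B = \frac{2}{3\times10^{9}\,\text{Hz}} \approx 0.67\,\text{ns},
\]

so processor B completes the task sooner despite its lower clock rate.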

In computer architecture, cycles per instruction is one aspect of a processor's performance: the average number of clock cycles per instruction for a program or program fragment. It is the multiplicative inverse of instructions per cycle.

In computing, computer performance is the amount of useful work accomplished by a computer system. Outside of specific contexts, computer performance is estimated in terms of accuracy, efficiency, and speed of executing computer program instructions. High computer performance may involve one or more factors, such as short response time, high throughput, and efficient use of computing resources.

<span class="mw-page-title-main">History of general-purpose CPUs</span>

The history of general-purpose CPUs is a continuation of the earlier history of computing hardware.

<span class="mw-page-title-main">Computer architecture</span> Set of rules describing computer system

In computer science and computer engineering, computer architecture is a description of the structure of a computer system made from component parts. It can sometimes be a high-level description that ignores details of the implementation. At a more detailed level, the description may include the instruction set architecture design, microarchitecture design, logic design, and implementation.

<span class="mw-page-title-main">MikroSim</span>

MikroSim is educational software, running on the Microsoft Windows operating system, that explains the general functioning and behaviour of a virtual processor in a hardware-independent way. Devices such as miniaturized calculators, microcontrollers, microprocessors, and computers can be explained using custom-developed instruction code at the register-transfer level, controlled by sequences of microinstructions (microcode). On this basis it is possible to develop an instruction set that controls a virtual application board at a higher level of abstraction.

Latency-oriented processor architecture is the microarchitecture of a microprocessor designed to serve a serial computing thread with low latency. This is typical of most central processing units (CPUs) developed since the 1970s. These architectures generally aim to execute as many instructions as possible belonging to a single serial thread in a given window of time; however, the time to execute a single instruction completely, from the fetch to the retire stage, may vary from a few cycles to even a few hundred cycles in some cases. Latency-oriented processor architectures are the opposite of throughput-oriented processors, which concern themselves more with the total throughput of the system than with the service latencies of the individual threads they work on.

<span class="mw-page-title-main">Cache hierarchy</span> Memory hierarchy concept applied to CPU caches with multiple levels

Cache hierarchy, or multi-level cache, is a memory architecture that uses a hierarchy of memory stores based on varying access speeds to cache data. Highly requested data is cached in high-speed access memory stores, allowing swifter access by central processing unit (CPU) cores.

References

  1. Patterson, David A.; Hennessy, John L. (2014). Computer Organization and Design: The Hardware/Software Interface (5th ed.). Amsterdam; Boston: Morgan Kaufmann/Elsevier. ISBN 978-0-12-407726-3. OCLC 859555917.
  2. Hennessy, John L.; Patterson, David A. (2006). Computer Architecture: A Quantitative Approach. Elsevier. ISBN 978-0-08-047502-8.
  3. Stallings, William (2016). Computer Organization and Architecture: Designing for Performance (10th ed.). Boston: Pearson. ISBN 978-0-13-410161-3.