Pipeline burst cache

Last updated July 21, 2024

In computer engineering, the creation and development of the pipeline burst cache memory is an integral part in the development of the superscalar architecture. It was introduced in the mid 1990s as a replacement for the Synchronous Burst Cache and the Asynchronous Cache and is still in use today in computers. It basically increases the speed of the operation of the cache memory by minimizing the wait states and hence maximizing the processor computing speed. Implementing the techniques of pipelining and bursting, high performance computing is assured. It works on the principle of parallelism, the very principle on which the development of superscalar architecture rests. Pipeline burst cache can be found in DRAM controllers and chipset designs.^[1]

Introduction

In a processor-based system, the speed of the processor is always more than that of the main memory. As a result, unnecessary wait-states are developed when instructions or data are being fetched from the main memory. This causes a hampering of the performance of the system. A cache memory is basically developed to increase the efficiency of the system and to maximise the utilisation of the entire computational speed of the processor.^[2]

The performance of the processor is highly influenced by the methods employed to transfer data and instructions to and from the processor. The less the time needed for the transfers the better the processor performance.

The pipeline burst cache is basically a storage area for a processor that is designed to be read from or written to in a pipelined succession of four data transfers. As the name suggests 'pipelining', the transfers after the first transfer happen before the first transfer has arrived at the processor. It was developed as an alternative to asynchronous cache and synchronous burst cache.

Pipeline burst cache gained widespread adoption starting with the release of the Intel 430FX chipset in 1995.

Principles of operation

The pipeline burst cache is based on two principles of operation, namely:

Burst mode

In this mode, the memory contents are prefetched before they are requested.
For a typical cache, each line is 32 bytes wide meaning that, transfers, to and from the cache, occur 32 bytes (256 bits) at a time. The data paths are however only 8 bytes wide. This means that four operations are needed for a single cache transfer. If not for burst mode each transfer would require a separate address to be provided. But since the transfers are to be done from consecutive memory locations there is no need to specify a different address after the first one. Using the technique of Bursting, the transfers of successive data bytes can take place without specifying the remaining addresses. This helps in speed improvement.^[3]

Pipelining mode

In this mode, one memory value can be accessed in Cache at the same time that another memory value is accessed in DRAM. The pipelining operation suggests that the transfer of data and instructions from or to the cache is divided into stages. Each stage is kept busy by one operation all the time. This is just like the concept used in an assembly line. This operation overcame the defects of sequential memory operations which involved a lot of time wastage and decrease in the processor speed.^[4]

Operation

With the help of the above two principles of operations explained, a pipeline burst cache is implemented. In this cache, transferring of data, from or to a new location, takes multiple cycles for initial transfer but subsequent transfers are done in a single cycle.^[5]^[6]

Trade-off

The circuitry involved in this cache is very complex due to the simultaneous involvement of pipelining and burst mode. Hence, more time is required initially to set up the "pipeline".

Related Research Articles

A central processing unit (CPU), also called a central processor, main processor, or just processor, is the most important processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs).

A complex instruction set computer is a computer architecture in which single instructions can execute several low-level operations or are capable of multi-step operations or addressing modes within single instructions. The term was retroactively coined in contrast to reduced instruction set computer (RISC) and has therefore become something of an umbrella term for everything that is not RISC, where the typical differentiating characteristic is that most RISC designs use uniform instruction length for almost all instructions, and employ strictly separate load and store instructions.

The Pentium is a x86 microprocessor introduced by Intel on March 22, 1993. It is the first CPU using the Pentium brand. Considered the fifth generation in the 8086 compatible line of processors, its implementation and microarchitecture was internally called P5.

Direct memory access (DMA) is a feature of computer systems that allows certain hardware subsystems to access main system memory independently of the central processing unit (CPU).

Static random-access memory is a type of random-access memory (RAM) that uses latching circuitry (flip-flop) to store each bit. SRAM is volatile memory; data is lost when power is removed.

<span class="mw-page-title-main">Dynamic random-access memory</span> Type of computer memory

Dynamic random-access memory is a type of random-access semiconductor memory that stores each bit of data in a memory cell, usually consisting of a tiny capacitor and a transistor, both typically based on metal–oxide–semiconductor (MOS) technology. While most DRAM memory cell designs use a capacitor and transistor, some only use two transistors. In the designs where a capacitor is used, the capacitor can either be charged or discharged; these two states are taken to represent the two values of a bit, conventionally called 0 and 1. The electric charge on the capacitors gradually leaks away; without intervention the data on the capacitor would soon be lost. To prevent this, DRAM requires an external memory refresh circuit which periodically rewrites the data in the capacitors, restoring them to their original charge. This refresh process is the defining characteristic of dynamic random-access memory, in contrast to static random-access memory (SRAM) which does not require data to be refreshed. Unlike flash memory, DRAM is volatile memory, since it loses its data quickly when power is removed. However, DRAM does exhibit limited data remanence.

Synchronous dynamic random-access memory is any DRAM where the operation of its external pin interface is coordinated by an externally supplied clock signal.

Instruction-level parallelism (ILP) is the parallel or simultaneous execution of a sequence of instructions in a computer program. More specifically ILP refers to the average number of instructions run per step of this parallel execution.

In the history of computer hardware, some early reduced instruction set computer central processing units used a very similar architectural solution, now called a classic RISC pipeline. Those CPUs were: MIPS, SPARC, Motorola 88000, and later the notional CPU DLX invented for education.

A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache levels, with different instruction-specific and data-specific caches at level 1. The cache memory is typically implemented with static random-access memory (SRAM), in modern CPUs by far the largest part of them by chip area, but SRAM is not always used for all levels, or even any level, sometimes some latter or all levels are implemented with eDRAM.

Semiconductor memory is a digital electronic semiconductor device used for digital data storage, such as computer memory. It typically refers to devices in which data is stored within metal–oxide–semiconductor (MOS) memory cells on a silicon integrated circuit memory chip. There are numerous different types using different semiconductor technologies. The two main types of random-access memory (RAM) are static RAM (SRAM), which uses several transistors per memory cell, and dynamic RAM (DRAM), which uses a transistor and a MOS capacitor per cell. Non-volatile memory uses floating-gate memory cells, which consist of a single floating-gate transistor per cell.

Column address strobe latency, also called CAS latency or CL, is the delay in clock cycles between the READ command and the moment data is available. In asynchronous DRAM, the interval is specified in nanoseconds. In synchronous DRAM, the interval is specified in clock cycles. Because the latency is dependent upon a number of clock ticks instead of absolute time, the actual time for an SDRAM module to respond to a CAS event might vary between uses of the same module if the clock rate differs.

XDR DRAM is a high-performance dynamic random-access memory interface. It is based on and succeeds RDRAM. Competing technologies include DDR2 and GDDR4.

The R8000 is a microprocessor chipset developed by MIPS Technologies, Inc. (MTI), Toshiba, and Weitek. It was the first implementation of the MIPS IV instruction set architecture. The R8000 is also known as the TFP, for Tremendous Floating-Point, its name during development.

The history of general-purpose CPUs is a continuation of the earlier history of computing hardware.

The Alpha 21164, also known by its code name, EV5, is a microprocessor developed and fabricated by Digital Equipment Corporation that implemented the Alpha instruction set architecture (ISA). It was introduced in January 1995, succeeding the Alpha 21064A as Digital's flagship microprocessor. It was succeeded by the Alpha 21264 in 1998.

The Alpha 21264 is a RISC microprocessor developed by Digital Equipment Corporation launched on 19 October 1998. The 21264 implemented the Alpha instruction set architecture (ISA).

Low-Power Double Data Rate (LPDDR), also known as LPDDR SDRAM, is a type of synchronous dynamic random-access memory that consumes less power and is targeted for mobile computers and devices such as mobile phones. Older variants are also known as Mobile DDR, and abbreviated as mDDR.

Apollo VP3 is a x86 based Socket 7 chipset which was manufactured by VIA Technologies and was launched in 1997. On its time Apollo VP3 was a high performance, cost effective, and energy efficient chipset. It offered AGP support for Socket 7 processors which was not supported at that moment by Intel, SiS and ALi chipsets. In November 1997 FIC released motherboard PA-2012, which uses Apollo VP3 and has AGP bus. This was the first Socket 7 motherboard supporting AGP.

The WD16 is a 16-bit microprocessor introduced by Western Digital in October 1976. It is based on the MCP-1600 chipset, a general-purpose design that was also used to implement the DEC LSI-11 low-end minicomputer and the Pascal MicroEngine processor. The three systems differed primarily in their microcode, giving each system a unique instruction set architecture (ISA).

References

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] "Network dictionary".

[2] "How cache works". April 2000.

[3] "Cache Bursting". Pcguide.

[4] "Modes of operation". 28 July 1997.

[5] "Operation".

[6] "Pipeline Burst Cache". Pcguide.

[1]

[2]

[3]

[4]

[5]

[6]