Memory bandwidth

Last updated

Memory bandwidth is the rate at which data can be read from or stored into a semiconductor memory by a processor. Memory bandwidth is usually expressed in units of bytes/second, though this can vary for systems with natural data sizes that are not a multiple of the commonly used 8-bit bytes.

Contents

Memory bandwidth that is advertised for a given memory or system is usually the maximum theoretical bandwidth. In practice the observed memory bandwidth will be less than (and is guaranteed not to exceed) the advertised bandwidth. A variety of computer benchmarks exist to measure sustained memory bandwidth using a variety of access patterns. These are intended to provide insight into the memory bandwidth that a system should sustain on various classes of real applications.

Measurement conventions

There are three different conventions for defining the quantity of data transferred in the numerator of "bytes/second":

  1. The bcopy convention: counts the amount of data copied from one location in memory to another location per unit time. For example, copying 1 million bytes from one location in memory to another location in memory in one second would be counted as 1 million bytes per second. The bcopy convention is self-consistent, but is not easily extended to cover cases with more complex access patterns, for example three reads and one write.
  2. The Stream convention: sums the amount of data that the application code explicitly reads plus the amount of data that the application code explicitly writes. [1] Using the previous 1 million byte copy example, the STREAM bandwidth would be counted as 1 million bytes read plus 1 million bytes written in one second, for a total of 2 million bytes per second. The STREAM convention is most directly tied to the user code, but may not count all the data traffic that the hardware is actually required to perform.
  3. The hardware convention: counts the actual amount of data read or written by the hardware, whether the data motion was explicitly requested by the user code or not. Using the same 1 million byte copy example, the hardware bandwidth on computer systems with a write allocate cache policy would include an additional 1 million bytes of traffic because the hardware reads the target array from memory into cache before performing the stores. This gives a total of 3 million bytes per second actually transferred by the hardware. The hardware convention is most directly tied to the hardware, but may not represent the minimum amount of data traffic required to implement the user's code.
For example, some computer systems have the ability to avoid write allocate traffic using special instructions, leading to the possibility of misleading comparisons of bandwidth based on different amounts of data traffic performed.

Bandwidth computation and nomenclature

The nomenclature differs across memory technologies, but for commodity DDR SDRAM, DDR2 SDRAM, and DDR3 SDRAM memory, the total bandwidth is the product of:

For example, a computer with dual-channel memory and one DDR2-800 module per channel running at 400 MHz would have a theoretical maximum memory bandwidth of:

400,000,000 clocks per second × 2 lines per clock × 64 bits per line × 2 interfaces =
102,400,000,000 (102.4 billion) bits per second (in bytes, 12,800 MB/s or 12.8 GB/s)

This theoretical maximum memory bandwidth is referred to as the "burst rate," which may not be sustainable.

The naming convention for DDR, DDR2 and DDR3 modules specifies either a maximum speed (e.g., DDR2-800) or a maximum bandwidth (e.g., PC2-6400). The speed rating (800) is not the maximum clock speed, but twice that (because of the doubled data rate). The specified bandwidth (6400) is the maximum megabytes transferred per second using a 64-bit width. In a dual-channel mode configuration, this is effectively a 128-bit width. Thus, the memory configuration in the example can be simplified as: two DDR2-800 modules running in dual-channel mode.

Two memory interfaces per module is a common configuration for PC system memory, but single-channel configurations are common in older, low-end, or low-power devices. Some personal computers and most modern graphics cards use more than two memory interfaces (e.g., four for Intel's LGA 2011 platform and the NVIDIA GeForce GTX 980). High-performance graphics cards running many interfaces in parallel can attain very high total memory bus width (e.g., 384 bits in the NVIDIA GeForce GTX TITAN and 512 bits in the AMD Radeon R9 290X using six and eight 64-bit interfaces respectively).

ECC bits

In systems with error-correcting memory (ECC), the additional width of the interfaces (typically 72 rather than 64 bits) is not counted in bandwidth specifications because the extra bits are unavailable to store user data. ECC bits are better thought of as part of the memory hardware rather than as information stored in that hardware.

See also

Related Research Articles

<span class="mw-page-title-main">DDR SDRAM</span> Type of computer memory

Double Data Rate Synchronous Dynamic Random-Access Memory is a double data rate (DDR) synchronous dynamic random-access memory (SDRAM) class of memory integrated circuits used in computers. DDR SDRAM, also retroactively called DDR1 SDRAM, has been superseded by DDR2 SDRAM, DDR3 SDRAM, DDR4 SDRAM and DDR5 SDRAM. None of its successors are forward or backward compatible with DDR1 SDRAM, meaning DDR2, DDR3, DDR4 and DDR5 memory modules will not work on DDR1-equipped motherboards, and vice versa.

<span class="mw-page-title-main">Synchronous dynamic random-access memory</span> Type of computer memory

Synchronous dynamic random-access memory is any DRAM where the operation of its external pin interface is coordinated by an externally supplied clock signal.

<span class="mw-page-title-main">DIMM</span> Computer memory module

A DIMM, commonly called a RAM stick, comprises a series of dynamic random-access memory integrated circuits. These memory modules are mounted on a printed circuit board and designed for use in personal computers, workstations, printers, and servers. They are the predominant method for adding memory into a computer system. The vast majority of DIMMs are standardized through JEDEC standards, although there are proprietary DIMMs. DIMMs come in a variety of speeds and sizes, but generally are one of two lengths - PC which are 133.35 mm (5.25 in) and laptop (SO-DIMM) which are about half the size at 67.60 mm (2.66 in).

Rambus DRAM (RDRAM), and its successors Concurrent Rambus DRAM (CRDRAM) and Direct Rambus DRAM (DRDRAM), are types of synchronous dynamic random-access memory (SDRAM) developed by Rambus from the 1990s through to the early 2000s. The third-generation of Rambus DRAM, DRDRAM was replaced by XDR DRAM. Rambus DRAM was developed for high-bandwidth applications and was positioned by Rambus as replacement for various types of contemporary memories, such as SDRAM.

<span class="mw-page-title-main">DDR2 SDRAM</span> Second generation of double-data-rate synchronous dynamic random-access memory

Double Data Rate 2 Synchronous Dynamic Random-Access Memory is a double data rate (DDR) synchronous dynamic random-access memory (SDRAM) interface. It is a JEDEC standard (JESD79-2); first published in September 2003. DDR2 succeeded the original DDR SDRAM specification, and was itself succeeded by DDR3 SDRAM in 2007. DDR2 DIMMs are neither forward compatible with DDR3 nor backward compatible with DDR.

<span class="mw-page-title-main">Double data rate</span>

In computing, double data rate (DDR) describes a computer bus that transfers data on both the rising and falling edges of the clock signal. This is also known as double pumped, dual-pumped, and double transition. The term toggle mode is used in the context of NAND flash memory.

Column address strobe latency, also called CAS latency or CL, is the delay in clock cycles between the READ command and the moment data is available. In asynchronous DRAM, the interval is specified in nanoseconds. In synchronous DRAM, the interval is specified in clock cycles. Because the latency is dependent upon a number of clock ticks instead of absolute time, the actual time for an SDRAM module to respond to a CAS event might vary between uses of the same module if the clock rate differs.

XDR DRAM is a high-performance dynamic random-access memory interface. It is based on and succeeds RDRAM. Competing technologies include DDR2 and GDDR4.

The AMD Family 10h, or K10, is a microprocessor microarchitecture by AMD based on the K8 microarchitecture. The first third-generation Opteron products for servers were launched on September 10, 2007, with the Phenom processors for desktops following and launching on November 11, 2007 as the immediate successors to the K8 series of processors.

In computing, serial presence detect (SPD) is a standardized way to automatically access information about a memory module. Earlier 72-pin SIMMs included five pins that provided five bits of parallel presence detect (PPD) data, but the 168-pin DIMM standard changed to a serial presence detect to encode much more information.

Double Data Rate 3 Synchronous Dynamic Random-Access Memory is a type of synchronous dynamic random-access memory (SDRAM) with a high bandwidth interface, and has been in use since 2007. It is the higher-speed successor to DDR and DDR2 and predecessor to DDR4 synchronous dynamic random-access memory (SDRAM) chips. DDR3 SDRAM is neither forward nor backward compatible with any earlier type of random-access memory (RAM) because of different signaling voltages, timings, and other factors.

<span class="mw-page-title-main">Fully Buffered DIMM</span>

Fully Buffered DIMM is a memory technology that can be used to increase reliability and density of memory systems. Unlike the parallel bus architecture of traditional DRAMs, an FB-DIMM has a serial interface between the memory controller and the advanced memory buffer (AMB). Conventionally, data lines from the memory controller have to be connected to data lines in every DRAM module, i.e. via multidrop buses. As the memory width increases together with the access speed, the signal degrades at the interface between the bus and the device. This limits the speed and memory density, so FB-DIMMs take a different approach to solve the problem.

GDDR4 SDRAM, an abbreviation for Graphics Double Data Rate 4 Synchronous Dynamic Random-Access Memory, is a type of graphics card memory (SGRAM) specified by the JEDEC Semiconductor Memory Standard. It is a rival medium to Rambus's XDR DRAM. GDDR4 is based on DDR3 SDRAM technology and was intended to replace the DDR2-based GDDR3, but it ended up being replaced by GDDR5 within a year.

The memory controller is a digital circuit that manages the flow of data going to and from the computer's main memory. A memory controller can be a separate chip or integrated into another chip, such as being placed on the same die or as an integral part of a microprocessor; in the latter case, it is usually called an integrated memory controller (IMC). A memory controller is sometimes also called a memory chip controller (MCC) or a memory controller unit (MCU).

<span class="mw-page-title-main">Memory module</span>

In computing, a memory module or RAM stick is a printed circuit board on which memory integrated circuits are mounted. Memory modules permit easy installation and replacement in electronic systems, especially computers such as personal computers, workstations, and servers. The first memory modules were proprietary designs that were specific to a model of computer from a specific manufacturer. Later, memory modules were standardized by organizations such as JEDEC and could be used in any system designed to use them.

Double Data Rate 4 Synchronous Dynamic Random-Access Memory is a type of synchronous dynamic random-access memory with a high bandwidth interface.

<span class="mw-page-title-main">LPDDR</span> Computer hardware

Low-Power Double Data Rate (LPDDR), also known as LPDDR SDRAM, is a type of synchronous dynamic random-access memory that consumes less power and is targeted for mobile computers and devices such as mobile phones. Older variants are also known as Mobile DDR, and abbreviated as mDDR.

<span class="mw-page-title-main">GDDR3 SDRAM</span> Type of graphics card memory

GDDR3 SDRAM is a type of DDR SDRAM specialized for graphics processing units (GPUs) offering less access latency and greater device bandwidths. Its specification was developed by ATI Technologies in collaboration with DRAM vendors including Elpida Memory, Hynix Semiconductor, Infineon and Micron. It was later adopted as a JEDEC standard.

<span class="mw-page-title-main">DDR5 SDRAM</span> Fifth generation of double-data-rate synchronous dynamic random-access memory

Double Data Rate 5 Synchronous Dynamic Random-Access Memory is a type of synchronous dynamic random-access memory. Compared to its predecessor DDR4 SDRAM, DDR5 was planned to reduce power consumption, while doubling bandwidth. The standard, originally targeted for 2018, was released on July 14, 2020.

References

  1. STREAM Benchmark FAQ: Counting Bytes and FLOPS: http://www.cs.virginia.edu/stream/ref.html#counting

General