Memory bandwidth

Memory bandwidth is the rate at which data can be read from or stored into a semiconductor memory by a processor. Memory bandwidth is usually expressed in units of bytes/second, though this can vary for systems with natural data sizes that are not a multiple of the commonly used 8-bit bytes.

Memory bandwidth that is advertised for a given memory or system is usually the maximum theoretical bandwidth. In practice the observed memory bandwidth will be less than (and is guaranteed not to exceed) the advertised bandwidth. A variety of computer benchmarks exist to measure sustained memory bandwidth using a variety of access patterns. These are intended to provide insight into the memory bandwidth that a system should sustain on various classes of real applications.

Measurement conventions

There are three different conventions for defining the quantity of data transferred in the numerator of "bytes/second":

  1. The bcopy convention: counts the amount of data copied from one location in memory to another location per unit time. For example, copying 1 million bytes from one location in memory to another location in memory in one second would be counted as 1 million bytes per second. The bcopy convention is self-consistent, but is not easily extended to cover cases with more complex access patterns, for example three reads and one write.
  2. The STREAM convention: sums the amount of data that the application code explicitly reads plus the amount of data that the application code explicitly writes. [1] Using the previous 1 million byte copy example, the STREAM bandwidth would be counted as 1 million bytes read plus 1 million bytes written in one second, for a total of 2 million bytes per second. The STREAM convention is most directly tied to the user code, but may not count all the data traffic that the hardware is actually required to perform.
  3. The hardware convention: counts the actual amount of data read or written by the hardware, whether the data motion was explicitly requested by the user code or not. Using the same 1 million byte copy example, the hardware bandwidth on computer systems with a write allocate cache policy would include an additional 1 million bytes of traffic because the hardware reads the target array from memory into cache before performing the stores. This gives a total of 3 million bytes per second actually transferred by the hardware. The hardware convention is most directly tied to the hardware, but may not represent the minimum amount of data traffic required to implement the user's code.
For example, some computer systems can avoid this write-allocate traffic by using special store instructions, which opens the possibility of misleading bandwidth comparisons based on different amounts of underlying data traffic.
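
The difference between these conventions can be made concrete with a small copy benchmark. The following C sketch is illustrative only: the buffer size, the use of memcpy, the single timed iteration, and the assumption of a write-allocate cache are choices made for this example, not part of any standard benchmark.

  /* Sketch: one memory-to-memory copy reported under the bcopy and STREAM
     conventions, with an estimate for the hardware convention assuming a
     write-allocate cache. */
  #define _POSIX_C_SOURCE 199309L
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <time.h>

  #define N 1000000UL                 /* 1 million bytes, as in the example above */

  int main(void)
  {
      unsigned char *src = malloc(N);
      unsigned char *dst = malloc(N);
      if (!src || !dst) return 1;
      memset(src, 1, N);
      memset(dst, 0, N);              /* touch dst so page faults stay out of the timing */

      struct timespec t0, t1;
      clock_gettime(CLOCK_MONOTONIC, &t0);
      memcpy(dst, src, N);            /* the operation being measured */
      clock_gettime(CLOCK_MONOTONIC, &t1);
      double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;

      printf("bcopy:    %.1f MB/s\n", 1.0 * N / s / 1e6);  /* bytes copied           */
      printf("STREAM:   %.1f MB/s\n", 2.0 * N / s / 1e6);  /* bytes read + written   */
      printf("hardware: %.1f MB/s\n", 3.0 * N / s / 1e6);  /* + write-allocate reads */

      free(src);
      free(dst);
      return 0;
  }

A realistic measurement would use arrays much larger than the caches and repeat the copy many times; the point here is only that the same copy is counted as 1, 2, or roughly 3 million bytes of traffic under the three conventions.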

Bandwidth computation and nomenclature

The nomenclature differs across memory technologies, but for commodity DDR SDRAM, DDR2 SDRAM, and DDR3 SDRAM memory, the total bandwidth is the product of the base DRAM clock frequency, the number of data transfers per clock cycle, the memory bus (interface) width, and the number of interfaces.

For example, a computer with dual-channel memory and one DDR2-800 module per channel running at 400 MHz would have a theoretical maximum memory bandwidth of:

400,000,000 clocks per second × 2 transfers per clock × 64 bits per transfer × 2 interfaces =
102,400,000,000 (102.4 billion) bits per second (in bytes, 12,800 MB/s or 12.8 GB/s)

This theoretical maximum memory bandwidth is referred to as the "burst rate," which may not be sustainable.

The naming convention for DDR, DDR2 and DDR3 modules specifies either a maximum speed (e.g., DDR2-800) or a maximum bandwidth (e.g., PC2-6400). The speed rating (800) is not the maximum clock speed, but twice that (because of the doubled data rate). The specified bandwidth (6400) is the maximum megabytes transferred per second using a 64-bit width. In a dual-channel mode configuration, this is effectively a 128-bit width. Thus, the memory configuration in the example can be simplified as: two DDR2-800 modules running in dual-channel mode.
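
The arithmetic above can be written out directly. The following C sketch reproduces the dual-channel DDR2-800 example; the variable names and structure are illustrative, not drawn from any particular tool.

  /* Sketch: theoretical peak ("burst") bandwidth for the dual-channel
     DDR2-800 example above, using the same figures as the text. */
  #include <stdio.h>

  int main(void)
  {
      double clock_hz        = 400e6;   /* base DRAM I/O clock: 400 MHz     */
      double transfers_clock = 2;       /* double data rate                 */
      double width_bits      = 64;      /* bits per interface (per channel) */
      double interfaces      = 2;       /* dual-channel configuration       */

      double bits_s  = clock_hz * transfers_clock * width_bits * interfaces;
      double bytes_s = bits_s / 8;
      printf("Peak bandwidth: %.1f GB/s\n", bytes_s / 1e9);          /* 12.8 GB/s */

      /* Module naming: DDR2-800 = 800 million transfers/s,
         PC2-6400 = 6400 MB/s over one 64-bit interface. */
      printf("Transfers per second: %.0f million\n",
             clock_hz * transfers_clock / 1e6);                      /* 800  */
      printf("Per-module bandwidth: %.0f MB/s\n",
             clock_hz * transfers_clock * width_bits / 8 / 1e6);     /* 6400 */
      return 0;
  }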

Two memory interfaces (as in dual-channel configurations) is a common arrangement for PC system memory, but single-channel configurations are common in older, low-end, or low-power devices. Some personal computers and most modern graphics cards use more than two memory interfaces (e.g., four for Intel's LGA 2011 platform and the NVIDIA GeForce GTX 980). High-performance graphics cards running many interfaces in parallel can attain very high total memory bus widths (e.g., 384 bits in the NVIDIA GeForce GTX TITAN and 512 bits in the AMD Radeon R9 290X, using six and eight 64-bit interfaces respectively).
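
The same product of interface count, interface width, and data rate applies to graphics cards. In the sketch below the 6 Gbit/s per-pin effective data rate is an assumed, illustrative figure rather than the specification of any particular card.

  /* Sketch: total bus width and peak bandwidth for a graphics card built
     from several 64-bit interfaces. The 6 Gbit/s per-pin data rate is an
     assumed, illustrative value, not a quoted specification. */
  #include <stdio.h>

  int main(void)
  {
      int    interfaces   = 6;        /* e.g. six 64-bit interfaces          */
      int    width_per_if = 64;       /* bits per interface                  */
      double pin_gbps     = 6.0;      /* assumed effective data rate per pin */

      int    total_width = interfaces * width_per_if;            /* 384 bits */
      double peak_gb_s   = total_width * pin_gbps / 8.0;         /* 288 GB/s */

      printf("Total bus width: %d bits\n", total_width);
      printf("Peak bandwidth:  %.0f GB/s\n", peak_gb_s);
      return 0;
  }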

ECC bits

In systems with error-correcting memory (ECC), the additional width of the interfaces (typically 72 rather than 64 bits) is not counted in bandwidth specifications because the extra bits are unavailable to store user data. ECC bits are better thought of as part of the memory hardware rather than as information stored in that hardware.
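
As a worked illustration (the 1600 MT/s and 72-bit figures are chosen only as an example of a DDR3-class ECC interface), the extra check bits increase the raw number of bits moved but leave the quoted, user-visible bandwidth unchanged:

  /* Sketch: raw versus user-data transfer rate for a hypothetical 72-bit
     ECC interface at 1600 MT/s. Only the 64 data bits count toward the
     quoted bandwidth; the 8 check bits are not available to user data. */
  #include <stdio.h>

  int main(void)
  {
      double transfers_s = 1600e6;    /* 1600 million transfers per second */
      int    data_bits   = 64;        /* user-visible width                */
      int    ecc_bits    = 8;         /* check bits                        */

      double user_gb_s = transfers_s * data_bits / 8 / 1e9;               /* 12.8 GB/s */
      double raw_gb_s  = transfers_s * (data_bits + ecc_bits) / 8 / 1e9;  /* 14.4 GB/s */

      printf("Quoted (user-data) bandwidth: %.1f GB/s\n", user_gb_s);
      printf("Raw bits moved incl. ECC:     %.1f GB/s\n", raw_gb_s);
      return 0;
  }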

Related Research Articles

<span class="mw-page-title-main">DDR SDRAM</span> Type of computer memory

Double Data Rate Synchronous Dynamic Random-Access Memory is a double data rate (DDR) synchronous dynamic random-access memory (SDRAM) class of memory integrated circuits used in computers. DDR SDRAM, also retroactively called DDR1 SDRAM, has been superseded by DDR2 SDRAM, DDR3 SDRAM, DDR4 SDRAM and DDR5 SDRAM. None of its successors are forward or backward compatible with DDR1 SDRAM, meaning DDR2, DDR3, DDR4 and DDR5 memory modules will not work on DDR1-equipped motherboards, and vice versa.

<span class="mw-page-title-main">Dynamic random-access memory</span> Type of computer memory

Dynamic random-access memory is a type of random-access semiconductor memory that stores each bit of data in a memory cell, usually consisting of a tiny capacitor and a transistor, both typically based on metal–oxide–semiconductor (MOS) technology. While most DRAM memory cell designs use a capacitor and transistor, some only use two transistors. In the designs where a capacitor is used, the capacitor can either be charged or discharged; these two states are taken to represent the two values of a bit, conventionally called 0 and 1. The electric charge on the capacitors gradually leaks away; without intervention the data on the capacitor would soon be lost. To prevent this, DRAM requires an external memory refresh circuit which periodically rewrites the data in the capacitors, restoring them to their original charge. This refresh process is the defining characteristic of dynamic random-access memory, in contrast to static random-access memory (SRAM) which does not require data to be refreshed. Unlike flash memory, DRAM is volatile memory, since it loses its data quickly when power is removed. However, DRAM does exhibit limited data remanence.

<span class="mw-page-title-main">Synchronous dynamic random-access memory</span> Type of computer memory

Synchronous dynamic random-access memory is any DRAM where the operation of its external pin interface is coordinated by an externally supplied clock signal.

<span class="mw-page-title-main">DIMM</span> Computer memory module

A DIMM, or Dual In-Line Memory Module, is a popular type of memory module used in computers. It is a printed circuit board holding DRAM chips on one or both sides, with connector pins along one edge. The vast majority of DIMMs are standardized through JEDEC standards, although there are proprietary DIMMs. DIMMs come in a variety of speeds and capacities, but generally come in one of two lengths: PC DIMMs at 133.35 mm (5.25 in) and laptop SO-DIMMs at about half that, 67.60 mm (2.66 in).

Rambus DRAM (RDRAM), and its successors Concurrent Rambus DRAM (CRDRAM) and Direct Rambus DRAM (DRDRAM), are types of synchronous dynamic random-access memory (SDRAM) developed by Rambus from the 1990s through to the early 2000s. The third generation of Rambus DRAM, DRDRAM, was replaced by XDR DRAM. Rambus DRAM was developed for high-bandwidth applications and was positioned by Rambus as a replacement for various types of contemporary memories, such as SDRAM.

<span class="mw-page-title-main">DDR2 SDRAM</span> Second generation of double-data-rate synchronous dynamic random-access memory

Double Data Rate 2 Synchronous Dynamic Random-Access Memory is a double data rate (DDR) synchronous dynamic random-access memory (SDRAM) interface. It is a JEDEC standard (JESD79-2), first published in September 2003. DDR2 succeeded the original DDR SDRAM specification, and was itself succeeded by DDR3 SDRAM in 2007. DDR2 DIMMs are neither forward compatible with DDR3 nor backward compatible with DDR.

<span class="mw-page-title-main">Double data rate</span> Method of computer bus operation

In computing, double data rate (DDR) describes a computer bus that transfers data on both the rising and falling edges of the clock signal and hence doubles the memory bandwidth by transferring data twice per clock cycle. This is also known as double pumped, dual-pumped, and double transition. The term toggle mode is used in the context of NAND flash memory.

Column address strobe latency, also called CAS latency or CL, is the delay in clock cycles between the READ command and the moment data is available. In asynchronous DRAM, the interval is specified in nanoseconds. In synchronous DRAM, the interval is specified in clock cycles. Because the latency is dependent upon a number of clock ticks instead of absolute time, the actual time for an SDRAM module to respond to a CAS event might vary between uses of the same module if the clock rate differs.

XDR DRAM is a high-performance dynamic random-access memory interface. It is based on and succeeds RDRAM. Competing technologies include DDR2 and GDDR4.

The AMD Family 10h, or K10, is a microprocessor microarchitecture by AMD based on the K8 microarchitecture. The first third-generation Opteron products for servers were launched on September 10, 2007, with the Phenom processors for desktops following and launching on November 11, 2007 as the immediate successors to the K8 series of processors.

In computing, serial presence detect (SPD) is a standardized way to automatically access information about a memory module. Earlier 72-pin SIMMs included five pins that provided five bits of parallel presence detect (PPD) data, but the 168-pin DIMM standard changed to a serial presence detect to encode more information.

Double Data Rate 3 Synchronous Dynamic Random-Access Memory is a type of synchronous dynamic random-access memory (SDRAM) with a high bandwidth interface, and has been in use since 2007. It is the higher-speed successor to DDR and DDR2 and predecessor to DDR4 synchronous dynamic random-access memory (SDRAM) chips. DDR3 SDRAM is neither forward nor backward compatible with any earlier type of random-access memory (RAM) because of different signaling voltages, timings, and other factors.

<span class="mw-page-title-main">Fully Buffered DIMM</span>

A Fully Buffered DIMM (FB-DIMM) is a type of memory module used in computer systems. It is designed to improve memory performance and capacity by allowing each memory module to be connected to the memory controller through a serial interface, rather than a parallel one. Unlike the parallel bus architecture of traditional DRAMs, an FB-DIMM has a serial interface between the memory controller and the advanced memory buffer (AMB). Conventionally, data lines from the memory controller have to be connected to data lines in every DRAM module, i.e. via multidrop buses. As the memory width increases together with the access speed, the signal degrades at the interface between the bus and the device. This limits the speed and memory density, so FB-DIMMs take a different approach to solve the problem.

GDDR4 SDRAM, an abbreviation for Graphics Double Data Rate 4 Synchronous Dynamic Random-Access Memory, is a type of graphics card memory (SGRAM) specified by the JEDEC Semiconductor Memory Standard. It is a rival medium to Rambus's XDR DRAM. GDDR4 is based on DDR3 SDRAM technology and was intended to replace the DDR2-based GDDR3, but it ended up being replaced by GDDR5 within a year.

A memory controller, also known as memory chip controller (MCC) or a memory controller unit (MCU), is a digital circuit that manages the flow of data going to and from a computer's main memory. When a memory controller is integrated into another chip, such as an integral part of a microprocessor, it is usually called an integrated memory controller (IMC).

Double Data Rate 4 Synchronous Dynamic Random-Access Memory is a type of synchronous dynamic random-access memory with a high bandwidth interface.

<span class="mw-page-title-main">LPDDR</span> Computer hardware

Low-Power Double Data Rate (LPDDR), also known as LPDDR SDRAM, is a type of synchronous dynamic random-access memory that consumes less power and is targeted for mobile computers and devices such as mobile phones. Older variants are also known as Mobile DDR, and abbreviated as mDDR.

<span class="mw-page-title-main">GDDR3 SDRAM</span> Type of graphics card memory

GDDR3 SDRAM is a type of DDR SDRAM specialized for graphics processing units (GPUs) offering less access latency and greater device bandwidths. Its specification was developed by ATI Technologies in collaboration with DRAM vendors including Elpida Memory, Hynix Semiconductor, Infineon and Micron. It was later adopted as a JEDEC standard.

<span class="mw-page-title-main">DDR5 SDRAM</span> Fifth generation of double-data-rate synchronous dynamic random-access memory

Double Data Rate 5 Synchronous Dynamic Random-Access Memory is a type of synchronous dynamic random-access memory. Compared to its predecessor DDR4 SDRAM, DDR5 was planned to reduce power consumption, while doubling bandwidth. The standard, originally targeted for 2018, was released on July 14, 2020.

References

  1. STREAM Benchmark FAQ: Counting Bytes and FLOPS: http://www.cs.virginia.edu/stream/ref.html#counting