Interleaved memory

Last updated May 15, 2023

In computing, interleaved memory is a design which compensates for the relatively slow speed of dynamic random-access memory (DRAM) or core memory, by spreading memory addresses evenly across memory banks. That way, contiguous memory reads and writes use each memory bank in turn, resulting in higher memory throughput due to reduced waiting for memory banks to become ready for the operations.

It is different from multi-channel memory architectures, primarily as interleaved memory does not add more channels between the main memory and the memory controller. However, channel interleaving is also possible, for example in freescale i.MX6 processors, which allow interleaving to be done between two channels.^{[ citation needed ]}

Overview

With interleaved memory, memory addresses are allocated to each memory bank in turn. For example, in an interleaved system with two memory banks (assuming word-addressable memory), if logical address 32 belongs to bank 0, then logical address 33 would belong to bank 1, logical address 34 would belong to bank 0, and so on. An interleaved memory is said to be n-way interleaved when there are $n$ banks and memory location $i$ resides in bank $i mod n$ .

Memory interleaving example with 4 banks. Red banks are refreshing and can't be used. Interleaving.gif — Memory interleaving example with 4 banks. Red banks are refreshing and can't be used.

Interleaved memory results in contiguous reads (which are common both in multimedia and execution of programs) and contiguous writes (which are used frequently when filling storage or communication buffers) actually using each memory bank in turn, instead of using the same one repeatedly. This results in significantly higher memory throughput as each bank has a minimum waiting time between reads and writes.

Interleaved DRAM

Main memory (random-access memory, RAM) is usually composed of a collection of DRAM memory chips, where a number of chips can be grouped together to form a memory bank. It is then possible, with a memory controller that supports interleaving, to lay out these memory banks so that the memory banks will be interleaved.

Data in DRAM is stored in units of pages. Each DRAM bank has a row buffer that serves as a cache for accessing any page in the bank. Before a page in the DRAM bank is read, it is first loaded into the row-buffer. If the page is immediately read from the row-buffer (or a row-buffer hit), it has the shortest memory access latency in one memory cycle. If it is a row buffer miss, which is also called a row-buffer conflict, it is slower because the new page has to be loaded into the row-buffer before it is read. Row-buffer misses happen as access requests on different memory pages in the same bank are serviced. A row-buffer conflict incurs a substantial delay for a memory access. In contrast, memory accesses to different banks can proceed in parallel with a high throughput.

The issue of row-buffer conflicts has been well studied with an effective solution.^[1] The size of a row-buffer is normally the size of a memory page managed by the operating system. Row-buffer conflicts or misses come from a sequence of accesses to difference pages in the same memory bank. The study^[1] shows that a conventional memory interleaving method would propagate address-mapping conflicts at a cache level to the memory address space, causing row-buffer misses in a memory bank. The permutation-based interleaved memory method solved the problem with a trivial microarchitecture cost.^[1] Sun Microsystems adopted this the permutation interleaving method quickly in their products.^[2] This patent-free method can be found in many commercial microprocessors, such as AMD, Intel and NVIDIA, for embedded systems, laptops, desktops, and enterprise servers.^[3]

In traditional (flat) layouts, memory banks can be allocated a contiguous block of memory addresses, which is very simple for the memory controller and gives equal performance in completely random access scenarios, when compared to performance levels achieved through interleaving. However, in reality memory reads are rarely random due to locality of reference, and optimizing for close together access gives far better performance in interleaved layouts.

The way memory is addressed has no effect on the access time for memory locations which are already cached, having an impact only on memory locations which need to be retrieved from DRAM.

History

Early research into interleaved memory was performed at IBM in the 60s and 70s in relation to the IBM 7030 Stretch computer,^[4] but development went on for decades improving design, flexibility and performance to produce modern implementations.

Related Research Articles

<span class="mw-page-title-main">Cache (computing)</span> Additional storage that enables faster access to main storage

In computing, a cache is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere. A cache hit occurs when the requested data can be found in a cache, while a cache miss occurs when it cannot. Cache hits are served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs.

Direct memory access (DMA) is a feature of computer systems that allows certain hardware subsystems to access main system memory independently of the central processing unit (CPU).

<span class="mw-page-title-main">Dynamic random-access memory</span> Type of computer memory

Dynamic random-access memory is a type of random-access semiconductor memory that stores each bit of data in a memory cell, usually consisting of a tiny capacitor and a transistor, both typically based on metal–oxide–semiconductor (MOS) technology. While most DRAM memory cell designs use a capacitor and transistor, some only use two transistors. In the designs where a capacitor is used, the capacitor can either be charged or discharged; these two states are taken to represent the two values of a bit, conventionally called 0 and 1. The electric charge on the capacitors gradually leaks away; without intervention the data on the capacitor would soon be lost. To prevent this, DRAM requires an external memory refresh circuit which periodically rewrites the data in the capacitors, restoring them to their original charge. This refresh process is the defining characteristic of dynamic random-access memory, in contrast to static random-access memory (SRAM) which does not require data to be refreshed. Unlike flash memory, DRAM is volatile memory, since it loses its data quickly when power is removed. However, DRAM does exhibit limited data remanence.

Synchronous dynamic random-access memory is any DRAM where the operation of its external pin interface is coordinated by an externally supplied clock signal.

In computer data storage, data striping is the technique of segmenting logically sequential data, such as a file, so that consecutive segments are stored on different physical storage devices.

In computer science, thrashing occurs when a computer's virtual memory resources are overused, leading to a constant state of paging and page faults, inhibiting most application-level processing. This causes the performance of the computer to degrade or collapse. The situation can continue indefinitely until either the user closes some running applications or the active processes free up additional virtual memory resources.

A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache levels, with different instruction-specific and data-specific caches at level 1. The cache memory is typically implemented with static random-access memory (SRAM), in modern CPUs by far the largest part of them by chip area, but SRAM is not always used for all levels, or even any level, sometimes some latter or all levels are implemented with eDRAM.

Semiconductor memory is a digital electronic semiconductor device used for digital data storage, such as computer memory. It typically refers to devices in which data is stored within metal–oxide–semiconductor (MOS) memory cells on a silicon integrated circuit memory chip. There are numerous different types using different semiconductor technologies. The two main types of random-access memory (RAM) are static RAM (SRAM), which uses several transistors per memory cell, and dynamic RAM (DRAM), which uses a transistor and a MOS capacitor per cell. Non-volatile memory uses floating-gate memory cells, which consist of a single floating-gate transistor per cell.

The Tseng Labs ET4000 was a line of SVGA graphics controller chips during the early 1990s, commonly found in many 386/486 and compatible systems, with some models, notably the ET4000/W32 and later chips, offering graphics acceleration. Offering above average host interface throughput coupled with a moderate price, Tseng Labs' ET4000 chipset family were well regarded for their performance, and were integrated into many companies' lineups, notably with Hercules' Dynamite series, the Diamond Stealth 32 and several Speedstar cards, and on many generic boards.

XDR DRAM is a high-performance dynamic random-access memory interface. It is based on and succeeds RDRAM. Competing technologies include DDR2 and GDDR4.

Dual-ported video RAM, or VRAM, is a dual-ported variant of dynamic RAM (DRAM), which was once commonly used to store the framebuffer in graphics adapters. Note that most computers and game consoles do not use this form of memory, and dual-ported VRAM should not be confused with other forms of video memory.

Memory refresh is the process of periodically reading information from an area of computer memory and immediately rewriting the read information to the same area without modification, for the purpose of preserving the information. Memory refresh is a background maintenance process required during the operation of semiconductor dynamic random-access memory (DRAM), the most widely used type of computer memory, and in fact is the defining characteristic of this class of memory.

Error correction code memory is a type of computer data storage that uses an error correction code (ECC) to detect and correct n-bit data corruption which occurs in memory. ECC memory is used in most computers where data corruption cannot be tolerated, like industrial control applications, critical databases, and infrastructural memory caches.

The memory controller is a digital circuit that manages the flow of data going to and from the computer's main memory. A memory controller can be a separate chip or integrated into another chip, such as being placed on the same die or as an integral part of a microprocessor; in the latter case, it is usually called an integrated memory controller (IMC). A memory controller is sometimes also called a memory chip controller (MCC) or a memory controller unit (MCU).

A memory bank is a logical unit of storage in electronics, which is hardware-dependent. In a computer, the memory bank may be determined by the memory controller along with physical organization of the hardware memory slots. In a typical synchronous dynamic random-access memory (SDRAM) or double data rate SDRAM, a bank consists of multiple rows and columns of storage units, and is usually spread out across several chips. In a single read or write operation, only one bank is accessed, therefore the number of bits in a column or a row, per bank and per chip, equals the memory bus width in bits. The size of a bank is further determined by the number of bits in a column and a row, per chip, multiplied by the number of chips in a bank.

Random-access memory is a form of computer memory that can be read and changed in any order, typically used to store working data and machine code. A random-access memory device allows data items to be read or written in almost the same amount of time irrespective of the physical location of data inside the memory, in contrast with other direct-access data storage media, where the time required to read and write data items varies significantly depending on their physical locations on the recording medium, due to mechanical limitations such as media rotation speeds and arm movement.

The PA-8000 (PCX-U), code-named Onyx, is a microprocessor developed and fabricated by Hewlett-Packard (HP) that implemented the PA-RISC 2.0 instruction set architecture (ISA). It was a completely new design with no circuitry derived from previous PA-RISC microprocessors. The PA-8000 was introduced on 2 November 1995 when shipments began to members of the Precision RISC Organization (PRO). It was used exclusively by PRO members and was not sold on the merchant market. All follow-on PA-8x00 processors are based on the basic PA-8000 processor core.

A memory rank is a set of DRAM chips connected to the same chip select, which are therefore accessed simultaneously. In practice all DRAM chips share all of the other command and control signals, and only the chip select pins for each rank are separate.

Apollo VP3 is a x86 based Socket 7 chipset which was manufactured by VIA Technologies and was launched in 1997. On its time Apollo VP3 was a high performance, cost effective, and energy efficient chipset. It offered AGP support for Socket 7 processors which was not supported at that moment by Intel, SiS and ALi chipsets. In November 1997 FIC released motherboard PA-2012, which uses Apollo VP3 and has AGP bus. This was the first Socket 7 motherboard supporting AGP.

Row hammer is a security exploit that takes advantage of an unintended and undesirable side effect in dynamic random-access memory (DRAM) in which memory cells interact electrically between themselves by leaking their charges, possibly changing the contents of nearby memory rows that were not addressed in the original memory access. This circumvention of the isolation between DRAM memory cells results from the high cell density in modern DRAM, and can be triggered by specially crafted memory access patterns that rapidly activate the same memory rows numerous times.

References

1 2 3 Zhao Zhang, Zhichun Zhu, and Xiaodong Zhang (2000). A Permutation-based Page Interleaving Scheme to Reduce Row-buffer Conflicts and Exploit Data Locality. MICRO' 33.{{cite conference}}: CS1 maint: multiple names: authors list (link)
↑ "Sun letter to the Director of the Technology Transfer Office of the College of William and Mary" (PDF). July 15, 2005.
↑ "Professor Xiaodong Zhang Receives 2020 ACM Microarchitecture Test of Time Award". Department of Computer Science and Engineering, College of Engineering, Ohio State University . January 19, 2021.
↑ Mark Smotherman (July 2010). "IBM Stretch (7030) — Aggressive Uniprocessor Parallelism". clemson.edu. Retrieved 2013-12-07.

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[Interleaving-1] 1 2 3 Zhao Zhang, Zhichun Zhu, and Xiaodong Zhang (2000). A Permutation-based Page Interleaving Scheme to Reduce Row-buffer Conflicts and Exploit Data Locality. MICRO' 33.{{cite conference}}: CS1 maint: multiple names: authors list (link)

[2] "Sun letter to the Director of the Technology Transfer Office of the College of William and Mary" (PDF). July 15, 2005.

[3] "Professor Xiaodong Zhang Receives 2020 ACM Microarchitecture Test of Time Award". Department of Computer Science and Engineering, College of Engineering, Ohio State University . January 19, 2021.

[4] Mark Smotherman (July 2010). "IBM Stretch (7030) — Aggressive Uniprocessor Parallelism". clemson.edu. Retrieved 2013-12-07.

[1]

[2]

[3]

[4]