Content-addressable memory

Last updated
Content addressable memory Content-addressable-memory.png
Content addressable memory

Content-addressable memory (CAM) is a special type of computer memory used in certain very-high-speed searching applications. It is also known as associative memory or associative storage and compares input search data against a table of stored data, and returns the address of matching data. [1]

Contents

CAM is frequently used in networking devices where it speeds up forwarding information base and routing table operations. This kind of associative memory is also used in cache memory. In associative cache memory, both address and content is stored side by side. When the address matches, the corresponding content is fetched from cache memory.

History

Dudley Allen Buck invented the concept of content-addressable memory in 1955. Buck is credited with the idea of recognition unit. [2]

Hardware associative array

Unlike standard computer memory, random-access memory (RAM), in which the user supplies a memory address and the RAM returns the data word stored at that address, a CAM is designed such that the user supplies a data word and the CAM searches its entire memory to see if that data word is stored anywhere in it. If the data word is found, the CAM returns a list of one or more storage addresses where the word was found. Thus, a CAM is the hardware embodiment of what in software terms would be called an associative array.

A similar concept can be found in the data word recognition unit, as proposed by Dudley Allen Buck in 1955. [3]

Standards

A major interface definition for CAMs and other network search engines was specified in an interoperability agreement called the Look-Aside Interface (LA-1 and LA-1B) developed by the Network Processing Forum. [4] Numerous devices conforming to the interoperability agreement have been produced by Integrated Device Technology, Cypress Semiconductor, IBM, Broadcom and others. On December 11, 2007, the OIF published the serial look-aside (SLA) interface agreement.[ citation needed ]

Semiconductor implementations

CMOS binary CAM Cell consisting of a 6T SRAM cell plus 4 comparison transistors. When the data on the search lines (SL) differs from the data stored in the cell through the bit lines (BL), the match line (ML) will be pulled low to indicate a mismatch. If none of the cells on a match line indicate a mismatched bit, the match line will remain high at the precharge level to indicate a word match. Both search lines can be held at logic '0' as a don't care search condition. Search lines and bit lines can be merged into a single pair of data lines. Binary CAM cell schematic.jpg
CMOS binary CAM Cell consisting of a 6T SRAM cell plus 4 comparison transistors. When the data on the search lines (SL) differs from the data stored in the cell through the bit lines (BL), the match line (ML) will be pulled low to indicate a mismatch. If none of the cells on a match line indicate a mismatched bit, the match line will remain high at the precharge level to indicate a word match. Both search lines can be held at logic '0' as a don't care search condition. Search lines and bit lines can be merged into a single pair of data lines.

CAM is much faster than RAM in data search applications. There are cost disadvantages to CAM, however. Unlike a RAM chip, which has simple storage cells, each individual memory bit in a fully parallel CAM must have its own associated comparison circuit to detect a match between the stored bit and the input bit. Additionally, match outputs from each cell in the data word must be combined to yield a complete data word match signal. The additional circuitry increases the physical size and manufacturing cost of the CAM chip. The extra circuitry also increases power dissipation since every comparison circuit is active on every clock cycle. Consequently, CAM is used only in specialized applications where searching speed cannot be accomplished using a less costly method. One successful early implementation was a General Purpose Associative Processor IC and System. [5]

In the early 2000s several semiconductor companies including Cypress, IDT, Netlogic, Sibercore, [6] and MOSAID introduced CAM products targeting networking applications. These products were labelled Network Search Engines (NSE), Network Search Accelerators (NSA), and Knowledge-based Processors (KBP) but were essentially CAM with specialized interfaces and features optimized for networking. Currently Broadcom offers several families of KBPs. [7]

Alternative implementations

To achieve a different balance between speed, memory size and cost, some implementations emulate the function of CAM by using standard tree search or hashing designs in hardware, using hardware tricks like replication or pipelining to speed up effective performance. These designs are often used in routers.[ citation needed ] The Lulea algorithm is an efficient implementation for longest prefix match searches as required in internet routing tables.

Ternary CAMs

CMOS Ternary CAM cell consisting of two 6T SRAM cells plus 4 comparison transistors. Normally opposite logic levels, either '0' and '1' or '1' and '0' will be stored in the two cells. For a don't care condition '0' will be stored in both cells so that the match line ML will not be pulled low for any combination of search line (SL) data. Ternary CAM cell schematic.jpg
CMOS Ternary CAM cell consisting of two 6T SRAM cells plus 4 comparison transistors. Normally opposite logic levels, either '0' and '1' or '1' and '0' will be stored in the two cells. For a don't care condition '0' will be stored in both cells so that the match line ML will not be pulled low for any combination of search line (SL) data.

Binary CAM is the simplest type of CAM and uses data search words consisting entirely of 1s and 0s. Ternary CAM (TCAM) [8] allows a third matching state of X or don't care for one or more bits in the stored word, thus adding flexibility to the search. For example, a stored word of 10XX0 in a ternary CAM will match any of the four search words 10000, 10010, 10100, or 10110. The added search flexibility comes at an additional cost over binary CAM as the internal memory cell must now encode three possible states instead of the two for the binary CAM. This additional state is typically implemented by adding a mask bit (care or don't care bit) to every memory cell. In 2013, IBM fabricated a nonvolatile TCAM using 2-transistor/2-resistive-storage (2T-2R) cells. [9] A design of TCAM using hybrid Ferroelectric FeFET was recently published by a group of International scientists. [10]

Example applications

Content-addressable memory is often used in computer networking devices. For example, when a network switch receives a data frame from one of its ports, it updates an internal table with the frame's source MAC address and the port it was received on. It then looks up the destination MAC address in the table to determine what port the frame needs to be forwarded to, and sends it out on that port. The MAC address table is usually implemented with a binary CAM so the destination port can be found very quickly, reducing the switch's latency.

Ternary CAMs are often used in network routers, where each address has two parts: the network prefix, which can vary in size depending on the subnet configuration, and the host address, which occupies the remaining bits. Each subnet has a network mask that specifies which bits of the address are the network prefix and which bits are the host address. Routing is done by consulting a routing table maintained by the router which contains each known destination network prefix, the associated network mask, and the information needed to route packets to that destination. In a simple software implementation, the router compares the destination address of the packet to be routed with each entry in the routing table, performing a bitwise AND with the network mask and comparing it with the network prefix. If they are equal, the corresponding routing information is used to forward the packet. Using a ternary CAM for the routing table makes the lookup process very efficient. The addresses are stored using don't care for the host part of the address, so looking up the destination address in the CAM immediately retrieves the correct routing entry; both the masking and comparison are done by the CAM hardware. This works if (a) the entries are stored in order of decreasing network mask length, and (b) the hardware returns only the first matching entry; thus, the match with the longest network mask (longest prefix match) is used. [11]

Other CAM applications include:

See also

Related Research Articles

The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represented as either "1" or "0", but other representations such as true/false, yes/no, on/off, or +/ are also widely used.

<span class="mw-page-title-main">Central processing unit</span> Central computer component which executes instructions

A central processing unit (CPU), also called a central processor, main processor, or just processor, is the most important processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs).

<span class="mw-page-title-main">Cache (computing)</span> Additional storage that enables faster access to main storage

In computing, a cache is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere. A cache hit occurs when the requested data can be found in a cache, while a cache miss occurs when it cannot. Cache hits are served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs.

<span class="mw-page-title-main">Trie</span> K-ary search tree data structure

In computer science, a trie, also called digital tree or prefix tree, is a type of k-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes defined not by the entire key, but by individual characters. In order to access a key, the trie is traversed depth-first, following the links between nodes, which represent each character in the key.

<span class="mw-page-title-main">Harvard architecture</span> Computer architecture where code and data each have a separate bus

The Harvard architecture is a computer architecture with separate storage and signal pathways for instructions and data. It is often contrasted with the von Neumann architecture, where program instructions and data share the same memory and pathways.

<span class="mw-page-title-main">Static random-access memory</span> Type of computer memory

Static random-access memory is a type of random-access memory (RAM) that uses latching circuitry (flip-flop) to store each bit. SRAM is volatile memory; data is lost when power is removed.

<span class="mw-page-title-main">Memory management unit</span> Hardware translating virtual addresses to physical address

A memory management unit (MMU), sometimes called paged memory management unit (PMMU), is a computer hardware unit that examines all memory references on the memory bus, translating these requests, known as virtual memory addresses, into physical addresses in main memory.

A translation lookaside buffer (TLB) is a memory cache that stores the recent translations of virtual memory to physical memory. It is used to reduce the time taken to access a user memory location. It can be called an address-translation cache. It is a part of the chip's memory-management unit (MMU). A TLB may reside between the CPU and the CPU cache, between CPU cache and the main memory or between the different levels of the multi-level cache. The majority of desktop, laptop, and server processors include one or more TLBs in the memory-management hardware, and it is nearly always present in any processor that utilizes paged or segmented virtual memory.

A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache levels, with different instruction-specific and data-specific caches at level 1. The cache memory is typically implemented with static random-access memory (SRAM), in modern CPUs by far the largest part of them by chip area, but SRAM is not always used for all levels, or even any level, sometimes some latter or all levels are implemented with eDRAM.

Semiconductor memory is a digital electronic semiconductor device used for digital data storage, such as computer memory. It typically refers to devices in which data is stored within metal–oxide–semiconductor (MOS) memory cells on a silicon integrated circuit memory chip. There are numerous different types using different semiconductor technologies. The two main types of random-access memory (RAM) are static RAM (SRAM), which uses several transistors per memory cell, and dynamic RAM (DRAM), which uses a transistor and a MOS capacitor per cell. Non-volatile memory uses floating-gate memory cells, which consist of a single floating-gate transistor per cell.

In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized datum handled as a unit by the instruction set or the hardware of the processor. The number of bits or digits in a word is an important characteristic of any specific processor design or computer architecture.

<span class="mw-page-title-main">Hardware acceleration</span> Specialized computer hardware

Hardware acceleration is the use of computer hardware designed to perform specific functions more efficiently when compared to software running on a general-purpose central processing unit (CPU). Any transformation of data that can be calculated in software running on a generic CPU can also be calculated in custom-made hardware, or in some mix of both.

<span class="mw-page-title-main">ECC memory</span> Self-correcting computer data storage

Error correction code memory is a type of computer data storage that uses an error correction code (ECC) to detect and correct n-bit data corruption which occurs in memory.

A forwarding information base (FIB), also known as a forwarding table or MAC table, is most commonly used in network bridging, routing, and similar functions to find the proper output network interface controller to which the input interface should forward a packet. It is a dynamic table that maps MAC addresses to ports. It is the essential mechanism that separates network switches from Ethernet hubs. Content-addressable memory (CAM) is typically used to efficiently implement the FIB, thus it is sometimes called a CAM table.

<span class="mw-page-title-main">Forwarding plane</span>

In routing, the forwarding plane, sometimes called the data plane or user plane, defines the part of the router architecture that decides what to do with packets arriving on an inbound interface. Most commonly, it refers to a table in which the router looks up the destination address of the incoming packet and retrieves the information necessary to determine the path from the receiving element, through the internal forwarding fabric of the router, and to the proper outgoing interface(s).

The Luleå algorithm of computer science, designed by Degermark et al. (1997), is a technique for storing and searching internet routing tables efficiently. It is named after the Luleå University of Technology, the home institute/university of the technique's authors. The name of the algorithm does not appear in the original paper describing it, but was used in a message from Craig Partridge to the Internet Engineering Task Force describing that paper prior to its publication.

<span class="mw-page-title-main">DEC V-11</span>

The V-11, code-named "Scorpio", is a miniprocessor chip set implementation of the VAX instruction set architecture (ISA) developed and fabricated by Digital Equipment Corporation (DEC).

<span class="mw-page-title-main">Read-only memory</span> Electronic memory that cannot be changed

Read-only memory (ROM) is a type of non-volatile memory used in computers and other electronic devices. Data stored in ROM cannot be electronically modified after the manufacture of the memory device. Read-only memory is useful for storing software that is rarely changed during the life of the system, also known as firmware. Software applications for programmable devices can be distributed as plug-in cartridges containing ROM.

<span class="mw-page-title-main">Alpha 21264</span> RISC microprocessor

The Alpha 21264 is a Digital Equipment Corporation RISC microprocessor launched on 19 October 1998. The 21264 implemented the Alpha instruction set architecture (ISA).

This glossary of computer hardware terms is a list of definitions of terms and concepts related to computer hardware, i.e. the physical and structural components of computers, architectural issues, and peripheral devices.

References

  1. "K. Pagiamtzis* and A. Sheikholeslami, Content-addressable memory (CAM) circuits and architectures: A tutorial and survey, IEEE Journal of Solid-State Circuits, pp. 712-727, March 2006" (PDF). Archived (PDF) from the original on 2007-03-15.
  2. TRW Computer Division. (1963). First interim report on optimum utilization of computers and computing techniques in shipboard weapons control systems. (BuWeps-Project RM1004 M88-3U1). Alexandria, Virginia:Defence Documentation Center for Scientific and Technical Information.
  3. TRW Computer Division Archived August 5, 2011, at the Wayback Machine , 1963, p. 17.
  4. Look-Aside (LA-1B) Interface Implementation Agreement (PDF), 2004-08-04
  5. Stormon, C. D.; Troullinos, N. B.; Saleh, E. M.; Chavan, A. V.; Brule, M. R.; Oldfield, J. V. (December 1992). "A general-purpose CMOS associative processor IC and system". IEEE Micro. 12 (6): 68–78. doi:10.1109/40.180249. S2CID   206432751.
  6. "Sibercore Technologies - Silicon Solutions for Cyberspace". Archived from the original on 2003-04-19.
  7. "16nm Heterogeneous Knowledge-Based Processors (KBPs)". Archived from the original on 2017-05-19.
  8. Hucaby, David (2004). CCNP BCMSN Exam Certification Guide: CCNP Self-study. Cisco Press. ISBN   9781587200779.
  9. Jing Li, R. Montoye, M. Ishii, K. Stawiasz, T. Nishida, K. Maloney, G. Ditlow, S. Lewis, T. Maffitt, R. Jordan, Leland Chang, P. Song, "1Mb 0.41 µm2 2T-2R cell nonvolatile TCAM with two-bit encoding and clocked self-referenced sensing", IEEE Symposium on VLSI Technology, 2013.
  10. Xunzhao Yin, Yu Qian, M. Imani, K. Ni, Chao Li, Grace Li Zhang, Bing Li, Ulf Schlichtmann, Cheng Zhuo, "Ferroelectric Ternary Content Addressable Memories for Energy-Efficient Associative Search", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, April 2023.
  11. Varghese, George, Network Algorithmics: An Interdisciplinary Approach to Designing Fast Networked Devices, Morgan Kaufmann, 2005
  12. Smith, Alan Jay (September 1982). "Cache Memories" (PDF). Computing Surveys. 14 (3): 473–530. doi:10.1145/356887.356892. S2CID   6023466. Archived from the original (PDF) on 2022-04-03. Retrieved April 3, 2022. The TLB is a small associative memory which maps virtual to real addresses.
  13. Hinton, Geoffrey E. (1984). "Distributed representations". Archived from the original on 2016-05-02. Retrieved 2017-12-14.

Bibliography