Hash consing

Last updated

In computer science, particularly in functional programming, hash consing is a technique used to share values that are structurally equal. When a value is constructed, such as a cons cell, the technique checks if such a value has been constructed before, and if so reuses the previous value, avoiding a new memory allocation. A useful property of hash consing is that two structures can be tested for equality in constant time via pointer equality, which in turn can improve efficiency of divide and conquer algorithms when data sets contain overlapping blocks. [1] Hash consing has been shown to give dramatic performance improvementsboth space and timefor symbolic and dynamic programming algorithms.[ citation needed ]

Contents

Hash consing is most commonly implemented with hash tables storing weak references that may be garbage-collected when the data stored therein contains no references from outside the table. [2] [3]

Example

Simple, not very efficient, but suitable for demonstration of the concept implementation of a memoizer by means of hash table and weak references in Scheme:

;; weak hashes;;(require'hash-table)(define(make-weak-table.args)(applymake-hash-tableargs))(define(weak-table-set!tablekeydata)(let((w(hash-table-reftablekey#f)))(ifw(vector-set!w0data)(let((w(make-weak-vector1)))(vector-set!w0data)(hash-table-set!tablekeyw)))))(define(weak-table-reftablekey)(let((w(hash-table-reftablekey#f)))(ifw(vector-refw0)#f)));; memoizer factory: for given (side-effect-free) procedure,;; return a procedure which does the same memoizing some of results;; in the sense of equal? on the whole list of args;;(define(make-weak-memoizerproc)(let((cache(make-weak-tableequal?)))(lambdaargs(let((x(weak-table-refcacheargs)))(if(bwp-object?x)(let((r(applyprocargs)))(weak-table-set!cacheargsr)r)x)))))

History

A hash consing compilation technique was presented by A.P. Ershov in 1958. [4] [5] The term "hash consing" originates from implementations in the context of Lisp in the 1970s. [6] [7]

See also

Related Research Articles

<span class="mw-page-title-main">Data structure</span> Particular way of storing and organizing data in a computer

In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data, i.e., it is an algebraic structure about data.

<span class="mw-page-title-main">Hash function</span> Mapping arbitrary data to fixed-size values

A hash function is any function that can be used to map data of arbitrary size to fixed-size values, though there are some hash functions that support variable length output. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter storage addressing.

<span class="mw-page-title-main">Hash table</span> Associative array for storing key-value pairs

In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored.

<span class="mw-page-title-main">Lisp (programming language)</span> Programming language family

Lisp is a family of programming languages with a long history and a distinctive, fully parenthesized prefix notation. Originally specified in 1960, Lisp is the second-oldest high-level programming language still in common use, after Fortran. Lisp has changed since its early days, and many dialects have existed over its history. Today, the best-known general-purpose Lisp dialects are Common Lisp, Scheme, Racket and Clojure.

<span class="mw-page-title-main">Scheme (programming language)</span> Dialect of Lisp

Scheme is a dialect of the Lisp family of programming languages. Scheme was created during the 1970s at the MIT Computer Science and Artificial Intelligence Laboratory and released by its developers, Guy L. Steele and Gerald Jay Sussman, via a series of memos now known as the Lambda Papers. It was the first dialect of Lisp to choose lexical scope and the first to require implementations to perform tail-call optimization, giving stronger support for functional programming and associated techniques such as recursive algorithms. It was also one of the first programming languages to support first-class continuations. It had a significant influence on the effort that led to the development of Common Lisp.

<span class="mw-page-title-main">Trie</span> K-ary search tree data structure

In computer science, a trie, also called digital tree or prefix tree, is a type of k-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes defined not by the entire key, but by individual characters. In order to access a key, the trie is traversed depth-first, following the links between nodes, which represent each character in the key.

<span class="mw-page-title-main">Lua (programming language)</span> Lightweight programming language

Lua is a lightweight, high-level, multi-paradigm programming language designed primarily for embedded use in applications. Lua is cross-platform, since the interpreter of compiled bytecode is written in ANSI C, and Lua has a relatively simple C API to embed it into applications.

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of pairs, such that each possible key appears at most once in the collection. In mathematical terms, an associative array is a function with finite domain. It supports 'lookup', 'remove', and 'insert' operations.

<span class="mw-page-title-main">Flyweight pattern</span> Software design pattern for objects

In computer programming, the flyweight software design pattern refers to an object that minimizes memory usage by sharing some of its data with other similar objects. The flyweight pattern is one of twenty-three well-known GoF design patterns. These patterns promote flexible object-oriented software design, which is easier to implement, change, test, and reuse.

In object-oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created. This is in contrast to a mutable object, which can be modified after it is created. In some cases, an object is considered immutable even if some internally used attributes change, but the object's state appears unchanging from an external point of view. For example, an object that uses memoization to cache the results of expensive computations could still be considered an immutable object.

In computer science, a set is an abstract data type that can store unique values, without any particular order. It is a computer implementation of the mathematical concept of a finite set. Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in a set.

In computer science, a lookup table (LUT) is an array that replaces runtime computation with a simpler array indexing operation. The process is termed as "direct addressing" and LUTs differ from hash tables in a way that, to retrieve a value with key , a hash table would store the value in the slot where is a hash function i.e. is used to compute the slot, while in the case of LUT, the value is stored in slot , thus directly addressable. The savings in processing time can be significant, because retrieving a value from memory is often faster than carrying out an "expensive" computation or input/output operation. The tables may be precalculated and stored in static program storage, calculated as part of a program's initialization phase (memoization), or even stored in hardware in application-specific platforms. Lookup tables are also used extensively to validate input values by matching against a list of valid items in an array and, in some programming languages, may include pointer functions to process the matching input. FPGAs also make extensive use of reconfigurable, hardware-implemented, lookup tables to provide programmable hardware functionality.

In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls to pure functions and returning the cached result when the same inputs occur again. Memoization has also been used in other contexts, such as in simple mutually recursive descent parsing. It is a type of caching, distinct from other forms of caching such as buffering and page replacement. In the context of some logic programming languages, memoization is also known as tabling.

A bit array is an array data structure that compactly stores bits. It can be used to implement a simple set data structure. A bit array is effective at exploiting bit-level parallelism in hardware to perform operations quickly. A typical bit array stores kw bits, where w is the number of bits in the unit of storage, such as a byte or word, and k is some nonnegative integer. If w does not divide the number of bits to be stored, some space is wasted due to internal fragmentation.

<span class="mw-page-title-main">Linear probing</span> Computer programming method for hashing

Linear probing is a scheme in computer programming for resolving collisions in hash tables, data structures for maintaining a collection of key–value pairs and looking up the value associated with a given key. It was invented in 1954 by Gene Amdahl, Elaine M. McGraw, and Arthur Samuel and first analyzed in 1963 by Donald Knuth.

In computer science, consistent hashing is a special kind of hashing technique such that when a hash table is resized, only keys need to be remapped on average where is the number of keys and is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped because the mapping between the keys and the slots is defined by a modular operation.

<span class="mw-page-title-main">Cuckoo hashing</span> Data structure hashing scheme

Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table, with worst-case constant lookup time. The name derives from the behavior of some species of cuckoo, where the cuckoo chick pushes the other eggs or young out of the nest when it hatches in a variation of the behavior referred to as brood parasitism; analogously, inserting a new key into a cuckoo hashing table may push an older key to a different location in the table.

In mathematics and computing, universal hashing refers to selecting a hash function at random from a family of hash functions with a certain mathematical property. This guarantees a low number of collisions in expectation, even if the data is chosen by an adversary. Many universal families are known, and their evaluation is often very efficient. Universal hashing has numerous uses in computer science, for example in implementations of hash tables, randomized algorithms, and cryptography.

This Comparison of programming languages (associative arrays) compares the features of associative array data structures or array-lookup processing for over 40 computer programming languages.

A concurrent hash-trie or Ctrie is a concurrent thread-safe lock-free implementation of a hash array mapped trie. It is used to implement the concurrent map abstraction. It has particularly scalable concurrent insert and remove operations and is memory-efficient. It is the first known concurrent data-structure that supports O(1), atomic, lock-free snapshots.

References

  1. Liljenzin, Olle (2013). "Confluently Persistent Sets and Maps". arXiv: 1301.3388 [cs.DS].
  2. Allen, John (1978). Anatomy of Lisp. McGraw Hill. ISBN   0-07-001115-X.
  3. Filliâtre, Jean-Christophe; Conchon, Sylvain (2006). "Type-Safe Modular Hash-Consing". Workshop on ML. ACM.
  4. Ershov, A. P. (1 August 1958). "On programming of arithmetic operations". Communications of the ACM. 1 (8): 3–6. doi: 10.1145/368892.368907 . ISSN   0001-0782. S2CID   15986378.
  5. "Sharing and Common Subexpression Elimination in EDSL compilation". okmij.org. Retrieved 27 April 2023.
  6. Deutsch, Laurence Peter (1973). An Interactive Program Verifier (PDF) (Phd). Palo Alto: Xerox Palo Alto Research Center Technical Report CSL-73-1.
  7. Goto, Eiichi (1974). Monocopy and associative algorithms in extended Lisp (PDF) (Technical report). Tokyo: University of Tokyo Technical Report TR 74-03.

Further reading