Lookup table

Last updated

In computer science, a lookup table (LUT) is an array that replaces runtime computation with a simpler array indexing operation, in a process termed as direct addressing. The savings in processing time can be significant, because retrieving a value from memory is often faster than carrying out an "expensive" computation or input/output operation. [1] The tables may be precalculated and stored in static program storage, calculated (or "pre-fetched") as part of a program's initialization phase ( memoization ), or even stored in hardware in application-specific platforms. Lookup tables are also used extensively to validate input values by matching against a list of valid (or invalid) items in an array and, in some programming languages, may include pointer functions (or offsets to labels) to process the matching input. FPGAs also make extensive use of reconfigurable, hardware-implemented, lookup tables to provide programmable hardware functionality. LUTs differ from hash tables in a way that, to retrieve a value with key , a hash table would store the value in the slot where is a hash function i.e. is used to compute the slot, while in the case of LUT, the value is stored in slot , thus directly addressable. [2] :466

Contents

History

Part of a 20th-century table of common logarithms in the reference book Abramowitz and Stegun. Abramowitz&Stegun.page97.agr.jpg
Part of a 20th-century table of common logarithms in the reference book Abramowitz and Stegun.

Before the advent of computers, lookup tables of values were used to speed up hand calculations of complex functions, such as in trigonometry, logarithms, and statistical density functions. [3]

In ancient (499 AD) India, Aryabhata created one of the first sine tables, which he encoded in a Sanskrit-letter-based number system. In 493 AD, Victorius of Aquitaine wrote a 98-column multiplication table which gave (in Roman numerals) the product of every number from 2 to 50 times and the rows were "a list of numbers starting with one thousand, descending by hundreds to one hundred, then descending by tens to ten, then by ones to one, and then the fractions down to 1/144" [4] Modern school children are often taught to memorize "times tables" to avoid calculations of the most commonly used numbers (up to 9 x 9 or 12 x 12).

Early in the history of computers, input/output operations were particularly slow – even in comparison to processor speeds of the time. It made sense to reduce expensive read operations by a form of manual caching by creating either static lookup tables (embedded in the program) or dynamic prefetched arrays to contain only the most commonly occurring data items. Despite the introduction of systemwide caching that now automates this process, application level lookup tables can still improve performance for data items that rarely, if ever, change.

Lookup tables were one of the earliest functionalities implemented in computer spreadsheets, with the initial version of VisiCalc (1979) including a LOOKUP function among its original 20 functions. [5] This has been followed by subsequent spreadsheets, such as Microsoft Excel, and complemented by specialized VLOOKUP and HLOOKUP functions to simplify lookup in a vertical or horizontal table. In Microsoft Excel the XLOOKUP function has been rolled out starting 28 August 2019.

Limitations

Although the performance of a LUT is a guaranteed for a lookup operation, no two entities or values can have the same key . When the size of universe —where the keys are drawn—is large, it might be impractical or impossible to be stored in memory. Hence, in this case, a hash table would be a preferable alternative. [2] :468

Examples

Trivial hash function

For a trivial hash function lookup, the unsigned raw data value is used directly as an index to a one-dimensional table to extract a result. For small ranges, this can be amongst the fastest lookup, even exceeding binary search speed with zero branches and executing in constant time. [6]

Counting bits in a series of bytes

One discrete problem that is expensive to solve on many computers is that of counting the number of bits that are set to 1 in a (binary) number, sometimes called the population function . For example, the decimal number "37" is "00100101" in binary, so it contains three bits that are set to binary "1". [7] :282

A simple example of C code, designed to count the 1 bits in a int, might look like this: [7] :283

intcount_ones(unsignedintx){intresult=0;while(x!=0){x=x&(x-1);result++;}returnresult;}

The above implementation requires 32 operations for an evaluation of a 32-bit value, which can potentially take several clock cycles due to branching. It can be "unrolled" into a lookup table which in turn uses trivial hash function for better performance. [7] :282-283

The bits array, bits_set with 256 entries is constructed by giving the number of one bits set in each possible byte value (e.g. 0x00 = 0, 0x01 = 1, 0x02 = 1, and so on). Although a runtime algorithm can be used to generate the bits_set array, it's an inefficient usage of clock cycles when the size is taken into consideration, hence a precomputed table is used—although a compile time script could be used to dynamically generate and append the table to the source file. Sum of ones in each byte of the integer can be calculated through trivial hash function lookup on each byte; thus, effectively avoiding branches resulting in considerable improvement in performance. [7] :284

intcount_ones(intinput_value){unionfour_bytes{intbig_int;chareach_byte[4];}operand=input_value;constintbits_set[256]={0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8};return(bits_set[operand.each_byte[0]]+bits_set[operand.each_byte[1]]+bits_set[operand.each_byte[2]]+bits_set[operand.each_byte[3]]);}}

Lookup tables in image processing

Red (A), Green (B), Blue (C) 16-bit lookup table file sample. (Lines 14 to 65524 not shown) Red Green Blue 16 bit Look up Table Sample.svg
Red (A), Green (B), Blue (C) 16-bit lookup table file sample. (Lines 14 to 65524 not shown)

"Lookup tables (LUTs) are an excellent technique for optimizing the evaluation of functions that are expensive to compute and inexpensive to cache. ... For data requests that fall between the table's samples, an interpolation algorithm can generate reasonable approximations by averaging nearby samples." [8]

In data analysis applications, such as image processing, a lookup table (LUT) can be used to transform the input data into a more desirable output format. For example, a grayscale picture of the planet Saturn could be transformed into a color image to emphasize the differences in its rings.

In image processing, lookup tables are often called LUT s (or 3DLUT), and give an output value for each of a range of index values. One common LUT, called the colormap or palette , is used to determine the colors and intensity values with which a particular image will be displayed. In computed tomography, "windowing" refers to a related concept for determining how to display the intensity of measured radiation.

Discussion

A classic example of reducing run-time computations using lookup tables is to obtain the result of a trigonometry calculation, such as the sine of a value. [9] Calculating trigonometric functions can substantially slow a computing application. The same application can finish much sooner when it first precalculates the sine of a number of values, for example for each whole number of degrees (The table can be defined as static variables at compile time, reducing repeated run time costs). When the program requires the sine of a value, it can use the lookup table to retrieve the closest sine value from a memory address, and may also interpolate to the sine of the desired value, instead of calculating by mathematical formula. Lookup tables can thus used by mathematics coprocessors in computer systems. An error in a lookup table was responsible for Intel's infamous floating-point divide bug.

Functions of a single variable (such as sine and cosine) may be implemented by a simple array. Functions involving two or more variables require multidimensional array indexing techniques. The latter case may thus employ a two-dimensional array of power[x][y] to replace a function to calculate xy for a limited range of x and y values. Functions that have more than one result may be implemented with lookup tables that are arrays of structures.

As mentioned, there are intermediate solutions that use tables in combination with a small amount of computation, often using interpolation. Pre-calculation combined with interpolation can produce higher accuracy for values that fall between two precomputed values. This technique requires slightly more time to be performed but can greatly enhance accuracy in applications that require it. Depending on the values being precomputed, precomputation with interpolation can also be used to shrink the lookup table size while maintaining accuracy.

While often effective, employing a lookup table may nevertheless result in a severe penalty if the computation that the LUT replaces is relatively simple. Memory retrieval time and the complexity of memory requirements can increase application operation time and system complexity relative to what would be required by straight formula computation. The possibility of polluting the cache may also become a problem. Table accesses for large tables will almost certainly cause a cache miss. This phenomenon is increasingly becoming an issue as processors outpace memory. A similar issue appears in rematerialization, a compiler optimization. In some environments, such as the Java programming language, table lookups can be even more expensive due to mandatory bounds-checking involving an additional comparison and branch for each lookup.

There are two fundamental limitations on when it is possible to construct a lookup table for a required operation. One is the amount of memory that is available: one cannot construct a lookup table larger than the space available for the table, although it is possible to construct disk-based lookup tables at the expense of lookup time. The other is the time required to compute the table values in the first instance; although this usually needs to be done only once, if it takes a prohibitively long time, it may make the use of a lookup table an inappropriate solution. As previously stated however, tables can be statically defined in many cases.

Computing sines

Most computers only perform basic arithmetic operations and cannot directly calculate the sine of a given value. Instead, they use the CORDIC algorithm or a complex formula such as the following Taylor series to compute the value of sine to a high degree of precision: [10] :5

(for x close to 0)

However, this can be expensive to compute, especially on slow processors, and there are many applications, particularly in traditional computer graphics, that need to compute many thousands of sine values every second. A common solution is to initially compute the sine of many evenly distributed values, and then to find the sine of x we choose the sine of the value closest to x through array indexing operation. This will be close to the correct value because sine is a continuous function with a bounded rate of change. [10] :6 For example: [11] :545–548

realarraysine_table[-1000..1000]forxfrom-1000to1000sine_table[x]=sine(pi*x/1000)functionlookup_sine(x)returnsine_table[round(1000*x/pi)]
Linear interpolation on a portion of the sine function Interpolation example linear.svg
Linear interpolation on a portion of the sine function

Unfortunately, the table requires quite a bit of space: if IEEE double-precision floating-point numbers are used, over 16,000 bytes would be required. We can use fewer samples, but then our precision will significantly worsen. One good solution is linear interpolation, which draws a line between the two points in the table on either side of the value and locates the answer on that line. This is still quick to compute, and much more accurate for smooth functions such as the sine function. Here is an example using linear interpolation:

functionlookup_sine(x)x1=floor(x*1000/pi)y1=sine_table[x1]y2=sine_table[x1+1]returny1+(y2-y1)*(x*1000/pi-x1)

Linear interpolation provides for an interpolated function that is continuous, but will not, in general, have continuous derivatives. For smoother interpolation of table lookup that is continuous and has continuous first derivative, one should use the cubic Hermite spline.

When using interpolation, the size of the lookup table can be reduced by using nonuniform sampling , which means that where the function is close to straight, we use few sample points, while where it changes value quickly we use more sample points to keep the approximation close to the real curve. For more information, see interpolation.

Other usages of lookup tables

Caches

Storage caches (including disk caches for files, or processor caches for either code or data) work also like a lookup table. The table is built with very fast memory instead of being stored on slower external memory, and maintains two pieces of data for a sub-range of bits composing an external memory (or disk) address (notably the lowest bits of any possible external address):

A single (fast) lookup is performed to read the tag in the lookup table at the index specified by the lowest bits of the desired external storage address, and to determine if the memory address is hit by the cache. When a hit is found, no access to external memory is needed (except for write operations, where the cached value may need to be updated asynchronously to the slower memory after some time, or if the position in the cache must be replaced to cache another address).

Hardware LUTs

In digital logic, a lookup table can be implemented with a multiplexer whose select lines are driven by the address signal and whose inputs are the values of the elements contained in the array. These values can either be hard-wired, as in an ASIC whose purpose is specific to a function, or provided by D latches which allow for configurable values. (ROM, EPROM, EEPROM, or RAM.)

An n-bit LUT can encode any n-input Boolean function by storing the truth table of the function in the LUT. This is an efficient way of encoding Boolean logic functions, and LUTs with 4-6 bits of input are in fact the key component of modern field-programmable gate arrays (FPGAs) which provide reconfigurable hardware logic capabilities.

Data acquisition and control systems

In data acquisition and control systems, lookup tables are commonly used to undertake the following operations in:

In some systems, polynomials may also be defined in place of lookup tables for these calculations.

See also

Related Research Articles

<span class="mw-page-title-main">Binary search</span> Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

<span class="mw-page-title-main">Hash function</span> Mapping arbitrary data to fixed-size values

A hash function is any function that can be used to map data of arbitrary size to fixed-size values, though there are some hash functions that support variable-length output. The values returned by a hash function are called hash values, hash codes, hash digests, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter-storage addressing.

<span class="mw-page-title-main">Hash table</span> Associative array for storing key-value pairs

In computing, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored. A map implemented by a hash table is called a hash map.

<span class="mw-page-title-main">Trie</span> K-ary search tree data structure

In computer science, a trie, also called digital tree or prefix tree, is a type of k-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes defined not by the entire key, but by individual characters. In order to access a key, the trie is traversed depth-first, following the links between nodes, which represent each character in the key.

A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to digital data. Blocks of data entering these systems get a short check value attached, based on the remainder of a polynomial division of their contents. On retrieval, the calculation is repeated and, in the event the check values do not match, corrective action can be taken against data corruption. CRCs can be used for error correction.

Pearson hashing is a non-cryptographic hash function designed for fast execution on processors with 8-bit registers. Given an input consisting of any number of bytes, it produces as output a single byte that is strongly dependent on every byte of the input. Its implementation requires only a few instructions, plus a 256-byte lookup table containing a permutation of the values 0 through 255.

A numerically controlled oscillator (NCO) is a digital signal generator which creates a synchronous, discrete-time, discrete-valued representation of a waveform, usually sinusoidal. NCOs are often used in conjunction with a digital-to-analog converter (DAC) at the output to create a direct digital synthesizer (DDS).

<span class="mw-page-title-main">Perfect hash function</span> Hash function without any collisions

In computer science, a perfect hash functionh for a set S is a hash function that maps distinct elements in S to a set of m integers, with no collisions. In mathematical terms, it is an injective function.

A bitboard is a specialized bit array data structure commonly used in computer systems that play board games, where each bit corresponds to a game board space or piece. This allows parallel bitwise operations to set or query the game state, or determine moves or plays in the game.

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed ; the more items added, the larger the probability of false positives.

A rolling hash is a hash function where the input is hashed in a window that moves through the input.

In computer programming, a branch table or jump table is a method of transferring program control (branching) to another part of a program using a table of branch or jump instructions. It is a form of multiway branch. The branch table construction is commonly used when programming in assembly language but may also be generated by compilers, especially when implementing optimized switch statements whose values are densely packed together.

<span class="mw-page-title-main">Computation of cyclic redundancy checks</span> Overview of the computation of cyclic redundancy checks

Computation of a cyclic redundancy check is derived from the mathematics of polynomial division, modulo two. In practice, it resembles long division of the binary message string, with a fixed number of zeroes appended, by the "generator polynomial" string except that exclusive or operations replace subtractions. Division of this type is efficiently realised in hardware by a modified shift register, and in software by a series of equivalent algorithms, starting with simple code close to the mathematics and becoming faster through byte-wise parallelism and space–time tradeoffs.

<span class="mw-page-title-main">Threefish</span> Block cipher

Threefish is a symmetric-key tweakable block cipher designed as part of the Skein hash function, an entry in the NIST hash function competition. Threefish uses no S-boxes or other table lookups in order to avoid cache timing attacks; its nonlinearity comes from alternating additions with exclusive ORs. In that respect, it is similar to Salsa20, TEA, and the SHA-3 candidates CubeHash and BLAKE.

The following tables compare general and technical information for a number of cryptographic hash functions. See the individual functions' articles for further information. This article is not all-inclusive or necessarily up-to-date. An overview of hash function security/cryptanalysis can be found at hash function security summary.

bcrypt is a password-hashing function designed by Niels Provos and David Mazières, based on the Blowfish cipher and presented at USENIX in 1999. Besides incorporating a salt to protect against rainbow table attacks, bcrypt is an adaptive function: over time, the iteration count can be increased to make it slower, so it remains resistant to brute-force search attacks even with increasing computation power.

<span class="mw-page-title-main">Precomputation</span> Act of performing an initial computation before run time

In algorithms, precomputation is the act of performing an initial computation before run time to generate a lookup table that can be used by an algorithm to avoid repeated computation each time it is executed. Precomputation is often used in algorithms that depend on the results of expensive computations that don't depend on the input of the algorithm. A trivial example of precomputation is the use of hardcoded mathematical constants, such as π and e, rather than computing their approximations to the necessary precision at run time.

In computer science, a family of hash functions is said to be k-independent, k-wise independent or k-universal if selecting a function at random from the family guarantees that the hash codes of any designated k keys are independent random variables. Such families allow good average case performance in randomized algorithms or data structures, even if the input data is chosen by an adversary. The trade-offs between the degree of independence and the efficiency of evaluating the hash function are well studied, and many k-independent families have been proposed.

In computer science, tabulation hashing is a method for constructing universal families of hash functions by combining table lookup with exclusive or operations. It was first studied in the form of Zobrist hashing for computer games; later work by Carter and Wegman extended this method to arbitrary fixed-length keys. Generalizations of tabulation hashing have also been developed that can handle variable-length keys such as text strings.

Approximate Membership Query Filters comprise a group of space-efficient probabilistic data structures that support approximate membership queries. An approximate membership query answers whether an element is in a set or not with a false positive rate of .

References

  1. McNamee, Paul (21 August 1998). "Automated Memoization in C++". Archived from the original on 16 April 2019.{{cite web}}: CS1 maint: unfit URL (link)
  2. 1 2 Kwok, W.; Haghighi, K.; Kang, E. (1995). "An efficient data structure for the advancing-front triangular mesh generation technique". Communications in Numerical Methods in Engineering. 11 (5). Wiley & Sons: 465–473. doi:10.1002/cnm.1640110511.
  3. Campbell-Kelly, Martin; Croarken, Mary; Robson, Eleanor, eds. (2003). The History of Mathematical Tables: From Sumer to Spreadsheets. Oxford University Press.
  4. Maher, David. W. J. and John F. Makowski. "Literary Evidence for Roman Arithmetic With Fractions", 'Classical Philology' (2001) Vol. 96 No. 4 (2001) pp. 376–399. (See page p.383.)
  5. Bill Jelen: "From 1979 – VisiCalc and LOOKUP"!, by MrExcel East, 31 March 2012
  6. Cormen, Thomas H. (2009). Introduction to algorithms (3rd ed.). Cambridge, Mass.: MIT Press. pp. 253–255. ISBN   9780262033848 . Retrieved 26 November 2015.
  7. 1 2 3 4 Jungck P.; Dencan R.; Mulcahy D. (2011). Developing for Performance. In: packetC Programming. Apress. doi:10.1007/978-1-4302-4159-1_26. ISBN   978-1-4302-4159-1.
  8. nvidia gpu gems2 : using-lookup-tables-accelerate-color
  9. Sasao, T.; Butler, J. T.; Riedel, M. D. "Application of LUT Cascades to Numerical Function Generators". Defence Technical Information Center. NAVAL POSTGRADUATE SCHOOL MONTEREY CA DEPT OF ELECTRICAL AND COMPUTER ENGINEERING. Retrieved 17 May 2024.{{cite web}}: CS1 maint: multiple names: authors list (link)
  10. 1 2 Sharif, Haidar (2014). "High-performance mathematical functions for single-core architectures". Journal of Circuits, Systems and Computers. 23 (4). World Scientific. doi:10.1142/S0218126614500510.
  11. Randall Hyde (1 March 2010). The Art of Assembly Language, 2nd Edition (PDF). No Starch Press. ISBN   978-1593272074 via University of Campinas Institute of Computing.