XOR linked list

Last updated

An XOR linked list is a type of data structure used in computer programming. It takes advantage of the bitwise XOR operation to decrease storage requirements for doubly linked lists by storing the composition of both addresses in one field. While the composed address is not meaningful on its own, during traversal it can be combined with knowledge of the last-visited node address to deduce the address of the following node.

Contents

Description

An ordinary doubly linked list stores addresses of the previous and next list items in each list node, requiring two address fields:

 ... A        B        C         D         E ...          –>  next –>  next  –>  next  –>          <–  prev <–  prev  <–  prev  <–

An XOR linked list compresses the same information into one address field by storing the bitwise XOR (here denoted by ⊕) of the address for previous and the address for next in one field:

 ... A        B          C          D         E ...          ⇌   A⊕C   ⇌   B⊕D   ⇌   C⊕E   ⇌

More formally:

  link(B) = addr(A)⊕addr(C), link(C) = addr(B)⊕addr(D), ...

When traversing the list from left to right: supposing the cursor is at C, the previous item, B, may be XORed with the value in the link field (B⊕D). The address for D will then be obtained and list traversal may resume. The same pattern applies in the other direction.

i.e. addr(D) = link(C) ⊕ addr(B) where

      link(C) = addr(B)⊕addr(D)

so

      addr(D) = addr(B)⊕addr(D) ⊕ addr(B)                     addr(D) = addr(B)⊕addr(B) ⊕ addr(D) 

since

       X⊕X = 0                         => addr(D) = 0 ⊕ addr(D)

since

       X⊕0 = X        => addr(D) = addr(D)

The XOR operation cancels addr(B) appearing twice in the equation and all we are left with is the addr(D).

To start traversing the list in either direction from some point, the address of two consecutive items is required. If the addresses of the two consecutive items are reversed, list traversal will occur in the opposite direction. [1]

Theory of operation

The key is the first operation, and the properties of XOR:

The R2 register always contains the XOR of the address of current item C with the address of the predecessor item P: C⊕P. The Link fields in the records contain the XOR of the left and right successor addresses, say L⊕R. XOR of R2 (C⊕P) with the current link field (L⊕R) yields C⊕P⊕L⊕R.

In each case, the result is the XOR of the current address with the next address. XOR of this with the current address in R1 leaves the next address. R2 is left with the requisite XOR pair of the (now) current address and the predecessor.

Features

X  R2,Link    R2 <- C⊕D ⊕ B⊕D (i.e. B⊕C, "Link" being the link field                                   in the current record, containing B⊕D) XR R1,R2      R1 <- C ⊕ B⊕C    (i.e. B, voilà: the next record)

Drawbacks

Computer systems have increasingly cheap and plentiful memory, therefore storage overhead is not generally an overriding issue outside specialized embedded systems. Where it is still desirable to reduce the overhead of a linked list, unrolling provides a more practical approach (as well as other advantages, such as increasing cache performance and speeding random access).

Variations

The underlying principle of the XOR linked list can be applied to any reversible binary operation. Replacing XOR by addition or subtraction gives slightly different, but largely equivalent, formulations:

Addition linked list

 ...  A        B         C         D         E  ...           ⇌   A+C   ⇌   B+D   ⇌   C+E   ⇌

This kind of list has exactly the same properties as the XOR linked list, except that a zero link field is not a "mirror". The address of the next node in the list is given by subtracting the previous node's address from the current node's link field.

Subtraction linked list

 ...  A        B         C         D         E  ...           ⇌   C-A   ⇌   D-B   ⇌   E-C   ⇌

This kind of list differs from the standard "traditional" XOR linked list in that the instruction sequences needed to traverse the list forwards is different from the sequence needed to traverse the list in reverse. The address of the next node, going forwards, is given by adding the link field to the previous node's address; the address of the preceding node is given by subtracting the link field from the next node's address.

The subtraction linked list is also special in that the entire list can be relocated in memory without needing any patching of pointer values, since adding a constant offset to each address in the list will not require any changes to the values stored in the link fields. (See also serialization.) This is an advantage over both XOR linked lists and traditional linked lists.

Binary search tree

The XOR linked list concept can be generalized to XOR binary search trees. [3]

See also

Related Research Articles

<span class="mw-page-title-main">Binary tree</span> Limited form of tree data structure

In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the left child and the right child. That is, it is a k-ary tree with k = 2. A recursive definition using set theory is that a binary tree is a tuple (L, S, R), where L and R are binary trees or the empty set and S is a singleton set containing the root.

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for nodes with more than two children. Unlike other self-balancing binary search trees, the B-tree is well suited for storage systems that read and write relatively large blocks of data, such as databases and file systems.

In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes which together represent a sequence. In its most basic form, each node contains data, and a reference to the next node in the sequence. This structure allows for efficient insertion or removal of elements from any position in the sequence during iteration. More complex variants add additional links, allowing more efficient insertion or removal of nodes at arbitrary positions. A drawback of linked lists is that data access time is linear in respect to the number of nodes in the list. Because nodes are serially linked, accessing any node requires that the prior node be accessed beforehand. Faster access, such as random access, is not feasible. Arrays have better cache locality compared to linked lists.

<span class="mw-page-title-main">Dijkstra's algorithm</span> Algorithm for finding shortest paths

Dijkstra's algorithm is an algorithm for finding the shortest paths between nodes in a weighted graph, which may represent, for example, road networks. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three years later.

<span class="mw-page-title-main">XOR swap algorithm</span> Binary arithmetic algorithm

In computer programming, the exclusive or swap is an algorithm that uses the exclusive or bitwise operation to swap the values of two variables without using the temporary variable which is normally required.

In CPU design, the use of a sum-addressed decoder (SAD) or sum-addressed memory (SAM) decoder is a method of reducing the latency of the CPU cache access and address calculation. This is achieved by fusing the address generation sum operation with the decode operation in the cache SRAM.

Kademlia is a distributed hash table for decentralized peer-to-peer computer networks designed by Petar Maymounkov and David Mazières in 2002. It specifies the structure of the network and the exchange of information through node lookups. Kademlia nodes communicate among themselves using UDP. A virtual or overlay network is formed by the participant nodes. Each node is identified by a number or node ID. The node ID serves not only as identification, but the Kademlia algorithm uses the node ID to locate values.

<span class="mw-page-title-main">Unrolled linked list</span>

In computer programming, an unrolled linked list is a variation on the linked list which stores multiple elements in each node. It can dramatically increase cache performance, while decreasing the memory overhead associated with storing list metadata such as references. It is related to the B-tree.

Loop unrolling, also known as loop unwinding, is a loop transformation technique that attempts to optimize a program's execution speed at the expense of its binary size, which is an approach known as space–time tradeoff. The transformation can be undertaken manually by the programmer or by an optimizing compiler. On modern processors, loop unrolling is often counterproductive, as the increased code size can cause more cache misses; cf. Duff's device.

<i>m</i>-ary tree Tree data structure in which each node has at most m children.

In graph theory, an m-ary tree is an arborescence in which each node has no more than m children. A binary tree is an important case where m = 2; similarly, a ternary tree is one where m = 3.

In computer programming, a sentinel node is a specifically designated node used with linked lists and trees as a traversal path terminator. This type of node does not hold or reference any data managed by the data structure.

<span class="mw-page-title-main">Threaded binary tree</span> Binary tree variant

In computing, a threaded binary tree is a binary tree variant that facilitates traversal in a particular order.

In computer science, a doubly linked list is a linked data structure that consists of a set of sequentially linked records called nodes. Each node contains three fields: two link fields and one data field. The beginning and ending nodes' previous and next links, respectively, point to some kind of terminator, typically a sentinel node or null, to facilitate traversal of the list. If there is only one sentinel node, then the list is circularly linked via the sentinel node. It can be conceptualized as two singly linked lists formed from the same data items, but in opposite sequential orders.

<span class="mw-page-title-main">TI-990</span> Series of 16-bit computers by Texas Instruments.

The TI-990 was a series of 16-bit minicomputers sold by Texas Instruments (TI) in the 1970s and 1980s. The TI-990 was a replacement for TI's earlier minicomputer systems, the TI-960 and the TI-980. It had several unique features, and was easier to program than its predecessors.

In multithreaded computing, the ABA problem occurs during synchronization, when a location is read twice, has the same value for both reads, and the read value being the same twice is used to conclude that nothing has happened in the interim; however, another thread can execute between the two reads and change the value, do other work, then change the value back, thus fooling the first thread into thinking nothing has changed even though the second thread did work that violates that assumption.

<span class="mw-page-title-main">ATS (programming language)</span> Programming language

In computing, ATS is a multi-paradigm, general-purpose, high-level, functional programming language. It is a dialect of the programming language ML, designed by Hongwei Xi to unify computer programming with formal specification. ATS has support for combining theorem proving with practical programming through the use of advanced type systems. A past version of The Computer Language Benchmarks Game has demonstrated that the performance of ATS is comparable to that of the languages C and C++. By using theorem proving and strict type checking, the compiler can detect and prove that its implemented functions are not susceptible to bugs such as division by zero, memory leaks, buffer overflow, and other forms of memory corruption by verifying pointer arithmetic and reference counting before the program compiles. Also, by using the integrated theorem-proving system of ATS (ATS/LF), the programmer may make use of static constructs that are intertwined with the operative code to prove that a function conforms to its specification.

In computer science, a linked data structure is a data structure which consists of a set of data records (nodes) linked together and organized by references. The link between data can also be called a connector.

<span class="mw-page-title-main">Queap</span>

In computer science, a queap is a priority queue data structure. The data structure allows insertions and deletions of arbitrary elements, as well as retrieval of the highest-priority element. Each deletion takes amortized time logarithmic in the number of items that have been in the structure for a longer time than the removed item. Insertions take constant amortized time.

Skip graphs are a kind of distributed data structure based on skip lists. They were invented in 2003 by James Aspnes and Gauri Shah. A nearly identical data structure called SkipNet was independently invented by Nicholas Harvey, Michael Jones, Stefan Saroiu, Marvin Theimer and Alec Wolman, also in 2003.

Toi is an imperative, type-sensitive language that provides the basic functionality of a programming language. The language was designed and developed from the ground-up by Paul Longtine. Written in C, Toi was created with the intent to be an educational experience and serves as a learning tool for those looking to familiarize themselves with the inner-workings of a programming language.

References

  1. "XOR Linked List - A Memory Efficient Doubly Linked List | Set 1 - GeeksforGeeks". GeeksforGeeks. 2011-05-23. Retrieved 2018-10-29.
  2. Gadbois, David; et al. "GC [garbage collection] FAQ – draft" . Retrieved 5 December 2018.
  3. "c++ associative containers based on the XOR scapegoat tree" . Retrieved 5 November 2021.