Persistent array

Last updated

In computer science, and more precisely regarding data structures, a persistent array is a persistent data structure with properties similar to a (non-persistent) array. That is then, after a value's update in a persistent array, there exists two persistent arrays. One persistent array in which the update is taken into account, and one which is equal to the array before the update.

Contents

Difference between persistent arrays and arrays

An array is a data structure, with a fixed number n of elements . It is expected that, given the array ar and an index , the value can be retrieved quickly. This operation is called a look-up. Furthermore, given the array ar, an index and a new value v, a new array ar2 with content can be created quickly. This operation is called an update. The main difference between persistent and non-persistent arrays being that, in non-persistent arrays, the array ar is destroyed during the creation of ar2.

For example, consider the following pseudocode.

array = [0, 0, 0] updated_array = array.update(0, 8) other_array = array.update(1, 3)last_array = updated_array.update(2, 5)

At the end of execution, the value of array is still [0, 0, 0], the value of update_array is [8, 0, 0], the value of other_array is [0, 3, 0] and the value of last_array is [0, 8, 5].

There exists two kinds of persistent arrays. A persistent array may be either partially or fully persistent. A fully persistent array may be updated an arbitrary number of times while a partially persistent array may be updated at most once. In our previous example, if array were only partially persistent, the creation of other_array would be forbidden, however, the creation of last_array would still be valid. Indeed, updated_array is an array distinct from array and has never been updated before the creation of last_array.

Implementations

Many implementations of persistent arrays exists. In this section, the positive natural number n will always be the size of the persistent array.

Three implementations are discussed below. The first ones are the easiest one to implement, while the last ones are the more efficient.

Using purely functional data structures

The simplest implementation of a fully persistent array of size n consist in using an arbitrary persistent map, whose entry are the numbers from 0 to n − 1. Such a data structure may be, for example, a balanced tree. However, looking up an element in such a data structure would take logarithmic time. One of the main interest of array is that operations are executed in constant time. While it is impossible to create a data structures in which every operations takes constant time, [1] the following operations would allow look-up to be more efficient, at least on the last version of the structures.

Using an array, and a tree of modifications

A fully persistent array may be implemented using an array and the so-called Baker's trick. [2] This implementation is used in the OCaml module parray.ml [3] by Jean-Christophe Filliâtre.

In order to define this implementation, a few other definitions must be given. An initial array is an array which is not generated by an update on another array. A child of an array ar is an array of the form ar.update(i,v), and ar is the parent of ar.update(i,v). A descendant of an array ar is either ar or the descendant of a child of ar. The initial array of an array ar is either ar if ar is initial, or it is the initial array of the parent of ar. That is, the initial array of ar is the unique array init such that , with ar initial and an arbitrary sequence of indexes and an arbitrary sequence of value. A family of array is thus a set of arrays containing an initial array and all of its descendants. Finally, the tree of a family of arrays is the tree whose nodes are the arrays, and with an edge e from ar to each of its child ar.update(i,v).

A persistent array using the Baker's trick consists into a pair with an actual array called array and the tree of arrays. This tree admits an arbitrary root - not necessarily the initial array. The root may be moved to an arbitrary node of the tree. Changing the root from root to an arbitrary node ar takes time proportional in the depth of ar. That is, in the distance between root and ar. Similarly, looking up a value takes time proportional to the distance between the array and the root of its family. Thus, if the same array ar may be look-up multiple time, it is more efficient to move the root to ar before doing the look-up. Finally updating an array only takes constant time.

Technically, given two adjacent arrays ar1 and ar2, with ar1 closer to the root than ar2, the edge from ar1 to ar2 is labelled by (i,ar2[i]), where i the only position whose value differ between ar1 and ar2.

Accessing an element i of an array ar is done as follows. If ar is the root, then ar[i] equals root[i]. Otherwise, let e the edge leaving ar toward the root. If the label of e is (i,v) then ar[i] equals v. Otherwise, let ar2 be the other node of the edge e. Then ar[i] equals ar2[i]. The computation of ar2[i] being done recursively using the same definition.

The creation of ar.update(i,v) consists in adding a new node ar2 to the tree, and an edge e from ar to ar2 labelled by (i,v).

Finally, moving the root to a node ar is done as follows. If ar is already the root, there is nothing to do. Otherwise, let e the edge leaving ar toward the current root, (i,v) its label and ar2 the other end of e. Moving the root to ar is done by first moving the root to ar2, changing the label of e to (i, ar2[i]), and changing array[i] to v.

Log-log-time

In 1989, Dietz [4] gave an implementation of fully persistent arrays such that look-up and updates can be done in -time, and space , with m the number of arrays and n the number of element in an array. [1] This implementation is optimal for look-up, according to the so-called cell-probe model.

Note however that this implementation is far more complex than the two mentioned above, and thus won't be described in this article.

Related Research Articles

<span class="mw-page-title-main">AVL tree</span> Self-balancing binary search tree

In computer science, an AVL tree is a self-balancing binary search tree. It was the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.

<span class="mw-page-title-main">Binary search algorithm</span> Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

<span class="mw-page-title-main">Binary search tree</span> Rooted binary tree data structure

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is directly proportional to the height of the tree.

<span class="mw-page-title-main">Binary tree</span> Limited form of tree data structure

In computer science, a binary tree is a k-ary tree data structure in which each node has at most two children, which are referred to as the left child and the right child. A recursive definition using just set theory notions is that a (non-empty) binary tree is a tuple, where L and R are binary trees or the empty set and S is a singleton set containing the root. Some authors allow the binary tree to be the empty set as well.

<span class="mw-page-title-main">Heap (data structure)</span> Computer science data structure

In computer science, a heap is a specialized tree-based data structure which is essentially an almost complete tree that satisfies the heap property: in a max heap, for any given node C, if P is a parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

In computer science, a red–black tree is a kind of self-balancing binary search tree. Each node stores an extra bit representing "color", used to ensure that the tree remains balanced during insertions and deletions.

<span class="mw-page-title-main">Trie</span> K-ary search tree data structure

In computer science, a trie, also called digital tree or prefix tree, is a type of k-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes defined not by the entire key, but by individual characters. In order to access a key, the trie is traversed depth-first, following the links between nodes, which represent each character in the key.

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort.

<span class="mw-page-title-main">Treap</span>

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

In computer science, a Fibonacci heap is a data structure for priority queue operations, consisting of a collection of heap-ordered trees. It has a better amortized running time than many other priority queue data structures including the binary heap and binomial heap. Michael L. Fredman and Robert E. Tarjan developed Fibonacci heaps in 1984 and published them in a scientific journal in 1987. Fibonacci heaps are named after the Fibonacci numbers, which are used in their running time analysis.

In computer science, tree traversal is a form of graph traversal and refers to the process of visiting each node in a tree data structure, exactly once. Such traversals are classified by the order in which the nodes are visited. The following algorithms are described for a binary tree, but they may be generalized to other trees as well.

In computing, a persistent data structure or not ephemeral data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure. The term was introduced in Driscoll, Sarnak, Sleator, and Tarjans' 1986 article.

<span class="mw-page-title-main">Suffix tree</span> Tree containing all suffixes of a given text

In computer science, a suffix tree is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particularly fast implementations of many important string operations.

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.

A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children.

<i>m</i>-ary tree

In graph theory, an m-ary tree is a rooted tree in which each node has no more than m children. A binary tree is the special case where m = 2, and a ternary tree is another case with m = 3 that limits its children to three.

<span class="mw-page-title-main">Fenwick tree</span>

A Fenwick tree or binary indexed tree is a data structure that can efficiently update elements and calculate prefix sums in a table of numbers.

In computer science, a min-max heap is a complete binary tree data structure which combines the usefulness of both a min-heap and a max-heap, that is, it provides constant time retrieval and logarithmic time removal of both the minimum and maximum elements in it. This makes the min-max heap a very useful data structure to implement a double-ended priority queue. Like binary min-heaps and max-heaps, min-max heaps support logarithmic insertion and deletion and can be built in linear time. Min-max heaps are often represented implicitly in an array; hence it's referred to as an implicit data structure.

In computer science, the longest common prefix array is an auxiliary data structure to the suffix array. It stores the lengths of the longest common prefixes (LCPs) between all pairs of consecutive suffixes in a sorted suffix array.

The PH-tree is a tree data structure used for spatial indexing of multi-dimensional data (keys) such as geographical coordinates, points, feature vectors, rectangles or bounding boxes. The PH-tree is space partitioning index with a structure similar to that of a quadtree or octree. However, unlike quadtrees, it uses a splitting policy based on tries and similar to Crit bit trees that is based on the bit-representation of the keys. The bit-based splitting policy, when combined with the use of different internal representations for nodes, provides scalability with high-dimensional data. The bit-representation splitting policy also imposes a maximum depth, thus avoiding degenerated trees and the need for rebalancing.

References

  1. 1 2 Straka e, Milan (2013). Functional Data Structures and Algorithms. Prague. pp. 87–90.
  2. Fillâtre, Jean-Christophe; Conchon, Sylvain (2007). A Persistent Union-find Data Structure (PDF). New York, NY, USA: ACM. pp. 37–46. ISBN   978-1-59593-676-9.
  3. Filliâtre, Jean-Christophe. "Persistent-array implementation".
  4. Dietz, Paul F. (1989). "Fully persistent arrays". Proceedings of the Algorithms and Data Structures. pp. 67–74. doi:10.1007/3-540-51542-9_8.

.