Finger tree

Last updated

In computer science, a finger tree is a purely functional data structure that can be used to efficiently implement other functional data structures. A finger tree gives amortized constant time access to the "fingers" (leaves) of the tree, which is where data is stored, and concatenation and splitting logarithmic time in the size of the smaller piece. It also stores in each internal node the result of applying some associative operation to its descendants. This "summary" data stored in the internal nodes can be used to provide the functionality of data structures other than trees.

Contents

Overview

Finger tree used as a simple queue with amortized O(1) put & get operations. Integers 1 to 21 are inserted to the right & extracted from the left. Square blocks represent values, "Digit" (sky blue) can have 1-4 children, "Node" (dark blue) can have 2-3 children, white circle is for "Empty", red node represents "Single" value & green nodes represent "Deep" values. Note that for each step we take down the spine, single values & digit children get nested with a new level of nodes. Finger-tree-queue.gif
Finger tree used as a simple queue with amortized O(1) put & get operations. Integers 1 to 21 are inserted to the right & extracted from the left. Square blocks represent values, "Digit" (sky blue) can have 1–4 children, "Node" (dark blue) can have 2–3 children, white circle is for "Empty", red node represents "Single" value & green nodes represent "Deep" values. Note that for each step we take down the spine, single values & digit children get nested with a new level of nodes.

Ralf Hinze and Ross Paterson state a finger tree is a functional representation of persistent sequences that can access the ends in amortized constant time. Concatenation and splitting can be done in logarithmic time in the size of the smaller piece. The structure can also be made into a general purpose data structure by defining the split operation in a general form, allowing it to act as a sequence, priority queue, search tree, or priority search queue, among other varieties of abstract data types. [1]

A finger is a point where one can access part of a data structure; in imperative languages, this is called a pointer. [2] In a finger tree, the fingers are structures that point to the ends of a sequence, or the leaf nodes. The fingers are added on to the original tree to allow for constant time access to fingers. In the images shown below, the fingers are the lines reaching out of the spine to the nodes.

A finger tree is composed of different layers which can be identified by the nodes along its spine. The spine of a tree can be thought of as the trunk in the same way trees have leaves and a root. Though finger trees are often shown with the spine and branches coming off the sides, there are actually two nodes on the spine at each level that have been paired to make this central spine. The prefix is on the left of the spine, while the suffix is on the right. Each of those nodes has a link to the next level of the spine until they reach the root. [2]

Shows a 2-3 tree (top) can be pulled up into a finger tree (bottom) Finger-tree from 2-3 tree.jpg
Shows a 2–3 tree (top) can be pulled up into a finger tree (bottom)

The first level of the tree contains only values, the leaf nodes of the tree, and is of depth 0. The second level is of depth 1. The third is of depth 2 and so on. The closer to the root, the deeper the subtrees of the original tree (the tree before it was a finger tree) the nodes points to. In this way, working down the tree is going from the leaves to the root of the tree, which is the opposite of the typical tree data structure. To get this nice and unusual structure, we have to make sure the original tree has a uniform depth. To ensure that the depth is uniform, when declaring the node object, it must be parameterized by the type of the child. The nodes on the spine of depth 1 and above point to trees, and with this parameterization they can be represented by the nested nodes. [3]

Transforming a tree into a finger tree

We will start this process with a balanced 2–3 tree. For the finger tree to work, all the leaf nodes need to also be level.

A finger is "a structure providing efficient access to nodes of a tree near a distinguished location." [1] To make a finger tree we need to put fingers to the right and left ends of the tree and transform it like a zipper. This gives us that constant amortized time access to the ends of a sequence.

To transform, start with the balanced 2–3 tree.

Take the leftmost and rightmost internal nodes of the tree and pull them up so the rest of the tree dangles between them as shown in the image to the right.

Combines the spines to make a standard 2–3 finger tree.

This can be described as: [1]

dataFingerTreea=Empty|Singlea|Deep(Digita)(FingerTree(Nodea))(Digita)dataNodea=Node2aa|Node3aaa

The digits in the examples shown are the nodes with letters. Each list is divided by the prefix or suffix of each node on the spine. In a transformed 2–3 tree it seems that the digit lists at the top level can have a length of two or three with the lower levels only having length of one or two. In order for some application of finger trees to run so efficiently, finger trees allow between one and four subtrees on each level.

The digits of the finger tree can be transformed into a list like so: [1]

typeDigita=Onea|Twoaa|Threeaaa|Fouraaaa

And so on the image, the top level has elements of type a, the next has elements of type Node a because the node in between the spine and leaves, and this would go on meaning in general that the nth level of the tree has elements of type a, or 2–3 trees of depth n. This means a sequence of n elements is represented by a tree of depth Θ(log n). Even better, an element d places from the nearest end is stored at a depth of Θ(log d) in the tree. [1]

Reductions



Deque operations

Finger trees also make efficient deques. Whether the structure is persistent or not, all operations take Θ(1) amortized time. The analysis can be compared to Okasaki's implicit deques, the only difference being that the FingerTree type stores Nodes instead of pairs. [1]

Application

Finger trees can be used to build other trees. [4] For example, a priority queue can be implemented by labeling the internal nodes by the minimum priority of its children in the tree, or an indexed list/array can be implemented with a labeling of nodes by the count of the leaves in their children. Other applications are random-access sequences, described below, ordered sequences, and interval trees. [1]

Finger trees can provide amortized O(1) pushing, reversing, popping, O(log n) append and split; and can be adapted to be indexed or ordered sequences. And like all functional data structures, it is inherently persistent; that is, older versions of the tree are always preserved.

Random-access sequences

Finger trees can efficiently implement random-access sequences. This should support fast positional operations including accessing the nth element and splitting a sequence at a certain position. To do this, we annotate the finger tree with sizes. [1]

newtypeSize=Size{getSize::N}deriving(Eq,Ord)instanceMonoidSizewhere=Size0SizemSizen=Size(m+n)

The N is for natural numbers. The new type is needed because the type is the carrier of different monoids. Another new type is still needed for the elements in the sequence, shown below.

newtypeElema=Elem{getElem::a}newtypeSeqa=Seq(FingerTreeSize(Elema))instanceMeasured(Elema)Sizewhere||Elem||=Size1

These lines of code show that instance works a base case for measuring the sizes and the elements are of size one. The use of newtypes doesn't cause a run-time penalty in Haskell because in a library, the Size and Elem types would be hidden from the user with wrapper functions.

With these changes, the length of a sequence can now be computed in constant time.

First publication

Finger trees were first published in 1977 by Leonidas J. Guibas, [5] and periodically refined since (e.g. a version using AVL trees, [6] non-lazy finger trees, simpler 2–3 finger trees shown here, [1] B-Trees and so on)

Implementations

Finger trees have since been used in the Haskell core libraries (in the implementation of Data.Sequence), and an implementation in OCaml exists [7] which was derived from a proven-correct Coq implementation. [8] There is also a verified implementation in Isabelle (proof assistant) from which programs in Haskell and other (functional) languages can be generated. [9] Finger trees can be implemented with or without lazy evaluation, [10] but laziness allows for simpler implementations.

See also

Related Research Articles

<span class="mw-page-title-main">AVL tree</span> Self-balancing binary search tree

In computer science, an AVL tree is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.

A splay tree is a binary search tree with the additional property that recently accessed elements are quick to access again. Like self-balancing binary search trees, a splay tree performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For random access patterns drawn from a non-uniform random distribution, their amortized time can be faster than logarithmic, proportional to the entropy of the access pattern. For many patterns of non-random operations, also, splay trees can take better than logarithmic time, without requiring advance knowledge of the pattern. According to the unproven dynamic optimality conjecture, their performance on all access patterns is within a constant factor of the best possible performance that could be achieved by any other self-adjusting binary search tree, even one selected to fit that pattern. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.

A transposition table is a cache of previously seen positions, and associated evaluations, in a game tree generated by a computer game playing program. If a position recurs via a different sequence of moves, the value of the position is retrieved from the table, avoiding re-searching the game tree below that position. Transposition tables are primarily useful in perfect-information games. The usage of transposition tables is essentially memoization applied to the tree search and is a form of dynamic programming.

In computer science, a binomial heap is a data structure that acts as a priority queue but also allows pairs of heaps to be merged. It is important as an implementation of the mergeable heap abstract data type, which is a priority queue supporting merge operation. It is implemented as a heap similar to a binary heap but using a special tree structure that is different from the complete binary trees used by binary heaps. Binomial heaps were invented in 1978 by Jean Vuillemin.

In computer science, a Fibonacci heap is a data structure for priority queue operations, consisting of a collection of heap-ordered trees. It has a better amortized running time than many other priority queue data structures including the binary heap and binomial heap. Michael L. Fredman and Robert E. Tarjan developed Fibonacci heaps in 1984 and published them in a scientific journal in 1987. Fibonacci heaps are named after the Fibonacci numbers, which are used in their running time analysis.

In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be a match." The patterns generally have the form of either sequences or tree structures. Uses of pattern matching include outputting the locations of a pattern within a token sequence, to output some component of the matched pattern, and to substitute the matching pattern with some other token sequence.

In computer programming, especially functional programming and type theory, an algebraic data type (ADT) is a kind of composite type, i.e., a type formed by combining other types.

In computing, a persistent data structure or not ephemeral data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure. The term was introduced in Driscoll, Sarnak, Sleator, and Tarjan's 1986 article.

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.

In computer science, a scapegoat tree is a self-balancing binary search tree, invented by Arne Andersson in 1989 and again by Igal Galperin and Ronald L. Rivest in 1993. It provides worst-case lookup time and amortized insertion and deletion time.

MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.

A pairing heap is a type of heap data structure with relatively simple implementation and excellent practical amortized performance, introduced by Michael Fredman, Robert Sedgewick, Daniel Sleator, and Robert Tarjan in 1986. Pairing heaps are heap-ordered multiway tree structures, and can be considered simplified Fibonacci heaps. They are considered a "robust choice" for implementing such algorithms as Prim's MST algorithm, and support the following operations :

In computing, a procedural parameter is a parameter of a procedure that is itself a procedure.

<span class="mw-page-title-main">Recursion (computer science)</span> Use of functions that call themselves

In computer science, recursion is a method of solving a computational problem where the solution depends on solutions to smaller instances of the same problem. Recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

In computer science, weight-balanced binary trees (WBTs) are a type of self-balancing binary search trees that can be used to implement dynamic sets, dictionaries (maps) and sequences. These trees were introduced by Nievergelt and Reingold in the 1970s as trees of bounded balance, or BB[α] trees. Their more common name is due to Knuth.

In graph theory and computer science, the lowest common ancestor (LCA) of two nodes v and w in a tree or directed acyclic graph (DAG) T is the lowest node that has both v and w as descendants, where we define each node to be a descendant of itself.

A link/cut tree is a data structure for representing a forest, a set of rooted trees, and offers the following operations:

<span class="mw-page-title-main">Cartesian tree</span> Binary tree derived from a sequence of numbers

In computer science, a Cartesian tree is a binary tree derived from a sequence of distinct numbers. To construct the Cartesian tree, set its root to be the minimum number in the sequence, and recursively construct its left and right subtrees from the subsequences before and after this number. It is uniquely defined as a min-heap whose symmetric (in-order) traversal returns the original sequence.

<span class="mw-page-title-main">Finger search tree</span> Type of binary search tree

In computer science, finger search trees are a type of binary search tree that keeps pointers to interior nodes, called fingers. The fingers speed up searches, insertions, and deletions for elements close to the fingers, giving amortized O(log n) lookups, and amortized O(1) insertions and deletions. It should not be confused with a finger tree nor a splay tree, although both can be used to implement finger search trees.

A conc-tree is a data structure that stores element sequences, and provides amortized O(1) time append and prepend operations, O(log n) time insert and remove operations and O(log n) time concatenation. This data structure is particularly viable for functional task-parallel and data-parallel programming, and is relatively simple to implement compared to other data-structures with similar asymptotic complexity. Conc-trees were designed to improve efficiency of data-parallel operations that do not require sequential left-to-right iteration order, and improve constant factors in these operations by avoiding unnecessary copies of the data. Orthogonally, they are used to efficiently aggregate data in functional-style task-parallel algorithms, as an implementation of the conc-list data abstraction. Conc-list is a parallel programming counterpart to functional cons-lists, and was originally introduced by the Fortress language.

References

  1. 1 2 3 4 5 6 7 8 9 Hinze, Ralf; Paterson, Ross (2006), "Finger Trees: A Simple General-purpose Data Structure" (PDF), Journal of Functional Programming , 16 (2): 197–217, doi:10.1017/S0956796805005769, S2CID   6881581 .
  2. 1 2 Gibiansky, Andrew. "Finger Trees - Andrew Gibiansky". andrew.gibiansky.com. Retrieved 2017-10-26.
  3. "Finger Trees Done Right (I hope)". Good Math, Bad Math. Retrieved 2017-10-26.
  4. Sarkar, Abhiroop. "Finger Tree - The ultimate data structure?". abhiroop.github.io. Archived from the original on 2017-10-26. Retrieved 2017-10-26.
  5. Guibas, L. J.; McCreight, E. M.; Plass, M. F.; Roberts, J. R. (1977), "A new representation for linear lists", Conference Record of the Ninth Annual ACM Symposium on Theory of Computing, pp. 49–60.
  6. Tsakalidis, A. K. (1985), "AVL-trees for localized search", Information and Control, 67 (1–3): 173–194, doi: 10.1016/S0019-9958(85)80034-6 .
  7. Caml Weekly News
  8. Matthieu Sozeau :: Dependent Finger Trees in Coq
  9. Nordhoff, Benedikt; Körner, Stefan; Lammich, Peter. "Finger Trees". Archive of Formal Proofs. Retrieved 26 November 2021.
  10. Kaplan, H.; Tarjan, R. E. (1995), "Persistent lists with catenation via recursive slow-down", Proceedings of the Twenty-Seventh Annual ACM Symposium on the Theory of Computing, pp. 93–102.