Comparison of data structures

Last updated

This is a comparison of the performance of notable data structures, as measured by the complexity of their logical operations. For a more comprehensive listing of data structures, see List of data structures.

Contents

The comparisons in this article are organized by abstract data type. As a single concrete data structure may be used to implement many abstract data types, some data structures may appear in multiple comparisons (for example, a hash map can be used to implement an associative array or a set).

Lists

A list or sequence is an abstract data type that represents a finite number of ordered values, where the same value may occur more than once. Lists generally support the following operations:

Comparison of list data structures
Peek
(index)
Mutate (insert or delete) at …Excess space,
average
BeginningEndMiddle
Linked list Θ(n)Θ(1)Θ(1), known end element;
Θ(n), unknown end element
Θ(n)Θ(n)
Array Θ(1)0
Dynamic array Θ(1)Θ(n)Θ(1) amortized Θ(n)Θ(n) [1]
Balanced tree Θ(log n)Θ(log n)Θ(log n)Θ(log n)Θ(n)
Random-access listΘ(log n) [2] Θ(1) [2] [2] Θ(n)
Hashed array tree Θ(1)Θ(n)Θ(1) amortized Θ(n)Θ(√n)

Maps

Maps store a collection of (key, value) pairs, such that each possible key appears at most once in the collection. They generally support three operations: [3]

Unless otherwise noted, all data structures in this table require O(n) space.

Data structureLookup, removalInsertionOrdered
averageworst caseaverageworst case
Association list O(n)O(n)O(1)O(1)No
B-tree [4] O(log n)O(log n)O(log n)O(log n)Yes
Hash table O(1)O(n)O(1)O(n)No
Unbalanced binary search tree O(log n)O(n)O(log n)O(n)Yes

Integer keys

Some map data structures offer superior performance in the case of integer keys. In the following table, let m be the number of bits in the keys.

Data structureLookup, removalInsertionSpace
averageworst caseaverageworst case
Fusion tree [ ? ]O(log mn)[ ? ][ ? ]O(n)
Van Emde Boas tree O(log log m)O(log log m)O(log log m)O(log log m)O(m)
X-fast trie O(n log m) [a] [ ? ]O(log log m)O(log log m)O(n log m)
Y-fast trie O(log log m) [a] [ ? ]O(log log m) [a] [ ? ]O(n)

Priority queues

A priority queue is an abstract data-type similar to a regular queue or stack. Each element in a priority queue has an associated priority. In a priority queue, elements with high priority are served before elements with low priority. Priority queues support the following operations:

Priority queues are frequently implemented using heaps.

Heaps

A (max) heap is a tree-based data structure which satisfies the heap property: for any given node C, if P is a parent node of C, then the key (the value) of P is greater than or equal to the key of C.

In addition to the operations of an abstract priority queue, the following table lists the complexity of two additional logical operations:

Here are time complexities [5] of various heap data structures. The abbreviation am. indicates that the given complexity is amortized, otherwise it is a worst-case complexity. For the meaning of "O(f)" and "Θ(f)" see Big O notation. Names of operations assume a max-heap.

Operationfind-maxdelete-maxincrease-keyinsertmeldmake-heap [b]
Binary [5] Θ(1)Θ(log n)Θ(log n)Θ(log n)Θ(n)Θ(n)
Skew [6] Θ(1)O(log n) am.O(log n) am.O(log n) am.O(log n) am.Θ(n) am.
Leftist [7] Θ(1)Θ(log n)Θ(log n)Θ(log n)Θ(log n)Θ(n)
Binomial [5] [9] Θ(1)Θ(log n)Θ(log n)Θ(1) am.Θ(log n) [c] Θ(n)
Skew binomial [10] Θ(1)Θ(log n)Θ(log n)Θ(1)Θ(log n) [c] Θ(n)
2–3 heap [12] Θ(1)O(log n) am.Θ(1)Θ(1) am.O(log n) [c] Θ(n)
Bottom-up skew [6] Θ(1)O(log n) am.O(log n) am.Θ(1) am.Θ(1) am.Θ(n) am.
Pairing [13] Θ(1)O(log n) am.o(log n) am. [d] Θ(1)Θ(1)Θ(n)
Rank-pairing [16] Θ(1)O(log n) am.Θ(1) am.Θ(1)Θ(1)Θ(n)
Fibonacci [5] [17] Θ(1)O(log n) am.Θ(1) am.Θ(1)Θ(1)Θ(n)
Strict Fibonacci [18] [e] Θ(1)Θ(log n)Θ(1)Θ(1)Θ(1)Θ(n)
Brodal [19] [e] Θ(1)Θ(log n)Θ(1)Θ(1)Θ(1)Θ(n) [20]
  1. 1 2 3 Amortized time.
  2. make-heap is the operation of building a heap from a sequence of n unsorted elements. It can be done in Θ(n) time whenever meld runs in O(log n) time (where both complexities can be amortized). [6] [7] Another algorithm achieves Θ(n) for binary heaps. [8]
  3. 1 2 3 For persistent heaps (not supporting increase-key), a generic transformation reduces the cost of meld to that of insert, while the new cost of delete-max is the sum of the old costs of delete-max and meld. [11] Here, it makes meld run in Θ(1) time (amortized, if the cost of insert is) while delete-max still runs in O(log n). Applied to skew binomial heaps, it yields Brodal-Okasaki queues, persistent heaps with optimal worst-case complexities. [10]
  4. Lower bound of [14] upper bound of [15]
  5. 1 2 Brodal queues and strict Fibonacci heaps achieve optimal worst-case complexities for heaps. They were first described as imperative data structures. The Brodal-Okasaki queue is a persistent data structure achieving the same optimum, except that increase-key is not supported.

Notes

  1. Brodnik, Andrej; Carlsson, Svante; Sedgewick, Robert; Munro, JI; Demaine, ED (1999), Resizable Arrays in Optimal Time and Space (Technical Report CS-99-09) (PDF), Department of Computer Science, University of Waterloo
  2. 1 2 3 Chris Okasaki (1995). "Purely Functional Random-Access Lists". Proceedings of the Seventh International Conference on Functional Programming Languages and Computer Architecture: 86–95. doi:10.1145/224164.224187.
  3. Mehlhorn, Kurt; Sanders, Peter (2008), "4 Hash Tables and Associative Arrays", Algorithms and Data Structures: The Basic Toolbox (PDF), Springer, pp. 81–98, archived (PDF) from the original on 2014-08-02
  4. Cormen et al. 2022, p. 484.
  5. 1 2 3 4 Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L. (1990). Introduction to Algorithms (1st ed.). MIT Press and McGraw-Hill. ISBN   0-262-03141-8.
  6. 1 2 3 Sleator, Daniel Dominic; Tarjan, Robert Endre (February 1986). "Self-Adjusting Heaps". SIAM Journal on Computing . 15 (1): 52–69. CiteSeerX   10.1.1.93.6678 . doi:10.1137/0215004. ISSN   0097-5397.
  7. 1 2 Tarjan, Robert (1983). "3.3. Leftist heaps". Data Structures and Network Algorithms. pp. 38–42. doi:10.1137/1.9781611970265. ISBN   978-0-89871-187-5.
  8. Hayward, Ryan; McDiarmid, Colin (1991). "Average Case Analysis of Heap Building by Repeated Insertion" (PDF). J. Algorithms. 12: 126–153. CiteSeerX   10.1.1.353.7888 . doi:10.1016/0196-6774(91)90027-v. Archived from the original (PDF) on 2016-02-05. Retrieved 2016-01-28.
  9. "Binomial Heap | Brilliant Math & Science Wiki". brilliant.org. Retrieved 2019-09-30.
  10. 1 2 Brodal, Gerth Stølting; Okasaki, Chris (November 1996), "Optimal purely functional priority queues", Journal of Functional Programming, 6 (6): 839–857, doi: 10.1017/s095679680000201x
  11. Okasaki, Chris (1998). "10.2. Structural Abstraction". Purely Functional Data Structures (1st ed.). pp. 158–162. ISBN   9780521631242.
  12. Takaoka, Tadao (1999), Theory of 2–3 Heaps (PDF), p. 12
  13. Iacono, John (2000), "Improved upper bounds for pairing heaps", Proc. 7th Scandinavian Workshop on Algorithm Theory (PDF), Lecture Notes in Computer Science, vol. 1851, Springer-Verlag, pp. 63–77, arXiv: 1110.4428 , CiteSeerX   10.1.1.748.7812 , doi:10.1007/3-540-44985-X_5, ISBN   3-540-67690-2
  14. Fredman, Michael Lawrence (July 1999). "On the Efficiency of Pairing Heaps and Related Data Structures" (PDF). Journal of the Association for Computing Machinery . 46 (4): 473–501. doi:10.1145/320211.320214.
  15. Pettie, Seth (2005). Towards a Final Analysis of Pairing Heaps (PDF). FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science. pp. 174–183. CiteSeerX   10.1.1.549.471 . doi:10.1109/SFCS.2005.75. ISBN   0-7695-2468-0.
  16. Haeupler, Bernhard; Sen, Siddhartha; Tarjan, Robert E. (November 2011). "Rank-pairing heaps" (PDF). SIAM J. Computing. 40 (6): 1463–1485. doi:10.1137/100785351.
  17. Fredman, Michael Lawrence; Tarjan, Robert E. (July 1987). "Fibonacci heaps and their uses in improved network optimization algorithms" (PDF). Journal of the Association for Computing Machinery . 34 (3): 596–615. CiteSeerX   10.1.1.309.8927 . doi:10.1145/28869.28874.
  18. Brodal, Gerth Stølting; Lagogiannis, George; Tarjan, Robert E. (2012). Strict Fibonacci heaps (PDF). Proceedings of the 44th symposium on Theory of Computing - STOC '12. pp. 1177–1184. CiteSeerX   10.1.1.233.1740 . doi:10.1145/2213977.2214082. ISBN   978-1-4503-1245-5.
  19. Brodal, Gerth S. (1996), "Worst-Case Efficient Priority Queues" (PDF), Proc. 7th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 52–58
  20. Goodrich, Michael T.; Tamassia, Roberto (2004). "7.3.6. Bottom-Up Heap Construction". Data Structures and Algorithms in Java (3rd ed.). pp. 338–341. ISBN   0-471-46983-1.

Related Research Articles

<span class="mw-page-title-main">Binary search tree</span> Rooted binary tree data structure

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is linear with respect to the height of the tree.

In computer science, a double-ended queue is an abstract data type that generalizes a queue, for which elements can be added to or removed from either the front (head) or back (tail). It is also often called a head-tail linked list, though properly this refers to a specific data structure implementation of a deque.

<span class="mw-page-title-main">Heap (data structure)</span> Computer science data structure

In computer science, a heap is a tree-based data structure that satisfies the heap property: In a max heap, for any given node C, if P is the parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

In computer science, a priority queue is an abstract data type similar to a regular queue or stack abstract data type. Each element in a priority queue has an associated priority. In a priority queue, elements with high priority are served before elements with low priority. In some implementations, if two elements have the same priority, they are served in the same order in which they were enqueued. In other implementations, the order of elements with the same priority is undefined.

A splay tree is a binary search tree with the additional property that recently accessed elements are quick to access again. Like self-balancing binary search trees, a splay tree performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For random access patterns drawn from a non-uniform random distribution, their amortized time can be faster than logarithmic, proportional to the entropy of the access pattern. For many patterns of non-random operations, also, splay trees can take better than logarithmic time, without requiring advance knowledge of the pattern. According to the unproven dynamic optimality conjecture, their performance on all access patterns is within a constant factor of the best possible performance that could be achieved by any other self-adjusting binary search tree, even one selected to fit that pattern. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.

<span class="mw-page-title-main">Dijkstra's algorithm</span> Algorithm for finding shortest paths

Dijkstra's algorithm is an algorithm for finding the shortest paths between nodes in a weighted graph, which may represent, for example, a road network. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three years later.

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964 as a data structure for implementing heapsort.

In computer science, a binomial heap is a data structure that acts as a priority queue. It is an example of a mergeable heap, as it supports merging two heaps in logarithmic time. It is implemented as a heap similar to a binary heap but using a special tree structure that is different from the complete binary trees used by binary heaps. Binomial heaps were invented in 1978 by Jean Vuillemin.

In computer science, a Fibonacci heap is a data structure for priority queue operations, consisting of a collection of heap-ordered trees. It has a better amortized running time than many other priority queue data structures including the binary heap and binomial heap. Michael L. Fredman and Robert E. Tarjan developed Fibonacci heaps in 1984 and published them in a scientific journal in 1987. Fibonacci heaps are named after the Fibonacci numbers, which are used in their running time analysis.

In computer science, a soft heap is a variant on the simple heap data structure that has constant amortized time complexity for 5 types of operations. This is achieved by carefully "corrupting" (increasing) the keys of at most a constant number of values in the heap.

In computer science, a leftist tree or leftist heap is a priority queue implemented with a variant of a binary heap. Every node x has an s-value which is the distance to the nearest leaf in subtree rooted at x. In contrast to a binary heap, a leftist tree attempts to be very unbalanced. In addition to the heap property, leftist trees are maintained so the right descendant of each node has the lower s-value.

A pairing heap is a type of heap data structure with relatively simple implementation and excellent practical amortized performance, introduced by Michael Fredman, Robert Sedgewick, Daniel Sleator, and Robert Tarjan in 1986. Pairing heaps are heap-ordered multiway tree structures, and can be considered simplified Fibonacci heaps. They are considered a "robust choice" for implementing such algorithms as Prim's MST algorithm, and support the following operations :

The d-ary heap or d-heap is a priority queue data structure, a generalization of the binary heap in which the nodes have d children instead of 2. Thus, a binary heap is a 2-heap, and a ternary heap is a 3-heap. According to Tarjan and Jensen et al., d-ary heaps were invented by Donald B. Johnson in 1975.

In computer science, a purely functional data structure is a data structure that can be directly implemented in a purely functional language. The main difference between an arbitrary data structure and a purely functional one is that the latter is (strongly) immutable. This restriction ensures the data structure possesses the advantages of immutable objects: (full) persistency, quick copy of objects, and thread safety. Efficient purely functional data structures may require the use of lazy evaluation and memoization.

<span class="mw-page-title-main">Cartesian tree</span> Binary tree derived from a sequence of numbers

In computer science, a Cartesian tree is a binary tree derived from a sequence of distinct numbers. To construct the Cartesian tree, set its root to be the minimum number in the sequence, and recursively construct its left and right subtrees from the subsequences before and after this number. It is uniquely defined as a min-heap whose symmetric (in-order) traversal returns the original sequence.

In computer science, a double-ended priority queue (DEPQ) or double-ended heap is a data structure similar to a priority queue or heap, but allows for efficient removal of both the maximum and minimum, according to some ordering on the keys (items) stored in the structure. Every element in a DEPQ has a priority or value. In a DEPQ, it is possible to remove the elements in both ascending as well as descending order.

In computer science, a skew binomial heap is a data structure for priority queue operations. It is a variant of the binomial heap that supports constant-time insertion operations in the worst case, rather than amortized time.

In computer science, the Brodal queue is a heap/priority queue structure with very low worst case time bounds: for insertion, find-minimum, meld and decrease-key and for delete-minimum and general deletion. They are the first heap variant to achieve these bounds without resorting to amortization of operational costs. Brodal queues are named after their inventor Gerth Stølting Brodal.

A Kinetic Priority Queue is an abstract kinetic data structure. It is a variant of a priority queue designed to maintain the maximum priority element when the priority of every element is changing as a continuous function of time. Kinetic priority queues have been used as components of several kinetic data structures, as well as to solve some important non-kinetic problems such as the k-set problem and the connected red blue segments intersection problem.

In computer science, a strict Fibonacci heap is a priority queue data structure with low worst case time bounds. It matches the amortized time bounds of the Fibonacci heap in the worst case. To achieve these time bounds, strict Fibonacci heaps maintain several invariants by performing restoring transformations after every operation. These transformations can be done in constant time by using auxiliary data structures to track invariant violations, and the pigeonhole principle guarantees that these can be fixed. Strict Fibonacci heaps were invented in 2012 by Gerth S. Brodal, George Lagogiannis, and Robert E. Tarjan.

References