Comparison of data structures

Last updated

This is a comparison of the performance of notable data structures, as measured by the complexity of their logical operations. For a more comprehensive listing of data structures, see List of data structures.

Contents

The comparisons in this article are organized by abstract data type. As a single concrete data structure may be used to implement many abstract data types, some data structures may appear in multiple comparisons (for example, a hash map can be used to implement an associative array or a set).

Lists

A list or sequence is an abstract data type that represents a finite number of ordered values, where the same value may occur more than once. Lists generally support the following operations:

Comparison of list data structures
Peek
(index)
Mutate (insert or delete) at …Excess space,
average
BeginningEndMiddle
Linked list Θ(n)Θ(1)Θ(1), known end element;
Θ(n), unknown end element
Peek time +
Θ(1) [1] [2]
Θ(n)
Array Θ(1)0
Dynamic array Θ(1)Θ(n)Θ(1) amortized Θ(n)Θ(n) [3]
Balanced tree Θ(log n)Θ(log n)Θ(log n)Θ(log n)Θ(n)
Random-access listΘ(log n) [4] Θ(1) [4] [4] Θ(n)
Hashed array tree Θ(1)Θ(n)Θ(1) amortized Θ(n)Θ(√n)

Maps

Maps store a collection of (key, value) pairs, such that each possible key appears at most once in the collection. They generally support three operations: [5]

Unless otherwise noted, all data structures in this table require O(n) space.

Data structureLookup, removalInsertionOrdered
averageworst caseaverageworst case
Association list O(n)O(n)O(1)O(1)No
B-tree [6] O(log n)O(log n)O(log n)O(log n)Yes
Hash table O(1)O(n)O(1)O(n)No
Unbalanced binary search tree O(log n)O(n)O(log n)O(n)Yes

Integer keys

Some map data structures offer superior performance in the case of integer keys. In the following table, let m be the number of bits in the keys.

Data structureLookup, removalInsertionSpace
averageworst caseaverageworst case
Fusion tree [ ? ]O(log mn)[ ? ][ ? ]O(n)
Van Emde Boas tree O(log log m)O(log log m)O(log log m)O(log log m)O(m)
X-fast trie O(n log m) [lower-alpha 1] [ ? ]O(log log m)O(log log m)O(n log m)
Y-fast trie O(log log m) [lower-alpha 1] [ ? ]O(log log m) [lower-alpha 1] [ ? ]O(n)

Priority queues

A priority queue is an abstract data-type similar to a regular queue or stack. Each element in a priority queue has an associated priority. In a priority queue, elements with high priority are served before elements with low priority. Priority queues support the following operations:

Priority queues are frequently implemented using heaps.

Heaps

A (max) heap is a tree-based data structure which satisfies the heap property: for any given node C, if P is a parent node of C, then the key (the value) of P is greater than or equal to the key of C.

In addition to the operations of an abstract priority queue, the following table lists the complexity of two additional logical operations:

Here are time complexities [7] of various heap data structures. Function names assume a max-heap. For the meaning of "O(f)" and "Θ(f)" see Big O notation.

Operationfind-maxdelete-maxinsertincrease-keymeld
Binary [7] Θ(1)Θ(log n)O(log n)O(log n)Θ(n)
Leftist Θ(1)Θ(log n)Θ(log n)O(log n)Θ(log n)
Binomial [7] [8] Θ(1)Θ(log n)Θ(1) [lower-alpha 1] Θ(log n)O(log n) [lower-alpha 2]
Fibonacci [7] [9] Θ(1)O(log n) [lower-alpha 1] Θ(1)Θ(1) [lower-alpha 1] Θ(1)
Pairing [10] Θ(1)O(log n) [lower-alpha 1] Θ(1)o(log n) [lower-alpha 1] [lower-alpha 3] Θ(1)
Brodal [13] [lower-alpha 4] Θ(1)O(log n)Θ(1)Θ(1)Θ(1)
Rank-pairing [15] Θ(1)O(log n) [lower-alpha 1] Θ(1)Θ(1) [lower-alpha 1] Θ(1)
Strict Fibonacci [16] Θ(1)O(log n)Θ(1)Θ(1)Θ(1)
2–3 heap [17] O(log n)O(log n) [lower-alpha 1] O(log n) [lower-alpha 1] Θ(1)?
  1. 1 2 3 4 5 6 7 8 9 10 11 12 Amortized time.
  2. n is the size of the larger heap.
  3. Lower bound of [11] upper bound of [12]
  4. Brodal and Okasaki later describe a persistent variant with the same bounds except for decrease-key, which is not supported. Heaps with n elements can be constructed bottom-up in O(n). [14]

Notes

  1. Day 1 Keynote - Bjarne Stroustrup: C++11 Style at GoingNative 2012 on channel9.msdn.com from minute 45 or foil 44
  2. Number crunching: Why you should never, ever, EVER use linked-list in your code again at kjellkod.wordpress.com
  3. Brodnik, Andrej; Carlsson, Svante; Sedgewick, Robert; Munro, JI; Demaine, ED (1999), Resizable Arrays in Optimal Time and Space (Technical Report CS-99-09) (PDF), Department of Computer Science, University of Waterloo
  4. 1 2 3 Chris Okasaki (1995). "Purely Functional Random-Access Lists". Proceedings of the Seventh International Conference on Functional Programming Languages and Computer Architecture: 86–95. doi:10.1145/224164.224187.
  5. Mehlhorn, Kurt; Sanders, Peter (2008), "4 Hash Tables and Associative Arrays", Algorithms and Data Structures: The Basic Toolbox (PDF), Springer, pp. 81–98, archived (PDF) from the original on 2014-08-02
  6. Cormen et al. 2022, p. 484.
  7. 1 2 3 4 Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L. (1990). Introduction to Algorithms (1st ed.). MIT Press and McGraw-Hill. ISBN   0-262-03141-8.
  8. "Binomial Heap | Brilliant Math & Science Wiki". brilliant.org. Retrieved 2019-09-30.
  9. Fredman, Michael Lawrence; Tarjan, Robert E. (July 1987). "Fibonacci heaps and their uses in improved network optimization algorithms" (PDF). Journal of the Association for Computing Machinery . 34 (3): 596–615. CiteSeerX   10.1.1.309.8927 . doi:10.1145/28869.28874.
  10. Iacono, John (2000), "Improved upper bounds for pairing heaps", Proc. 7th Scandinavian Workshop on Algorithm Theory (PDF), Lecture Notes in Computer Science, vol. 1851, Springer-Verlag, pp. 63–77, arXiv: 1110.4428 , CiteSeerX   10.1.1.748.7812 , doi:10.1007/3-540-44985-X_5, ISBN   3-540-67690-2
  11. Fredman, Michael Lawrence (July 1999). "On the Efficiency of Pairing Heaps and Related Data Structures" (PDF). Journal of the Association for Computing Machinery . 46 (4): 473–501. doi:10.1145/320211.320214.
  12. Pettie, Seth (2005). Towards a Final Analysis of Pairing Heaps (PDF). FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science. pp. 174–183. CiteSeerX   10.1.1.549.471 . doi:10.1109/SFCS.2005.75. ISBN   0-7695-2468-0.
  13. Brodal, Gerth S. (1996), "Worst-Case Efficient Priority Queues" (PDF), Proc. 7th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 52–58
  14. Goodrich, Michael T.; Tamassia, Roberto (2004). "7.3.6. Bottom-Up Heap Construction". Data Structures and Algorithms in Java (3rd ed.). pp. 338–341. ISBN   0-471-46983-1.
  15. Haeupler, Bernhard; Sen, Siddhartha; Tarjan, Robert E. (November 2011). "Rank-pairing heaps" (PDF). SIAM J. Computing. 40 (6): 1463–1485. doi:10.1137/100785351.
  16. Brodal, Gerth Stølting; Lagogiannis, George; Tarjan, Robert E. (2012). Strict Fibonacci heaps (PDF). Proceedings of the 44th symposium on Theory of Computing - STOC '12. pp. 1177–1184. CiteSeerX   10.1.1.233.1740 . doi:10.1145/2213977.2214082. ISBN   978-1-4503-1245-5.
  17. Takaoka, Tadao (1999), Theory of 2–3 Heaps (PDF), p. 12

Related Research Articles

<span class="mw-page-title-main">Binary search tree</span> Rooted binary tree data structure

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is directly proportional to the height of the tree.

In computer science, a double-ended queue is an abstract data type that generalizes a queue, for which elements can be added to or removed from either the front (head) or back (tail). It is also often called a head-tail linked list, though properly this refers to a specific data structure implementation of a deque.

<span class="mw-page-title-main">Heapsort</span> A sorting algorithm which uses the heap data structure

In computer science, heapsort is a comparison-based sorting algorithm. Heapsort can be thought of as an improved selection sort: like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the unsorted region by extracting the largest element from it and inserting it into the sorted region. Unlike selection sort, heapsort does not waste time with a linear-time scan of the unsorted region; rather, heap sort maintains the unsorted region in a heap data structure to more quickly find the largest element in each step.

<span class="mw-page-title-main">Heap (data structure)</span> Computer science data structure

In computer science, a heap is a specialized tree-based data structure which is essentially an almost complete tree that satisfies the heap property: in a max heap, for any given node C, if P is a parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

In computer science, a priority queue is an abstract data-type similar to a regular queue or stack data structure. Each element in a priority queue has an associated priority. In a priority queue, elements with high priority are served before elements with low priority. In some implementations, if two elements have the same priority, they are served in the same order that they were enqueued in. In other implementations, the order of elements with the same priority is undefined.

<span class="mw-page-title-main">Dijkstra's algorithm</span> Graph search algorithm

Dijkstra's algorithm is an algorithm for finding the shortest paths between nodes in a weighted graph, which may represent, for example, road networks. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three years later.

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort.

In computer science, a binomial heap is a data structure that acts as a priority queue but also allows pairs of heaps to be merged. It is important as an implementation of the mergeable heap abstract data type, which is a priority queue supporting merge operation. It is implemented as a heap similar to a binary heap but using a special tree structure that is different from the complete binary trees used by binary heaps. Binomial heaps were invented in 1978 by Jean Vuillemin.

In computer science, a Fibonacci heap is a data structure for priority queue operations, consisting of a collection of heap-ordered trees. It has a better amortized running time than many other priority queue data structures including the binary heap and binomial heap. Michael L. Fredman and Robert E. Tarjan developed Fibonacci heaps in 1984 and published them in a scientific journal in 1987. Fibonacci heaps are named after the Fibonacci numbers, which are used in their running time analysis.

In computer science, a soft heap is a variant on the simple heap data structure that has constant amortized time complexity for 5 types of operations. This is achieved by carefully "corrupting" (increasing) the keys of at most a constant number of values in the heap. The constant time operations are:

A pairing heap is a type of heap data structure with relatively simple implementation and excellent practical amortized performance, introduced by Michael Fredman, Robert Sedgewick, Daniel Sleator, and Robert Tarjan in 1986. Pairing heaps are heap-ordered multiway tree structures, and can be considered simplified Fibonacci heaps. They are considered a "robust choice" for implementing such algorithms as Prim's MST algorithm, and support the following operations :

In computer science, a purely functional data structure is a data structure that can be implemented in a purely functional language. The main difference between an arbitrary data structure and a purely functional one is that the latter is (strongly) immutable. This restriction ensures the data structure possesses the advantages of immutable objects: (full) persistency, quick copy of objects, and thread safety. Efficient purely functional data structures may require the use of lazy evaluation and memoization.

<span class="mw-page-title-main">Cartesian tree</span> Binary tree derived from a sequence of numbers

In computer science, a Cartesian tree is a binary tree derived from a sequence of numbers. The smallest number in the sequence is at the root of the tree; its left and right subtrees are constructed recursively from the subsequences to the left and right of this number. When all numbers are distinct, the Cartesian tree is uniquely defined from the properties that it is heap-ordered and that a symmetric (in-order) traversal of the tree returns the original sequence.

In computer science, a min-max heap is a complete binary tree data structure which combines the usefulness of both a min-heap and a max-heap, that is, it provides constant time retrieval and logarithmic time removal of both the minimum and maximum elements in it. This makes the min-max heap a very useful data structure to implement a double-ended priority queue. Like binary min-heaps and max-heaps, min-max heaps support logarithmic insertion and deletion and can be built in linear time. Min-max heaps are often represented implicitly in an array; hence it's referred to as an implicit data structure.

In computer science, integer sorting is the algorithmic problem of sorting a collection of data values by integer keys. Algorithms designed for integer sorting may also often be applied to sorting problems in which the keys are floating point numbers, rational numbers, or text strings. The ability to perform integer arithmetic on the keys allows integer sorting algorithms to be faster than comparison sorting algorithms in many cases, depending on the details of which operations are allowed in the model of computing and how large the integers to be sorted are.

In computer science, the Brodal queue is a heap/priority queue structure with very low worst case time bounds: for insertion, find-minimum, meld and decrease-key and for delete-minimum and general deletion. They are the first heap variant to achieve these bounds without resorting to amortization of operational costs. Brodal queues are named after their inventor Gerth Stølting Brodal.

<span class="mw-page-title-main">Kinetic heap</span>

A Kinetic Heap is a kinetic data structure, obtained by the kinetization of a heap. It is designed to store elements where the priority is changing as a continuous function of time. As a type of kinetic priority queue, it maintains the maximum priority element stored in it. The kinetic heap data structure works by storing the elements as a tree that satisfies the following heap property – if B is a child node of A, then the priority of the element in A must be higher than the priority of the element in B. This heap property is enforced using certificates along every edge so, like other kinetic data structures, a kinetic heap also contains a priority queue to maintain certificate failure times.

A Kinetic Priority Queue is an abstract kinetic data structure. It is a variant of a priority queue designed to maintain the maximum priority element when the priority of every element is changing as a continuous function of time. Kinetic priority queues have been used as components of several kinetic data structures, as well as to solve some important non-kinetic problems such as the k-set problem and the connected red blue segments intersection problem.

In computer science, a weak heap is a data structure for priority queues, combining features of the binary heap and binomial heap. It can be stored in an array as an implicit binary tree like a binary heap, and has the efficiency guarantees of binomial heaps.

<span class="mw-page-title-main">Bucket queue</span> Data structure for integer priorities

A bucket queue is a data structure that implements the priority queue abstract data type: it maintains a dynamic collection of elements with numerical priorities and allows quick access to the element with minimum priority. In the bucket queue, the priorities must be integers, and it is particularly suited to applications in which the priorities have a small range. A bucket queue has the form of an array of buckets: an array data structure, indexed by the priorities, whose cells contain collections of items with the same priority as each other. With this data structure, insertion of elements and changes of their priority take constant time. Searching for and removing the minimum-priority element takes time proportional to the number of buckets or, by maintaining a pointer to the most recently found bucket, in time proportional to the difference in priorities between successive operations.

References