Dynamic array

Several values are inserted at the end of a dynamic array using geometric expansion. Grey cells indicate space reserved for expansion. Most insertions are fast (constant time), while some are slow due to the need for reallocation (Θ(n) time, labelled with turtles). The logical size and capacity of the final array are shown.

In computer science, a dynamic array, growable array, resizable array, dynamic table, mutable array, or array list is a random access, variable-size list data structure that allows elements to be added or removed. It is supplied with standard libraries in many modern mainstream programming languages. Dynamic arrays overcome a limit of static arrays, which have a fixed capacity that needs to be specified at allocation.


A dynamic array is not the same thing as a dynamically allocated array or variable-length array, either of which is an array whose size is fixed when the array is allocated, although a dynamic array may use such a fixed-size array as a back end. [1]

Bounded-size dynamic arrays and capacity

A simple dynamic array can be constructed by allocating a fixed-size array, typically larger than the number of elements immediately required. The elements of the dynamic array are stored contiguously at the start of the underlying array, and the remaining positions towards the end of the underlying array are reserved, or unused. Elements can be added at the end of a dynamic array in constant time by using the reserved space, until this space is completely consumed. When all space is consumed and an additional element is to be added, the underlying fixed-size array needs to be increased in size. Resizing is typically expensive because it involves allocating a new underlying array and copying each element from the original array. Elements can be removed from the end of a dynamic array in constant time, as no resizing is required. The number of elements used by the dynamic array contents is its logical size or size, while the size of the underlying array is called the dynamic array's capacity or physical size, which is the maximum possible size without relocating data. [2]
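
As a concrete (if minimal) sketch, such a bounded dynamic array might look as follows in C; the names dynarray, da_append, and da_pop are illustrative, not taken from any particular library:

    #include <stdbool.h>
    #include <stddef.h>

    /* A bounded-size dynamic array over a preallocated buffer. */
    typedef struct {
        int    *data;     /* underlying fixed-size array */
        size_t  size;     /* logical size: number of elements in use */
        size_t  capacity; /* physical size of the underlying array */
    } dynarray;

    /* Append into the reserved space in constant time;
       fails once the reserved space is consumed. */
    bool da_append(dynarray *a, int e) {
        if (a->size == a->capacity)
            return false;          /* full: resizing would be needed */
        a->data[a->size++] = e;
        return true;
    }

    /* Remove from the end in constant time; no resizing required. */
    bool da_pop(dynarray *a, int *out) {
        if (a->size == 0)
            return false;
        *out = a->data[--a->size];
        return true;
    }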

A fixed-size array will suffice in applications where the maximum logical size is fixed (e.g. by specification), or can be calculated before the array is allocated. A dynamic array might be preferred if the maximum logical size is unknown or hard to calculate before the array is allocated, if a maximum size given by a specification is likely to change, or if the amortized cost of resizing does not significantly affect performance or responsiveness.

Geometric expansion and amortized cost

To avoid incurring the cost of resizing many times, dynamic arrays resize by a large amount, such as doubling in size, and use the reserved space for future expansion. The operation of adding an element to the end might work as follows:

    function insertEnd(dynarray a, element e)
        if (a.size == a.capacity)
            // resize a to twice its current capacity:
            a.capacity ← a.capacity * 2
            // (copy the contents to the new memory location here)
        a[a.size] ← e
        a.size ← a.size + 1
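
In C, the reallocation step that the pseudocode glosses over could be expressed with realloc, which allocates the new block and copies the old contents. This is only a sketch reusing the dynarray struct above, not any particular library's implementation:

    #include <stdlib.h>

    int insert_end(dynarray *a, int e) {
        if (a->size == a->capacity) {
            /* Grow to twice the current capacity (at least 1). */
            size_t new_cap = a->capacity ? a->capacity * 2 : 1;
            int *p = realloc(a->data, new_cap * sizeof *a->data);
            if (p == NULL)
                return -1;            /* allocation failure */
            a->data = p;              /* old contents were copied over:
                                         the occasional Θ(n) step */
            a->capacity = new_cap;
        }
        a->data[a->size++] = e;       /* the common Θ(1) step */
        return 0;
    }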

As n elements are inserted, the capacities form a geometric progression. Expanding the array by any constant proportion a ensures that inserting n elements takes O(n) time overall, meaning that each insertion takes amortized constant time. Many dynamic arrays also deallocate some of the underlying storage if its size drops below a certain threshold, such as 30% of the capacity. This threshold must be strictly smaller than 1/a in order to provide hysteresis (provide a stable band to avoid repeatedly growing and shrinking) and support mixed sequences of insertions and removals with amortized constant cost.
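
For example, with a = 2 and an initial capacity of 1, the resizes triggered while inserting n elements copy 1 + 2 + 4 + ⋯ + 2^(k−1) = 2^k − 1 < 2n elements in total (where 2^k is the final capacity), so all n insertions together take O(n) time, i.e. O(1) amortized per insertion.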

Dynamic arrays are a common example when teaching amortized analysis. [3] [4]

Growth factor

The growth factor for the dynamic array depends on several factors, including a space-time trade-off and the algorithms used in the memory allocator itself. For growth factor a, the average time per insertion operation is about a/(a−1), while the number of wasted cells is bounded above by (a−1)n. [citation needed] If the memory allocator uses a first-fit allocation algorithm, then growth factor values such as a = 2 can cause dynamic array expansion to run out of memory even though a significant amount of memory may still be available. [5] There have been various discussions on ideal growth factor values, including proposals for the golden ratio as well as the value 1.5. [6] Many textbooks, however, use a = 2 for simplicity and analysis purposes. [3] [4]
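
The a/(a−1) figure follows from the same geometric series as above: growing through capacities 1, a, a², …, a^k ≈ n, the resizes copy about 1 + a + ⋯ + a^(k−1) ≈ n/(a−1) elements in total, and each of the n insertions also performs one write of its own, for roughly n + n/(a−1) = n·a/(a−1) elementary operations, or about a/(a−1) per insertion on average.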

Below are growth factors used by several popular implementations:

Implementation | Growth factor (a)
Java ArrayList [1] | 1.5 (3/2)
Python PyListObject [7] | ~1.125 (n + (n >> 3))
Microsoft Visual C++ 2013 [8] | 1.5 (3/2)
G++ 5.2.0 [5] | 2
Clang 3.6 [5] | 2
Facebook folly/FBVector [9] | 1.5 (3/2)
Rust Vec [10] | 2
Go slices [11] | between 1.25 and 2
Nim sequences [12] | 2
SBCL (Common Lisp) vectors [13] | 2
C# (.NET 8) List<T> | 2

Performance

Comparison of list data structures

Structure | Peek (index) | Insert/delete at beginning | Insert/delete at end | Insert/delete in middle | Excess space, average
Linked list | Θ(n) | Θ(1) | Θ(1), known end element; Θ(n), unknown end element | Θ(n) [14] [15] | Θ(n)
Array | Θ(1) | — | — | — | 0
Dynamic array | Θ(1) | Θ(n) | Θ(1) amortized | Θ(n) | Θ(n) [16]
Balanced tree | Θ(log n) | Θ(log n) | Θ(log n) | Θ(log n) | Θ(n)
Random-access list | Θ(log n) [17] | Θ(1) [17] | — [17] | — | Θ(n)
Hashed array tree | Θ(1) | Θ(n) | Θ(1) amortized | Θ(n) | Θ(√n)

The dynamic array has performance similar to an array, with the addition of new operations to add and remove elements: getting or setting the value at a particular index takes constant time; iterating over the elements in order takes linear time, with good cache performance; inserting or deleting an element in the middle of the array takes linear time; and inserting or deleting an element at the end of the array takes constant amortized time.

Dynamic arrays benefit from many of the advantages of arrays, including good locality of reference and data cache utilization, compactness (low memory use), and random access. They usually have only a small fixed additional overhead for storing information about the size and capacity. This makes dynamic arrays an attractive tool for building cache-friendly data structures. However, in languages like Python or Java that enforce reference semantics, the dynamic array generally will not store the actual data, but rather it will store references to the data that resides in other areas of memory. In this case, accessing items in the array sequentially will actually involve accessing multiple non-contiguous areas of memory, so the many advantages of the cache-friendliness of this data structure are lost.

Compared to linked lists, dynamic arrays have faster indexing (constant time versus linear time) and typically faster iteration due to improved locality of reference; however, dynamic arrays require linear time to insert or delete at an arbitrary location, since all following elements must be moved, while linked lists can do this in constant time. This disadvantage is mitigated by the gap buffer and tiered vector variants discussed under Variants below. Also, in a highly fragmented memory region, it may be expensive or impossible to find contiguous space for a large dynamic array, whereas linked lists do not require the whole data structure to be stored contiguously.
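
In a typical contiguous implementation, that linear cost comes from shifting the tail of the array. A sketch in C, reusing the dynarray struct above and assuming a free slot has already been ensured (the name insert_at is illustrative):

    #include <string.h>

    int insert_at(dynarray *a, size_t i, int e) {
        if (i > a->size || a->size == a->capacity)
            return -1;                 /* sketch: grow first, as above */
        /* Shift the size - i trailing elements one slot to the
           right: this is where the Θ(n) worst case arises. */
        memmove(&a->data[i + 1], &a->data[i],
                (a->size - i) * sizeof *a->data);
        a->data[i] = e;
        a->size++;
        return 0;
    }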

A balanced tree can store a list while providing all operations of both dynamic arrays and linked lists reasonably efficiently, but both insertion at the end and iteration over the list are slower than for a dynamic array, in theory and in practice, due to non-contiguous storage and tree traversal/manipulation overhead.

Variants

Gap buffers are similar to dynamic arrays but allow efficient insertion and deletion operations clustered near the same arbitrary location. Some deque implementations use array deques, which allow amortized constant time insertion/removal at both ends, instead of just one end.
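
One common way to realize an array deque is a circular buffer over the underlying array, so both ends can grow in constant time until a resize is needed. The following C sketch assumes a power-of-two capacity for cheap index masking; the deque type and function names are illustrative, not any particular library's API:

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct {
        int    *data;
        size_t  head;     /* index of the first element */
        size_t  size;
        size_t  capacity; /* assumed to be a power of two */
    } deque;

    bool push_front(deque *d, int e) {
        if (d->size == d->capacity)
            return false;              /* sketch: grow and re-pack here */
        /* Step head one slot back, wrapping around the buffer. */
        d->head = (d->head + d->capacity - 1) & (d->capacity - 1);
        d->data[d->head] = e;
        d->size++;
        return true;
    }

    bool push_back(deque *d, int e) {
        if (d->size == d->capacity)
            return false;              /* sketch: grow and re-pack here */
        d->data[(d->head + d->size) & (d->capacity - 1)] = e;
        d->size++;
        return true;
    }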

Goodrich [18] presented a dynamic array algorithm called tiered vectors that provides O(n^(1/k)) performance for insertions and deletions from anywhere in the array, and O(k) get and set operations, where k ≥ 2 is a constant parameter.

Hashed array tree (HAT) is a dynamic array algorithm published by Sitarski in 1996. [19] A hashed array tree wastes order √n storage space, where n is the number of elements in the array. The algorithm has O(1) amortized performance when appending a series of objects to the end of a hashed array tree.
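
Lookups stay constant-time because the leaf size is a power of two: the high bits of an index select the leaf and the low bits select the slot within it. A sketch of the lookup in C (illustrative names, not Sitarski's code):

    #include <stddef.h>

    typedef struct {
        int    **blocks; /* top-level array of leaf pointers */
        unsigned p;      /* log2 of the leaf size */
    } hat;

    int hat_get(const hat *h, size_t i) {
        /* High bits pick the leaf, low bits pick the slot. */
        return h->blocks[i >> h->p][i & (((size_t)1 << h->p) - 1)];
    }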

In a 1999 paper, [20] Brodnik et al. describe a tiered dynamic array data structure which wastes only √n space for n elements at any point in time, and they prove a lower bound showing that any dynamic array must waste this much space if the operations are to remain amortized constant time. Additionally, they present a variant in which growing and shrinking the buffer takes not only amortized but worst-case constant time.

Bagwell (2002) [21] presented the VList algorithm, which can be adapted to implement a dynamic array.

Naïve resizable arrays, also called "the worst implementation" of resizable arrays, keep the allocated size of the array exactly big enough for all the data it contains, perhaps by calling realloc for each and every item added to the array. Naïve resizable arrays are the simplest way of implementing a resizable array in C. They don't waste any memory, but appending to the end of the array always takes Θ(n) time. [19] [22] [23] [24] [25] Linearly growing arrays pre-allocate ("waste") Θ(1) space every time they resize the array, making them many times faster than naïve resizable arrays: appending to the end of the array still takes Θ(n) time, but with a much smaller constant. Naïve resizable arrays and linearly growing arrays may be useful when a space-constrained application needs lots of small resizable arrays; they are also commonly used as an educational example leading to exponentially growing dynamic arrays. [26]
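
A hedged sketch of the naïve approach in C, growing by exactly one element per append, so realloc may copy all n existing elements each time:

    #include <stdlib.h>

    int naive_append(int **data, size_t *size, int e) {
        /* Allocate exactly one more slot than currently used. */
        int *p = realloc(*data, (*size + 1) * sizeof **data);
        if (p == NULL)
            return -1;
        p[*size] = e;   /* appending is Θ(n) due to the copy above */
        *data = p;
        *size += 1;
        return 0;
    }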

Language support

C++'s std::vector and Rust's std::vec::Vec are implementations of dynamic arrays, as are the ArrayList classes supplied with the Java API [27] [28]:236 and the .NET Framework. [29] [30]:22

The generic List<T> class supplied with version 2.0 of the .NET Framework is also implemented with dynamic arrays. Smalltalk's OrderedCollection is a dynamic array with a dynamic start and end index, making removal of the first element also O(1).

Python's list datatype implementation is a dynamic array whose growth pattern is: 0, 4, 8, 16, 24, 32, 40, 52, 64, 76, ... [31]

Delphi and D implement dynamic arrays at the language's core.

Ada's Ada.Containers.Vectors generic package provides a dynamic array implementation for a given subtype.

Many scripting languages such as Perl and Ruby offer dynamic arrays as a built-in primitive data type.

Several cross-platform frameworks provide dynamic array implementations for C, including CFArray and CFMutableArray in Core Foundation, and GArray and GPtrArray in GLib.

Common Lisp provides rudimentary support for resizable vectors by allowing the built-in array type to be configured as adjustable, with the location of insertion given by the fill pointer.


References

  1. See, for example, the source code of the java.util.ArrayList class from OpenJDK 6.
  2. Lambert, Kenneth Alfred (2009), "Physical size and logical size", Fundamentals of Python: From First Programs Through Data Structures, Cengage Learning, p. 510, ISBN 978-1423902188.
  3. Goodrich, Michael T.; Tamassia, Roberto (2002), "1.5.2 Analyzing an Extendable Array Implementation", Algorithm Design: Foundations, Analysis and Internet Examples, Wiley, pp. 39–41.
  4. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001) [1990]. "17.4 Dynamic tables". Introduction to Algorithms (2nd ed.). MIT Press and McGraw-Hill. pp. 416–424. ISBN 0-262-03293-7.
  5. "C++ STL vector: definition, growth factor, member functions". Archived from the original on 2015-08-06. Retrieved 2015-08-05.
  6. "vector growth factor of 1.5". comp.lang.c++.moderated. Google Groups. Archived from the original on 2011-01-22. Retrieved 2015-08-05.
  7. List object implementation from github.com/python/cpython/. Retrieved 2020-03-23.
  8. Brais, Hadi. "Dissecting the C++ STL Vector: Part 3 - Capacity & Size". Micromysteries. Retrieved 2015-08-05.
  9. "facebook/folly". GitHub. Retrieved 2015-08-05.
  10. "rust-lang/rust". GitHub. Retrieved 2020-06-09.
  11. "golang/go". GitHub. Retrieved 2021-09-14.
  12. "The Nim memory model". zevv.nl. Retrieved 2022-05-24.
  13. "sbcl/sbcl". GitHub. Retrieved 2023-02-15.
  14. Day 1 Keynote - Bjarne Stroustrup: C++11 Style, GoingNative 2012, channel9.msdn.com, from minute 45 or foil 44.
  15. "Number crunching: Why you should never, ever, EVER use linked-list in your code again", kjellkod.wordpress.com.
  16. Brodnik, Andrej; Carlsson, Svante; Sedgewick, Robert; Munro, J. I.; Demaine, E. D. (1999), Resizable Arrays in Optimal Time and Space (Technical Report CS-99-09) (PDF), Department of Computer Science, University of Waterloo.
  17. Okasaki, Chris (1995). "Purely Functional Random-Access Lists". Proceedings of the Seventh International Conference on Functional Programming Languages and Computer Architecture: 86–95. doi:10.1145/224164.224187.
  18. Goodrich, Michael T.; Kloss II, John G. (1999), "Tiered Vectors: Efficient Dynamic Arrays for Rank-Based Sequences", Workshop on Algorithms and Data Structures, Lecture Notes in Computer Science, 1663: 205–216, doi:10.1007/3-540-48447-7_21, ISBN 978-3-540-66279-2.
  19. Sitarski, Edward (September 1996), "HATs: Hashed array trees", Algorithm Alley, Dr. Dobb's Journal, 21 (11).
  20. Brodnik, Andrej; Carlsson, Svante; Sedgewick, Robert; Munro, J. I.; Demaine, E. D. (1999), Resizable Arrays in Optimal Time and Space (PDF) (Technical Report CS-99-09), Department of Computer Science, University of Waterloo.
  21. Bagwell, Phil (2002), Fast Functional Lists, Hash-Lists, Deques and Variable Length Arrays, EPFL.
  22. Mike Lam. "Dynamic Arrays".
  23. "Amortized Time".
  24. "Hashed Array Tree: Efficient representation of Array".
  25. "Different notions of complexity".
  26. Peter Kankowski. "Dynamic arrays in C".
  27. Javadoc on ArrayList.
  28. Bloch, Joshua (2018). Effective Java: Programming Language Guide (3rd ed.). Addison-Wesley. ISBN 978-0134685991.
  29. ArrayList Class.
  30. Skeet, Jon. C# in Depth. Manning. ISBN 978-1617294532.
  31. listobject.c (github.com).