Collection (abstract data type)

Last updated January 29, 2025

In computer programming, a collection is an abstract data type that groups items so that they can be used in a polymorphic way.
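
As a rough illustration of polymorphic use, the sketch below handles several Python containers through the standard `collections.abc.Collection` interface, which covers types that are sized, iterable, and support membership tests. The helper name `describe` is hypothetical and chosen only for this example.

```python
from collections import deque
from collections.abc import Collection


def describe(items: Collection) -> tuple[int, bool]:
    """Return (number of items, whether 3 is among them) for any collection."""
    # Only operations shared by every Collection are used here:
    # len() and the `in` membership test.
    return len(items), 3 in items


# The same function works on a list, a set, a tuple, and a deque.
results = [describe(c) for c in ([1, 2, 3], {3, 4}, (5, 6), deque([3]))]
print(results)  # [(3, True), (2, True), (2, False), (1, True)]
```

The caller does not need to know which concrete type it was given, only that the type behaves as a collection.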

Although easily confused with its implementations in programming languages, a collection is an abstract, mathematical concept, and it can be misunderstood when the focus is on a particular implementation. For example, a priority queue is often implemented as a heap, and an associative array is often implemented as a hash table, so these abstract types are often referred to by their preferred implementation, as a "heap" or a "hash", though this is conceptually incorrect.
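
The distinction can be sketched in Python as follows. The class names `PriorityQueue`, `HeapPriorityQueue`, and `ListPriorityQueue` are hypothetical, and the convention that a lower number is served first is an assumption of this example. The abstract type fixes only the operations; one implementation uses a heap and the other does not, yet both behave identically from the outside.

```python
import heapq
from abc import ABC, abstractmethod


class PriorityQueue(ABC):
    """The abstract type: only the operations are specified."""

    @abstractmethod
    def push(self, priority: int, item: str) -> None: ...

    @abstractmethod
    def pop_min(self) -> str: ...


class HeapPriorityQueue(PriorityQueue):
    """Implementation backed by a binary heap (the common choice)."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, str]] = []

    def push(self, priority: int, item: str) -> None:
        heapq.heappush(self._heap, (priority, item))

    def pop_min(self) -> str:
        return heapq.heappop(self._heap)[1]


class ListPriorityQueue(PriorityQueue):
    """Implementation backed by an unsorted list; no heap involved."""

    def __init__(self) -> None:
        self._items: list[tuple[int, str]] = []

    def push(self, priority: int, item: str) -> None:
        self._items.append((priority, item))

    def pop_min(self) -> str:
        smallest = min(self._items)  # linear scan for the lowest priority
        self._items.remove(smallest)
        return smallest[1]


def drain(queue: PriorityQueue) -> list[str]:
    """Fill any priority queue with sample tasks and empty it in order."""
    for priority, item in [(2, "write"), (1, "plan"), (3, "review")]:
        queue.push(priority, item)
    return [queue.pop_min() for _ in range(3)]


print(drain(HeapPriorityQueue()))  # ['plan', 'write', 'review']
print(drain(ListPriorityQueue()))  # ['plan', 'write', 'review']
```

Calling the abstract type a "heap" would describe only the first of these two classes.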

Subtypes

Several other abstract data types are more specific kinds of collection.

Linear

Some collections maintain a linear ordering of items – with access to one or both ends. The data structure implementing such a collection need not be linear. For example, a priority queue is often implemented as a heap, which is a kind of tree.
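
Access at one end or at both ends can be sketched with Python's built-in `list` and the standard-library `collections.deque`. The variable names are illustrative only.

```python
from collections import deque

# Stack: items are added and removed at the same end.
stack = [1, 2, 3]
stack.append(4)
top = stack.pop()                    # 4

# Queue: items are added at the back and removed from the front.
queue = deque([1, 2, 3])
queue.append(4)
front = queue.popleft()              # 1

# Double-ended queue: both ends accept additions and removals.
both = deque([1, 2, 3])
both.appendleft(0)
both.append(4)
ends = (both.popleft(), both.pop())  # (0, 4)
```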

Notable linear collections include the list, stack, queue, priority queue, double-ended queue, and double-ended priority queue.

Associative

Some collections are interpreted as a sort of function: given an input, the collection yields an output.

Notable associative collections include the set, multiset, and associative array.

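A set can be interpreted as a specialized multiset, which in turn is a specialized associative array, in each case by limiting the possible values. A multiset is an associative array whose values are non-negative counts, and a set is a multiset whose counts are limited to 0 and 1, so that the set is represented by its indicator function.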
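
This chain of specializations can be sketched in Python, using a `dict` as the associative array and `collections.Counter` as the multiset. The names `universe`, `indicator`, and `support` are hypothetical and introduced only for this example.

```python
from collections import Counter

universe = ["a", "b", "c"]

# Associative array: keys map to arbitrary values.
stock = {"a": "in aisle 4", "b": "sold out"}

# Multiset: an associative array whose values are counts.
bag = Counter(["a", "a", "b"])


def indicator(members: set[str]) -> dict[str, int]:
    """Represent a set as its indicator function over `universe`."""
    return {x: int(x in members) for x in universe}


def support(counts: Counter) -> set[str]:
    """Collapse a multiset to a set by keeping every item with a positive count."""
    return {x for x, n in counts.items() if n > 0}


print(indicator({"a", "b"}))       # {'a': 1, 'b': 1, 'c': 0}
print(support(bag) == {"a", "b"})  # True
```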

Implementation

As an abstract data type, collection does not prescribe an implementation, though type theory describes implementation considerations.

Some collection types are provided as primitive data types in a language, such as lists, while more complex collection types are implemented as composite data types in libraries, sometimes in a language's standard library. Examples include:

C++: known as containers, implemented in the C++ Standard Library and the earlier Standard Template Library
Java: implemented in the Java collections framework
Oracle PL/SQL: implemented as programmer-defined types^[1]
Python: some built in, others implemented in the collections library
.NET: provides the ICollection and IReadOnlyCollection interfaces and implementations such as List<T>
Rust: provides the Vec<T> struct^[2] in the std::vec module and further collection types such as HashMap<K, V>^[3] in the std::collections module^[4]
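
For the Python entry, the split between built-in types and the standard-library `collections` module might look like the sketch below. The variable names and sample data are illustrative only.

```python
from collections import Counter, defaultdict, deque

# Built-in collection types need no import.
names = ["ant", "bee", "asp"]         # list
lengths = {"ant": 3, "bee": 3}        # dict
initials = {n[0] for n in names}      # set

# Types from the standard-library `collections` module.
recent = deque(maxlen=2)              # bounded double-ended queue
for n in (1, 2, 3):
    recent.append(n)                  # the oldest item is dropped when full

letter_counts = Counter("banana")     # multiset of characters

groups = defaultdict(list)            # dict that creates missing values on demand
for name in names:
    groups[name[0]].append(name)
```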

References

[1] Feuerstein, Steven; Pribyl, Bill; Dawes, Chip (1999). "Collections in PL/SQL". Oracle PL/SQL Language Pocket Reference (4 ed.). Sebastopol, California: O'Reilly Media, Inc. (published 2007). p. 63. ISBN 9780596551612. Retrieved 2017-06-26. "Collections are implemented as TYPEs. As with any programmer-defined type, you must first define the type; then you can declare instances of that type."
[2] "Vec in std::vec - Rust". doc.rust-lang.org. Retrieved 28 January 2025.
[3] "HashMap in std::collections - Rust". doc.rust-lang.org. Retrieved 28 January 2025.
[4] "std::collections - Rust". doc.rust-lang.org. Retrieved 28 January 2025.

External links

Apache Commons Collections.
AS3Commons Collections Framework ActionScript3 implementation of the most common collections.
CollectionSpy — A profiler for Java's Collections Framework.
Guava.
Mango Java library.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
