Collection (abstract data type)

Last updated

In computer science, a collection is a grouping of some variable number of data items (possibly zero) that have some shared significance to the problem being solved and need to be operated upon together in some controlled fashion. Generally, the data items will be of the same type or, in languages supporting inheritance, derived from some common ancestor type. A collection is a concept applicable to abstract data types, and does not prescribe a specific implementation as a concrete data structure, though often there is a conventional choice (see Container for type theory discussion).

Contents

Examples of collections include lists, sets, multisets, trees and graphs.

Fixed-size arrays (or tables) are usually not considered a collection because they hold a fixed number of data items, although they commonly play a role in the implementation of collections. Variable-size arrays are generally considered collections.[ citation needed ]

Linear collections

Many collections define a particular linear ordering, with access to one or both ends. The actual data structure implementing such a collection need not be linear—for example, a priority queue is often implemented as a heap, which is a kind of tree. Important linear collections include:

Lists

In a list, the order of data items is significant. Duplicate data items are permitted. Examples of operations on lists are searching for a data item in the list and determining its location (if it is present), removing a data item from the list, adding a data item to the list at a specific location, etc. If the principal operations on the list are to be the addition of data items at one end and the removal of data items at the other, it will generally be called a queue or FIFO. If the principal operations are the addition and removal of data items at just one end, it will be called a stack or LIFO. In both cases, data items are maintained within the collection in the same order (unless they are removed and re-inserted somewhere else) and so these are special cases of the list collection. Other specialized operations on lists include sorting, where, again, the order of data items is of great importance.

Stacks

A stack is a LIFO data structure with two principal operations: push, which adds an element to the "top" of the collection, and pop, which removes the top element.

Queues

Priority queues

In a priority queue, the tracks of the minimum or maximum data item in the collection are kept, according to some ordering criterion, and the order of the other data items does not matter. One may think of a priority queue as a list that always keeps the minimum or maximum at the head, while the remaining elements are kept in a bag.

Double-ended queues

Double-ended priority queues

Associative collections

Other collections can instead be interpreted as a sort of function: given an input, the collection yields an output. Important associative collections include:

A set can be interpreted as a specialized multiset, which in turn is a specialized associative array, in each case by limiting the possible values—considering a set as represented by its indicator function.

Sets

In a set, the order of data items does not matter (or is undefined) but duplicate data items are not permitted. Examples of operations on sets are the addition and removal of data items and searching for a data item in the set. Some languages support sets directly. In others, sets can be implemented by a hash table with dummy values; only the keys are used in representing the set.

Multisets

In a multiset (or bag), like in a set, the order of data items does not matter, but in this case duplicate data items are permitted. Examples of operations on multisets are the addition and removal of data items and determining how many duplicates of a particular data item are present in the multiset. Multisets can be transformed into lists by the action of sorting.

Associative arrays

In an associative array (or map, dictionary, lookup table), like in a dictionary, a lookup on a key (like a word) provides a value (like a definition). The value might be a reference to a compound data structure. A hash table is usually an efficient implementation, and thus this data type is often known as a "hash".

Graphs

In a graph, data items have associations with one or more other data items in the collection and are somewhat like trees without the concept of a root or a parent-child relationship so that all data items are peers. Examples of operations on graphs are traversals and searches which explore the associations of data items looking for some specific property. Graphs are frequently used to model real-world situations and to solve related problems. An example is the Spanning tree protocol, which creates a graph (or mesh) representation of a data network and figures out which associations between switching nodes need to be broken to turn it into a tree and thus prevent data going around in loops.

Trees

In a tree, which is a special kind of graph, a root data item has associated with it some number of data items which in turn have associated with them some number of other data items in what is frequently viewed as a parent-child relationship. Every data item (other than the root) has a single parent (the root has no parent) and some number of children, possibly zero. Examples of operations on trees are the addition of data items so as to maintain a specific property of the tree to perform sorting, etc. and traversals to visit data items in a specific sequence.

Tree collections can be used to naturally store hierarchical data, which is presented in a tree-like manner, such as menu systems and files in directories on a data storage system.

Specialized trees are used in various algorithms. For example, the heap sort uses a kind of tree called a heap.

Abstract concept vs. implementation

As described here, a collection and the various kinds of collections are abstract concepts. There exists in the literature considerable confusion between the abstract concepts of computer science and their specific implementations in various languages or kinds of languages. Assertions that collections like lists, sets, trees, etc. are data structures, abstract data types or classes must be read with this in mind. Collections are first and foremost abstractions that are useful in formulating solutions to computing problems. Viewed in this light, they retain important links to underlying mathematical concepts which can be lost when the focus is on the implementation.

For example, a priority queue is often implemented as a heap, while an associative array is often implemented as a hash table, so these abstract types are often referred to by this preferred implementation, as a "heap" or a "hash", though this is not strictly correct.

Implementations

Some collections may be primitive data types in a language, such as lists, while more complex collections are implemented as composite data types in libraries, sometimes in the standard library. Examples include:

Related Research Articles

In computer science, an abstract data type (ADT) is a mathematical model for data types. An abstract data type is defined by its behavior (semantics) from the point of view of a user, of the data, specifically in terms of possible values, possible operations on data of this type, and the behavior of these operations. This mathematical model contrasts with data structures, which are concrete representations of data, and are the point of view of an implementer, not a user.

Binary search tree Data structure in tree form sorted for fast lookup

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree whose internal nodes each store a key greater than all the keys in the node's left subtree and less than those in its right subtree. A binary tree is a type of data structure for storing data such as numbers in an organized way. Binary search trees allow binary search for fast lookup, addition and removal of data items, and can be used to implement dynamic sets and lookup tables. The order of nodes in a BST means that each comparison skips about half of the remaining tree, so the whole lookup takes time proportional to the binary logarithm of the number of items stored in the tree. This is much better than the linear time required to find items by key in an (unsorted) array, but slower than the corresponding operations on hash tables. Several variants of the binary search tree have been studied.

Data structure Particular way of storing and organizing data in a computer

In computer science, a data structure is a data organization, management, and storage format that enables efficient access and modification. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data, i.e., it is an algebraic structure about data.A data structure is a named location that can be used to store and organize data.An algorithm is a collection of steps to solve a particular problem. Learning data structures and algorithms allow us to write efficient and optimized computer programs.

Heap (data structure) Computer science data structure

In computer science, a heap is a specialized tree-based data structure which is essentially an almost complete tree that satisfies the heap property: in a max heap, for any given node C, if P is a parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

In computer science, a priority queue is an abstract data type similar to a regular queue or stack data structure in which each element additionally has a "priority" associated with it. In a priority queue, an element with high priority is served before an element with low priority. In some implementations, if two elements have the same priority, they are served according to the order in which they were enqueued, while in other implementations, ordering of elements with the same priority is undefined.

Queue (abstract data type) Abstract data type

In computer science, a queue is a collection of entities that are maintained in a sequence and can be modified by the addition of entities at one end of the sequence and the removal of entities from the other end of the sequence. By convention, the end of the sequence at which elements are added is called the back, tail, or rear of the queue, and the end at which elements are removed is called the head or front of the queue, analogously to the words used when people line up to wait for goods or services.

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of pairs, such that each possible key appears at most once in the collection. Not to be confused with Associative Processors

The Standard Template Library (STL) is a software library for the C++ programming language that influenced many parts of the C++ Standard Library. It provides four components called algorithms, containers, functions, and iterators.

In computer science, a set is an abstract data type that can store unique values, without any particular order. It is a computer implementation of the mathematical concept of a finite set. Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in a set.

In computer science, a list or sequence is an abstract data type that represents a countable number of ordered values, where the same value may occur more than once. An instance of a list is a computer representation of the mathematical concept of a tuple or finite sequence; the (potentially) infinite analog of a list is a stream. Lists are a basic example of containers, as they contain other values. If the same value occurs multiple times, each occurrence is considered a distinct item.

Self-balancing binary search tree Any node-based binary search tree that automatically keeps its height small

In computer science, a self-balancingbinary search tree is any node-based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions.

In computing, a persistent data structure or not ephemeral data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure. The term was introduced in Driscoll, Sarnak, Sleator, and Tarjans' 1986 article.

In computer science, a multimap is a generalization of a map or associative array abstract data type in which more than one value may be associated with and returned for a given key. Both map and multimap are particular cases of containers. Often the multimap is implemented as a map with lists or sets as the map values.

Java collections framework

The Java collections framework is a set of classes and interfaces that implement commonly reusable collection data structures.

In computer science, a container is a class or a data structure whose instances are collections of other objects. In other words, they store objects in an organized way that follows specific access rules. The size of the container depends on the number of objects (elements) it contains. Underlying (inherited) implementations of various container types may vary in size and complexity, and provide flexibility in choosing the right implementation for any given scenario.

InfinityDB is an all-Java embedded database engine and client/server DBMS with an extended java.util.concurrent.ConcurrentNavigableMap interface that is deployed in handheld devices, on servers, on workstations, and in distributed settings. The design is based on a proprietary lockless, concurrent, B-tree architecture that enables client programmers to reach high levels of performance without risk of failures.

In computer science, integer sorting is the algorithmic problem of sorting a collection of data values by integer keys. Algorithms designed for integer sorting may also often be applied to sorting problems in which the keys are floating point numbers, rational numbers, or text strings. The ability to perform integer arithmetic on the keys allows integer sorting algorithms to be faster than comparison sorting algorithms in many cases, depending on the details of which operations are allowed in the model of computing and how large the integers to be sorted are.

Bucket queue

A bucket queue is a data structure that implements the priority queue abstract data type: it maintains a dynamic collection of elements with numerical priorities and allows quick access to the element with minimum or maximum priority. In the bucket queue, the priorities must be integers, and it is particularly suited to applications in which the priorities have a small range. A bucket queue has the form of an array of buckets: an array data structure, indexed by the priorities, whose cells contain collections of items with the same priority as each other. With this data structure, insertion or removal of elements takes constant time. Searches for the minimum-priority element take time proportional to the number of buckets or, by maintaining a pointer to the most recently found bucket, in time proportional to the difference in priorities between successive operations.

References

  1. Feuerstein, Steven; Pribyl, Bill; Dawes, Chip (2007) [1999]. "Collections in PL/SQL". Oracle PL/SQL Language Pocket Reference. Pocket Reference (4 ed.). Sebastopol, California: O'Reilly Media, Inc. p. 63. ISBN   9780596551612 . Retrieved 2017-06-26. Collections are implemented as TYPEs. As with any programmer-defined type, you must first define the type; then you can declare instances of that type.