
In computer science, a **set** is an abstract data type that can store unique values, without any particular order. It is a computer implementation of the mathematical concept of a finite set. Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in a set.

Some set data structures are designed for **static** or **frozen sets** that do not change after they are constructed. Static sets allow only query operations on their elements, such as checking whether a given value is in the set or enumerating the values in some arbitrary order. Other variants, called **dynamic** or **mutable sets**, also allow the insertion and deletion of elements from the set.

A **multiset** is a special kind of set in which an element can appear several times.

In type theory, sets are generally identified with their indicator function (characteristic function): accordingly, a set of values of type $A$ may be denoted by $2^{A}$ or $\mathcal{P}(A)$. (Subtypes and subsets may be modeled by refinement types, and quotient sets may be replaced by setoids.) The characteristic function $F$ of a set $S$ is defined as:

$$
F(x) = \begin{cases} 1, & \text{if } x \in S \\ 0, & \text{if } x \notin S \end{cases}
$$
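In a programming language this identification can be made literal. The following is a minimal Python sketch, not a standard API: the helper name `indicator` is hypothetical, and it turns a finite collection into its characteristic function, which returns 1 for members and 0 for everything else.

```python
from typing import Callable, Hashable, Iterable


def indicator(elements: Iterable[Hashable]) -> Callable[[Hashable], int]:
    """Return the characteristic function F of the finite set S built from elements."""
    members = frozenset(elements)  # private copy, so later changes to the input cannot leak in

    def F(x: Hashable) -> int:
        # F(x) = 1 if x is in S, 0 otherwise
        return 1 if x in members else 0

    return F


F = indicator([2, 3, 5, 7])
print(F(5), F(4))  # prints "1 0"
```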

In theory, many other abstract data structures can be viewed as set structures with additional operations and/or additional axioms imposed on the standard operations. For example, an abstract heap can be viewed as a set structure with a `min(S)` operation that returns the element of smallest value.
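As a rough illustration of the heap view, assuming Python's standard `heapq` module: a list kept in heap order always has its smallest item at index 0, so `min(S)` is a constant-time lookup. The function name `heap_min` is hypothetical.

```python
import heapq


def heap_min(heap: list):
    """min(S) for a list kept in heap order by heapq: the smallest item is at index 0."""
    if not heap:
        raise ValueError("min of an empty heap")
    return heap[0]  # O(1): no search needed


h = []
for value in [5, 1, 4, 2]:
    heapq.heappush(h, value)  # restores the heap invariant after each insertion

print(heap_min(h))  # prints "1"
```

Unlike a set, `heapq` does not reject duplicate values, so this only approximates the "set with `min`" view.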

One may define the operations of the algebra of sets:

- `union(S, T)`: returns the union of sets *S* and *T*.
- `intersection(S, T)`: returns the intersection of sets *S* and *T*.
- `difference(S, T)`: returns the difference of sets *S* and *T*.
- `subset(S, T)`: a predicate that tests whether the set *S* is a subset of set *T*.
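Python's built-in `set` type provides all four of these operations as operators (and equivalent methods). A minimal sketch, assuming the operation names from the list are wrapped as plain functions:

```python
def union(S: set, T: set) -> set:
    return S | T  # equivalent to S.union(T)


def intersection(S: set, T: set) -> set:
    return S & T  # equivalent to S.intersection(T)


def difference(S: set, T: set) -> set:
    return S - T  # elements of S not in T; equivalent to S.difference(T)


def subset(S: set, T: set) -> bool:
    return S <= T  # equivalent to S.issubset(T)


S, T = {1, 2, 3}, {2, 3, 4}
print(sorted(union(S, T)), sorted(intersection(S, T)), sorted(difference(S, T)), subset({2}, S))
# prints "[1, 2, 3, 4] [2, 3] [1] True"
```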

Typical operations that may be provided by a static set structure *S* are:

- `is_element_of(x, S)`: checks whether the value *x* is in the set *S*.
- `is_empty(S)`: checks whether the set *S* is empty.
- `size(S)` or `cardinality(S)`: returns the number of elements in *S*.
- `iterate(S)`: returns a function that returns one more value of *S* at each call, in some arbitrary order.
- `enumerate(S)`: returns a list containing the elements of *S* in some arbitrary order.
- `build(x_1, x_2, …, x_n)`: creates a set structure with values $x_1, x_2, \ldots, x_n$.
- `create_from(collection)`: creates a new set structure containing all the elements of the given collection or all the elements returned by the given iterator.
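Python's built-in `frozenset` is a static set in the sense above: once built, it cannot be changed. The sketch below maps the listed operations onto it; the wrapper functions are illustrative only, and `enumerate` is renamed `enumerate_set` to avoid shadowing Python's built-in of the same name.

```python
from typing import Callable, Hashable, Iterable


def build(*xs: Hashable) -> frozenset:
    return frozenset(xs)  # duplicates among xs are dropped


def create_from(collection: Iterable[Hashable]) -> frozenset:
    return frozenset(collection)  # accepts any iterable, including an iterator


def is_element_of(x: Hashable, S: frozenset) -> bool:
    return x in S  # average O(1) for a hash-based set


def is_empty(S: frozenset) -> bool:
    return len(S) == 0


def size(S: frozenset) -> int:
    return len(S)


def iterate(S: frozenset) -> Callable[[], Hashable]:
    it = iter(S)
    return lambda: next(it)  # one more element per call; StopIteration when exhausted


def enumerate_set(S: frozenset) -> list:
    return list(S)  # arbitrary order


S = build(3, 1, 3, 2)
print(size(S), is_element_of(2, S), is_empty(S))  # prints "3 True False"
```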

Dynamic set structures typically add:

- `create()`: creates a new, initially empty set structure.
- `create_with_capacity(n)`: creates a new set structure, initially empty but capable of holding up to *n* elements.
- `add(S, x)`: adds the element *x* to *S*, if it is not present already.
- `remove(S, x)`: removes the element *x* from *S*, if it is present.
- `capacity(S)`: returns the maximum number of values that *S* can hold.
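A minimal sketch of the dynamic operations on Python's mutable `set`. Note that `remove(S, x)` as described here ("if it is present") corresponds to `set.discard`, because Python's `set.remove` raises `KeyError` for a missing element. Python sets expose no user-visible capacity, so `create_with_capacity` and `capacity` are left out of this sketch.

```python
from typing import Hashable


def create() -> set:
    return set()  # note: {} would create an empty dict, not a set


def add(S: set, x: Hashable) -> None:
    S.add(x)  # no effect if x is already present


def remove(S: set, x: Hashable) -> None:
    S.discard(x)  # no effect, and no KeyError, if x is absent


S = create()
for x in ["a", "b", "a"]:
    add(S, x)
remove(S, "b")
remove(S, "zzz")  # absent: silently ignored
print(S)  # prints "{'a'}"
```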

Some set structures may allow only some of these operations. The cost of each operation will depend on the implementation, and possibly also on the particular values stored in the set, and the order in which they are inserted.

There are many other operations that can (in principle) be defined in terms of the above, such as:

- `pop(S)`: returns an arbitrary element of *S*, deleting it from *S*.[1]
- `pick(S)`: returns an arbitrary element of *S*.[2][3][4] Functionally, the mutator `pop` can be interpreted as the pair of selectors `(pick, rest)`, where `rest` returns the set consisting of all elements except for the arbitrary element.[5] `pick` can also be interpreted in terms of `iterate`.[a]
- `map(F, S)`: returns the set of distinct values resulting from applying function *F* to each element of *S*.
- `filter(P, S)`: returns the subset containing all elements of *S* that satisfy a given predicate *P*.
- `fold(A_0, F, S)`: returns the value $A_{|S|}$ after applying $A_{i+1} := F(A_i, e)$ for each element *e* of *S*, for some binary operation *F*. *F* must be associative and commutative for this to be well-defined.
- `clear(S)`: deletes all elements of *S*.
- `equal(S_1, S_2)`: checks whether the two given sets are equal (i.e. contain all and only the same elements).
- `hash(S)`: returns a hash value for the static set *S* such that if `equal(S_1, S_2)` then `hash(S_1) = hash(S_2)`.
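Most of these derived operations map directly onto Python built-ins. The sketch below is illustrative: `map_set`, `filter_set` and `hash_set` are hypothetical names chosen to avoid shadowing Python's `map`, `filter` and `hash`, and `fold` is expressed with `functools.reduce`, which calls `F(accumulator, element)` as in the definition above.

```python
from functools import reduce
from typing import Callable, Hashable, TypeVar

A = TypeVar("A")


def pick(S: set) -> Hashable:
    return next(iter(S))  # arbitrary element, S unchanged; StopIteration if S is empty


def pop(S: set) -> Hashable:
    return S.pop()  # arbitrary element, removed from S; KeyError if S is empty


def map_set(F: Callable, S: set) -> set:
    return {F(e) for e in S}  # only distinct results are kept


def filter_set(P: Callable[..., bool], S: set) -> set:
    return {e for e in S if P(e)}


def fold(A0: A, F: Callable[[A, Hashable], A], S: set) -> A:
    # Iteration order is arbitrary, hence the associativity/commutativity requirement on F.
    return reduce(F, S, A0)


def equal(S1: set, S2: set) -> bool:
    return S1 == S2


def hash_set(S: frozenset) -> int:
    return hash(S)  # only immutable frozensets are hashable in Python


S = {1, 2, 3, 4}
print(sorted(map_set(lambda x: x % 2, S)))  # prints "[0, 1]"
print(fold(0, lambda acc, e: acc + e, S))   # prints "10"
```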

Other operations can be defined for sets with elements of a special type:

- `sum(S)`: returns the sum of all elements of *S* for some definition of "sum". For example, over integers or reals, it may be defined as `fold(0, add, S)`.
- `collapse(S)`: given a set of sets, returns the union.[6] For example, `collapse({{1}, {2, 3}}) == {1, 2, 3}`. It may be considered a kind of `sum`.
- `flatten(S)`: given a set consisting of sets and atomic elements (elements that are not sets), returns a set whose elements are the atomic elements of the original top-level set or elements of the sets it contains. In other words, it removes a level of nesting, like `collapse`, but allows atoms. This can be done a single time, or recursively to obtain a set of only atomic elements.[7] For example, `flatten({1, {2, 3}}) == {1, 2, 3}`.
- `nearest(S, x)`: returns the element of *S* that is closest in value to *x* (by some metric).
- `min(S)`, `max(S)`: return the minimum/maximum element of *S*.
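A sketch of these operations in Python, under two assumptions: nested sets are represented as `frozenset` (Python sets can only contain hashable values), and `nearest` uses absolute difference as its metric. The name `set_sum` is hypothetical and avoids shadowing the built-in `sum`; `min` and `max` are used as the built-ins directly.

```python
from typing import Iterable


def set_sum(S: Iterable[float]) -> float:
    return sum(S)  # equivalent to fold(0, add, S)


def collapse(S: frozenset) -> frozenset:
    # Union of a set of sets; each inner set must itself be a frozenset.
    return frozenset().union(*S)


def flatten(S: frozenset) -> frozenset:
    # Remove one level of nesting, keeping atoms (non-set elements) unchanged.
    out = set()
    for e in S:
        if isinstance(e, frozenset):
            out |= e
        else:
            out.add(e)
    return frozenset(out)


def nearest(S: Iterable[float], x: float) -> float:
    # Smallest absolute difference; ties go to whichever element iteration reaches first.
    return min(S, key=lambda e: abs(e - x))


nested = frozenset({frozenset({1}), frozenset({2, 3})})
print(sorted(collapse(nested)))                            # prints "[1, 2, 3]"
print(sorted(flatten(frozenset({1, frozenset({2, 3})}))))  # prints "[1, 2, 3]"
print(nearest({1, 5, 10}, 6))                              # prints "5"
print(min({4, 2, 9}), max({4, 2, 9}))                      # prints "2 9"
```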

Sets can be implemented using various data structures, which provide different time and space trade-offs for various operations. Some implementations are designed to improve the efficiency of very specialized operations, such as `nearest` or `union`. Implementations described as "general use" typically strive to optimize the `is_element_of`, `add`, and `remove` operations. A simple implementation is to use a list, ignoring the order of the elements and taking care to avoid repeated values. This is simple but inefficient, as operations like set membership or element deletion are *O*(*n*), since they require scanning the entire list.[b] Sets are often instead implemented using more efficient data structures, particularly various flavors of trees, tries, or hash tables.
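A minimal sketch of the list-based approach, using a hypothetical class `ListSet`: membership, insertion and deletion all scan the list and are therefore *O*(*n*). One side benefit is that elements only need to support equality, not hashing or ordering.

```python
from typing import Generic, Iterator, List, TypeVar

T = TypeVar("T")


class ListSet(Generic[T]):
    """Unordered set backed by a plain list; elements only need == ."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def is_element_of(self, x: T) -> bool:
        return any(item == x for item in self._items)  # linear scan: O(n)

    def add(self, x: T) -> None:
        if not self.is_element_of(x):  # avoid repeated values: O(n)
            self._items.append(x)

    def remove(self, x: T) -> None:
        for i, item in enumerate(self._items):  # O(n) search
            if item == x:
                # Order is irrelevant, so overwrite with the last item and shrink.
                self._items[i] = self._items[-1]
                self._items.pop()
                return

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)


s = ListSet()
for x in [3, 1, 3, 2]:
    s.add(x)
s.remove(1)
print(sorted(s))  # prints "[2, 3]"
```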

As sets can be interpreted as a kind of map (via the indicator function), sets are commonly implemented in the same way as (partial) maps (associative arrays), in which the value of each key-value pair has the unit type or a sentinel value (like 1). Typical choices are a self-balancing binary search tree for sorted sets (which has O(log n) for most operations) or a hash table for unsorted sets (which has O(1) average-case, but O(n) worst-case, for most operations). A sorted linear hash table[8] may be used to provide deterministically ordered sets.

Further, in languages that support maps but not sets, sets can be implemented in terms of maps. For example, a common programming idiom in Perl that converts an array to a hash whose values are the sentinel value 1, for use as a set, is:

`my %elements = map { $_ => 1 } @elements;`
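For comparison, a sketch of the same idiom in Python (the variable names and sample data are illustrative): `dict.fromkeys` builds a dictionary whose keys are the elements and whose values are all the sentinel 1. Idiomatic Python would normally use the built-in `set` instead.

```python
elements = ["apple", "pear", "apple", "fig"]

# Keys are the elements; every value is the sentinel 1 and is otherwise ignored.
element_map = dict.fromkeys(elements, 1)

print("pear" in element_map)  # membership test on the keys: prints "True"
print(len(element_map))       # the duplicate "apple" collapses: prints "3"
```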

Other popular methods include arrays. In particular, a subset of the integers 1..*n* can be implemented efficiently as an *n*-bit bit array, which also supports very efficient union and intersection operations. A Bloom filter implements a set probabilistically, using a very compact representation but risking a small chance of false positives on queries.
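A minimal sketch of the bit-array representation, assuming a Python `int` is used as an arbitrarily long bit vector (a fixed array of machine words behaves the same way): integer *k* is a member exactly when bit *k* is set, so union and intersection each reduce to a single bitwise OR or AND. The helper names are hypothetical.

```python
from typing import Iterable, List


def to_bits(elements: Iterable[int]) -> int:
    bits = 0
    for k in elements:
        bits |= 1 << k  # set bit k to mark k as a member
    return bits


def from_bits(bits: int) -> List[int]:
    return [k for k in range(bits.bit_length()) if (bits >> k) & 1]


def contains(bits: int, k: int) -> bool:
    return bool((bits >> k) & 1)


A = to_bits([1, 3, 5])
B = to_bits([3, 4, 5])
print(from_bits(A | B))  # union: prints "[1, 3, 4, 5]"
print(from_bits(A & B))  # intersection: prints "[3, 5]"
```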

The Boolean set operations can be implemented in terms of more elementary operations (`pop`, `clear`, and `add`), but specialized algorithms may yield lower asymptotic time bounds. If sets are implemented as sorted lists, for example, the naive algorithm for `union(S, T)` will take time proportional to the length *m* of *S* times the length *n* of *T*, whereas a variant of the list merging algorithm will do the job in time proportional to *m* + *n*. Moreover, there are specialized set data structures (such as the union-find data structure) that are optimized for one or more of these operations, at the expense of others.
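A sketch of the merge-based union, assuming both sets are stored as ascending lists without duplicates (the function name `merge_union` is hypothetical). Each step advances at least one index, so the loop runs at most *m* + *n* times.

```python
from typing import List


def merge_union(s: List[int], t: List[int]) -> List[int]:
    """Union of two sorted, duplicate-free lists in O(m + n) time."""
    i, j, out = 0, 0, []
    while i < len(s) and j < len(t):
        if s[i] < t[j]:
            out.append(s[i])
            i += 1
        elif t[j] < s[i]:
            out.append(t[j])
            j += 1
        else:  # equal elements: emit once and advance both lists
            out.append(s[i])
            i += 1
            j += 1
    out.extend(s[i:])  # at most one of these two tails is non-empty
    out.extend(t[j:])
    return out


print(merge_union([1, 3, 5, 7], [2, 3, 6]))  # prints "[1, 2, 3, 5, 6, 7]"
```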

One of the earliest languages to support sets was Pascal; many languages now include them, whether in the core language or in a standard library.

- In C++, the Standard Template Library (STL) provides the `set` template class, which is typically implemented using a binary search tree (e.g. red–black tree); SGI's STL also provides the `hash_set` template class, which implements a set using a hash table. C++11 has support for the `unordered_set` template class, which is implemented using a hash table. In sets, the elements themselves are the keys, in contrast to sequenced containers, where elements are accessed using their (relative or absolute) position. Set elements must have a strict weak ordering.
- Java offers the `Set` interface to support sets (with the `HashSet` class implementing it using a hash table), and the `SortedSet` sub-interface to support sorted sets (with the `TreeSet` class implementing it using a binary search tree).
- Apple's Foundation framework (part of Cocoa) provides the Objective-C classes `NSSet`, `NSMutableSet`, `NSCountedSet`, `NSOrderedSet`, and `NSMutableOrderedSet`. The CoreFoundation APIs provide the CFSet and CFMutableSet types for use in C.
- Python has had built-in `set` and `frozenset` types since 2.4, and since Python 3.0 and 2.7 supports non-empty set literals using a curly-bracket syntax, e.g. `{x, y, z}`; empty sets must be created using `set()`, because Python uses `{}` to represent the empty dictionary.
- The .NET Framework provides the generic `HashSet` and `SortedSet` classes that implement the generic `ISet` interface.
- Smalltalk's class library includes `Set` and `IdentitySet`, using equality and identity for inclusion test respectively. Many dialects provide variations for compressed storage (`NumberSet`, `CharacterSet`), for ordering (`OrderedSet`, `SortedSet`, etc.) or for weak references (`WeakIdentitySet`).
- Ruby's standard library includes a `set` module which contains `Set` and `SortedSet` classes that implement sets using hash tables, the latter allowing iteration in sorted order (`SortedSet` was removed from the standard library in Ruby 3.0).
- OCaml's standard library contains a `Set` module, which implements a functional set data structure using binary search trees.
- The GHC implementation of Haskell provides a `Data.Set` module, which implements immutable sets using binary search trees.[9]
- The Tcl Tcllib package provides a set module which implements a set data structure based upon Tcl lists.
- The Swift standard library contains a `Set` type, since Swift 1.2.
- JavaScript introduced `Set` as a standard built-in object with the ECMAScript 2015[10] standard.
- Erlang's standard library has a `sets` module.
- Clojure has literal syntax for hashed sets, and also implements sorted sets.
- LabVIEW has native support for sets, from version 2019.
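To illustrate the Python entry in the list above: a curly-bracket literal with at least one element creates a `set`, while `{}` creates an empty `dict`, so an empty set has to be written `set()`. The variable names are illustrative.

```python
nonempty = {1, 2, 3}
empty_braces = {}
empty_set = set()
frozen = frozenset(nonempty)  # immutable, hashable counterpart

print(type(nonempty).__name__, type(empty_braces).__name__, type(empty_set).__name__)
# prints "set dict set"
```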

As noted above, in languages which do not directly support sets but do support associative arrays, sets can be emulated using associative arrays, by using the elements as keys and a dummy value, which is ignored, for every value.

A generalization of the notion of a set is that of a **multiset** or **bag**, which is similar to a set but allows repeated ("equal") values (duplicates). This is used in two distinct senses: either equal values are considered *identical,* and are simply counted, or equal values are considered *equivalent,* and are stored as distinct items. For example, given a list of people (by name) and ages (in years), one could construct a multiset of ages, which simply counts the number of people of a given age. Alternatively, one can construct a multiset of people, where two people are considered equivalent if their ages are the same (but may be different people and have different names), in which case each pair (name, age) must be stored, and selecting on a given age gives all the people of a given age.

Formally, it is possible for objects in computer science to be considered "equal" under some equivalence relation but still distinct under another relation. Some types of multiset implementations will store distinct equal objects as separate items in the data structure, while others will collapse them down to one version (the first one encountered) and keep a positive integer count of the multiplicity of the element.
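The two senses from the people-and-ages example can be sketched in Python with made-up sample data. A `collections.Counter` keeps only a count per age (equal values treated as identical), while a dictionary of lists keeps every (name, age) pair, grouped by age (equal ages treated as merely equivalent).

```python
from collections import Counter, defaultdict

people = [("Ann", 30), ("Bob", 25), ("Cid", 30)]  # illustrative sample data

# Sense 1: a counting multiset of ages; only multiplicities are stored.
ages = Counter(age for _, age in people)
print(ages[30])  # prints "2"

# Sense 2: a multiset of people under "same age" equivalence; every item is kept.
by_age = defaultdict(list)
for name, age in people:
    by_age[age].append((name, age))
print(by_age[30])  # prints "[('Ann', 30), ('Cid', 30)]"
```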

As with sets, multisets can naturally be implemented using hash tables or trees, which yield different performance characteristics.

The set of all bags over type T is given by the expression bag T. If by multiset one considers equal items identical and simply counts them, then a multiset can be interpreted as a function from the input domain to the non-negative integers (natural numbers), generalizing the identification of a set with its indicator function. In some cases a multiset in this counting sense may be generalized to allow negative values, as in Python.
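Python's `collections.Counter` behaves like the counting interpretation: looking up an absent element yields 0 rather than raising an error, so the counter acts as a function from elements to multiplicities. A small sketch of the negative-count generalization: `Counter.subtract` can push counts below zero, while unary `+` keeps only the positive counts.

```python
from collections import Counter

bag = Counter(["a", "b", "a"])
print(bag["a"], bag["z"])  # multiplicity function: prints "2 0"

bag.subtract(["a", "a", "a"])  # in-place subtraction may go negative
print(bag["a"])  # prints "-1"

print(+bag)  # unary + drops zero and negative counts: prints "Counter({'b': 1})"
```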

- C++'s Standard Template Library implements both sorted and unsorted multisets. It provides the `multiset` class for the sorted multiset, as a kind of associative container, which implements this multiset using a self-balancing binary search tree. It provides the `unordered_multiset` class for the unsorted multiset, as a kind of unordered associative container, which implements this multiset using a hash table. The unsorted multiset is standard as of C++11; previously SGI's STL provided the `hash_multiset` class, which was copied and eventually standardized.
- For Java, third-party libraries provide multiset functionality:
  - Apache Commons Collections provides the `Bag` and `SortedBag` interfaces, with implementing classes like `HashBag` and `TreeBag`.
  - Google Guava provides the `Multiset` interface, with implementing classes like `HashMultiset` and `TreeMultiset`.
- Apple provides the `NSCountedSet` class as part of Cocoa, and the `CFBag` and `CFMutableBag` types as part of CoreFoundation.
- Python's standard library includes `collections.Counter`, which is similar to a multiset.
- Smalltalk includes the `Bag` class, which can be instantiated to use either identity or equality as predicate for inclusion test.

Where a multiset data structure is not available, a workaround is to use a regular set, but override the equality predicate of its items to always return "not equal" on distinct objects (however, this still cannot store multiple occurrences of the same object), or to use an associative array mapping the values to their integer multiplicities (this cannot distinguish between equal elements at all).

Typical operations on bags:

- `contains(B, x)`: checks whether the element *x* is present (at least once) in the bag *B*.
- `is_sub_bag(B_1, B_2)`: checks whether each element in the bag $B_1$ occurs in $B_1$ no more often than it occurs in the bag $B_2$; sometimes denoted as $B_1 \sqsubseteq B_2$.
- `count(B, x)`: returns the number of times that the element *x* occurs in the bag *B*; sometimes denoted as $B \mathbin{\#} x$.
- `scaled_by(B, n)`: given a natural number *n*, returns a bag which contains the same elements as the bag *B*, except that every element that occurs *m* times in *B* occurs $n \cdot m$ times in the resulting bag; sometimes denoted as $n \otimes B$.
- `union(B_1, B_2)`: returns a bag containing just those values that occur in either the bag $B_1$ or the bag $B_2$, except that the number of times a value *x* occurs in the resulting bag is equal to $(B_1 \mathbin{\#} x) + (B_2 \mathbin{\#} x)$; sometimes denoted as $B_1 \uplus B_2$.
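A sketch of these bag operations on top of Python's `collections.Counter`, assuming all multiplicities are non-negative; the function names mirror the list and are not part of any library. Note that `Counter`'s `+` adds multiplicities as `union` above requires, whereas `Counter`'s `|` would take the maximum instead.

```python
from collections import Counter
from typing import Hashable


def contains(B: Counter, x: Hashable) -> bool:
    return B[x] > 0


def count(B: Counter, x: Hashable) -> int:
    return B[x]  # B # x; 0 if x is absent


def is_sub_bag(B1: Counter, B2: Counter) -> bool:
    return all(B1[x] <= B2[x] for x in B1)  # B1 ⊑ B2


def scaled_by(B: Counter, n: int) -> Counter:
    return Counter({x: n * m for x, m in B.items() if n * m > 0})  # n ⊗ B


def union(B1: Counter, B2: Counter) -> Counter:
    return B1 + B2  # B1 ⊎ B2: multiplicities add


B1 = Counter("aab")   # a: 2, b: 1
B2 = Counter("abbc")  # a: 1, b: 2, c: 1
print(count(union(B1, B2), "a"))      # prints "3"
print(is_sub_bag(Counter("ab"), B1))  # prints "True"
```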

In relational databases, a table can be a (mathematical) set or a multiset, depending on the presence of uniqueness constraints on some columns (which turn them into a candidate key).

SQL allows the selection of rows from a relational table: this operation will in general yield a multiset, unless the keyword `DISTINCT` is used to force the rows to be all different, or the selection includes the primary (or a candidate) key.
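This is easy to observe with Python's built-in `sqlite3` module and an in-memory database. In this sketch SQLite stands in for SQL in general, and the table and column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT PRIMARY KEY, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("Ann", 30), ("Bob", 25), ("Cid", 30)])

# Selecting a non-key column yields a multiset: 30 appears twice.
ages = sorted(row[0] for row in conn.execute("SELECT age FROM people"))
# DISTINCT forces all returned rows to be different.
distinct_ages = sorted(row[0] for row in conn.execute("SELECT DISTINCT age FROM people"))
conn.close()

print(ages, distinct_ages)  # prints "[25, 30, 30] [25, 30]"
```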

In ANSI SQL the `MULTISET` keyword can be used to transform a subquery into a collection expression:

`SELECT expression1, expression2... FROM table_name...`

is a general select that can be used as a *subquery expression* of another more general query, while

`MULTISET(SELECT expression1, expression2... FROM table_name...)`

transforms the subquery into a *collection expression* that can be used in another query, or in assignment to a column of appropriate collection type.


1. Python: `pop()`
2. *Management and Processing of Complex Data Structures: Third Workshop on Information Systems and Artificial Intelligence, Hamburg, Germany, February 28 – March 2, 1994. Proceedings*, ed. Kai v. Luck, Heinz Marburger, p. 76.
3. Python Issue7212: Retrieve an arbitrary element from a set without removing it; see msg106593 regarding standard name.
4. Ruby Feature #4553: Add Set#pick and Set#pop.
5. Ute Schmid, *Inductive Synthesis of Functional Programs: Universal Planning, Folding of Finite Programs, and Schema Abstraction by Analogical Reasoning*, Springer, Aug 21, 2003, p. 240.
6. *Recent Trends in Data Type Specification: 10th Workshop on Specification of Abstract Data Types Joint with the 5th COMPASS Workshop, S. Margherita, Italy, May 30 – June 3, 1994. Selected Papers, Volume 10*, ed. Egidio Astesiano, Gianna Reggio, Andrzej Tarlecki, p. 38.
7. Ruby: `flatten()`
8. Wang, Thomas (1997), *Sorted Linear Hash Table*, archived from the original on 2006-01-12.
9. Stephen Adams, "Efficient sets: a balancing act", *Journal of Functional Programming* 3(4):553–562, October 1993. Retrieved on 2015-03-11.
10. "ECMAScript 2015 Language Specification – ECMA-262 6th Edition", *www.ecma-international.org*. Retrieved 2017-07-11.

This page is based on this Wikipedia article

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.
