Element distinctness problem

In computational complexity theory, the element distinctness problem or element uniqueness problem is the problem of determining whether all the elements of a list are distinct.

It is a well-studied problem in many different models of computation. The problem may be solved by sorting the list and then checking whether any consecutive elements are equal; it may also be solved in linear expected time by a randomized algorithm that inserts each item into a hash table and compares only those elements that are placed in the same hash table cell. [1]
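
Both classical solutions are short enough to sketch in full. The following Python sketch (function names are ours, not from the sources) implements the sort-and-scan method and the hash-table method; Python's built-in hash stands in for the universal hashing analyzed in [1].

```python
from collections import defaultdict

def distinct_by_sorting(items):
    """O(n log n): after sorting, any duplicates are adjacent."""
    ordered = sorted(items)
    return all(a != b for a, b in zip(ordered, ordered[1:]))

def distinct_by_hashing(items):
    """Expected O(n): bucket items by hash value and compare only
    elements that land in the same hash-table cell."""
    buckets = defaultdict(list)
    for x in items:
        cell = buckets[hash(x)]
        if x in cell:  # compare only against same-cell elements
            return False
        cell.append(x)
    return True
```

For example, both functions return False on [3, 1, 4, 1] and True on [3, 1, 4, 2].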

Several lower bounds in computational complexity are proved by reducing the element distinctness problem to the problem in question, i.e., by demonstrating that a solution to the element uniqueness problem may be quickly obtained once the problem in question has been solved.
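
A standard example is the closest-pair problem: a list of real numbers has all elements distinct exactly when the smallest distance between two of its values is positive, so a fast closest-pair algorithm decides element distinctness just as fast. A minimal Python sketch of the reduction follows; the deliberately naive quadratic min_gap routine is only a placeholder for whatever closest-pair algorithm is being lower-bounded.

```python
def min_gap(numbers):
    """Placeholder closest-pair routine (brute force, O(n^2));
    in the reduction this stands for the algorithm being lower-bounded."""
    return min(
        abs(a - b)
        for i, a in enumerate(numbers)
        for b in numbers[i + 1:]
    )

def distinct_via_closest_pair(numbers):
    """Element distinctness reduces to closest pair with O(n) extra work:
    the list is distinct iff its closest pair is at distance > 0."""
    if len(numbers) < 2:
        return True
    return min_gap(numbers) > 0
```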

Decision tree complexity

The number of comparisons needed to solve the problem of size n, in a comparison-based model of computation such as a decision tree or algebraic decision tree, is Θ(n log n). Here Θ invokes big theta notation, meaning that the problem can be solved in a number of comparisons proportional to n log n (a linearithmic function) and that all solutions require this many comparisons. [2] In these models of computation, the input numbers may not be used to index the computer's memory (as in the hash table solution) but may only be accessed by computing and comparing simple algebraic functions of their values. For these models, an algorithm based on comparison sorting solves the problem within a constant factor of the best possible number of comparisons. The same lower bound applies as well to the expected number of comparisons in the randomized algebraic decision tree model. [3] [4]
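
The upper bound is just sorting; the matching lower bound is Ben-Or's topological argument [2]. The following is a compressed sketch of the counting step, with the constant C left unspecified; it is an illustration, not a full proof.

```latex
% Sketch of the \Omega(n \log n) lower bound, after Ben-Or [2].
% The set of YES instances,
%   W = \{\, x \in \mathbb{R}^n : x_i \neq x_j \text{ for all } i \neq j \,\},
% falls into one connected component per ordering of the coordinates,
% so it has n! components.  An algebraic decision tree of depth d
% decides a set with at most C^{\,d+n} connected components, for a
% constant C depending on the degree of the test polynomials.  Hence
\[
  C^{\,d+n} \;\ge\; n!
  \qquad\Longrightarrow\qquad
  d \;\ge\; \log_C(n!) - n \;=\; \Omega(n \log n),
\]
% using Stirling's estimate \log(n!) = n \log n - O(n).
```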

Real RAM complexity

If the elements in the problem are real numbers, the decision-tree lower bound extends to the real random-access machine model with an instruction set that includes addition, subtraction and multiplication of real numbers, as well as comparison and either division or remaindering ("floor"). [5] It follows that the problem's complexity in this model is also Θ(n log n). This RAM model covers more algorithms than the algebraic decision-tree model, as it encompasses algorithms that use indexing into tables. However, in this model all program steps are counted, not just decisions.

Turing machine complexity

A single-tape deterministic Turing machine can solve the problem, for n elements of m ≥ log n bits each, in time O(n²m(m + 2 − log n)), while on a nondeterministic machine the time complexity is O(nm(n + log m)). [6]
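
To make these bounds concrete, one can evaluate them at a representative word length m = 2 log n (comfortably enough bits for all n elements to be distinct); the arithmetic below is our illustration, not taken from [6].

```latex
% With m = 2\log n, the deterministic bound becomes
\[
  O\!\bigl(n^{2} m (m + 2 - \log n)\bigr)
  = O\!\bigl(n^{2}\log n \cdot (\log n + 2)\bigr)
  = O\!\bigl(n^{2}\log^{2} n\bigr),
\]
% while the nondeterministic bound becomes
\[
  O\!\bigl(n m (n + \log m)\bigr)
  = O\!\bigl(n \log n \cdot (n + \log\log n)\bigr)
  = O\!\bigl(n^{2}\log n\bigr).
\]
```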

Quantum complexity

Quantum algorithms can solve this problem faster, in Θ(n^{2/3}) queries. The optimal algorithm is by Andris Ambainis. [7] Yaoyun Shi first proved a tight lower bound when the size of the range is sufficiently large. [8] Ambainis [9] and Kutin [10] independently (and via different proofs) extended his work to obtain the lower bound for all functions.

Generalization: finding repeated elements

Elements that occur more than n/k times in a multiset of size n may be found by a comparison-based algorithm, the Misra–Gries heavy hitters algorithm, in time O(n log k). The element distinctness problem is a special case of this problem where k = n. This time is optimal under the decision tree model of computation. [11]
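
A minimal Python sketch of the Misra–Gries algorithm follows. The comparison-based O(n log k) version of [11] keeps its candidates in a balanced search tree; the dictionary used here is a simplification. The first pass returns a superset of the heavy hitters, and a second pass confirms the true counts.

```python
def misra_gries(stream, k):
    """Return candidates that may occur more than len(stream)/k times.

    Maintains at most k - 1 counters.  When a new element arrives and
    all counters are taken, every counter is decremented, which charges
    the discarded occurrence against k - 1 distinct other elements."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # Decrement all counters; drop the ones that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)

def heavy_hitters(stream, k):
    """Second pass: keep only candidates truly exceeding len(stream)/k."""
    return {x for x in misra_gries(stream, k)
            if stream.count(x) * k > len(stream)}
```

With k = n, a heavy hitter is any element occurring more than once, so a list is distinct exactly when heavy_hitters(stream, len(stream)) is empty.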


References

  1. Gil, J.; Meyer auf der Heide, F.; Wigderson, A. (1990), "Not all keys can be hashed in constant time", Proc. 22nd ACM Symposium on Theory of Computing, pp. 244–253, doi:10.1145/100216.100247, S2CID 11943779.
  2. Ben-Or, Michael (1983), "Lower bounds for algebraic computation trees", Proc. 15th ACM Symposium on Theory of Computing, pp. 80–86, doi:10.1145/800061.808735.
  3. Grigoriev, Dima; Karpinski, Marek; Meyer auf der Heide, Friedhelm; Smolensky, Roman (1996), "A lower bound for randomized algebraic decision trees", Computational Complexity, 6 (4): 357, doi:10.1007/BF01270387, S2CID 1462184.
  4. Grigoriev, Dima (1999), "Complexity lower bounds for randomized computation trees over zero characteristic fields", Computational Complexity, 8 (4): 316–329, doi:10.1007/s000370050002, S2CID 10641238.
  5. Ben-Amram, Amir M.; Galil, Zvi (2001), "Topological lower bounds on algebraic random access machines", SIAM Journal on Computing, 31 (3): 722–761, doi:10.1137/S0097539797329397.
  6. Ben-Amram, Amir M.; Berkman, Omer; Petersen, Holger (2003), "Element distinctness on one-tape Turing machines: a complete solution", Acta Informatica, 40 (2): 81–94, doi:10.1007/s00236-003-0125-8, S2CID 24821585.
  7. Ambainis, Andris (2007), "Quantum walk algorithm for element distinctness", SIAM Journal on Computing, 37 (1): 210–239, arXiv:quant-ph/0311001, doi:10.1137/S0097539705447311.
  8. Shi, Y. (2002), "Quantum lower bounds for the collision and the element distinctness problems", Proceedings of the 43rd Symposium on Foundations of Computer Science, pp. 513–519, arXiv:quant-ph/0112086, doi:10.1109/SFCS.2002.1181975.
  9. Ambainis, A. (2005), "Polynomial degree and lower bounds in quantum complexity: collision and element distinctness with small range", Theory of Computing, 1 (1): 37–46, doi:10.4086/toc.2005.v001a003.
  10. Kutin, S. (2005), "Quantum lower bound for the collision problem with small range", Theory of Computing, 1 (1): 29–36, doi:10.4086/toc.2005.v001a002.
  11. Misra, J.; Gries, D. (1982), "Finding repeated elements", Science of Computer Programming, 2 (2): 143–152, doi:10.1016/0167-6423(82)90012-0, hdl:1813/6345.