In computational complexity theory, the 3SUM problem asks if a given set of $n$ real numbers contains three elements that sum to zero. A generalized version, k-SUM, asks the same question on $k$ elements, rather than simply 3. 3SUM can be easily solved in $O(n^2)$ time, and matching lower bounds are known in some specialized models of computation (Erickson 1999).
It was conjectured that any deterministic algorithm for the 3SUM problem requires $\Omega(n^2)$ time. In 2014, the original 3SUM conjecture was refuted by Allan Grønlund and Seth Pettie who gave a deterministic algorithm that solves 3SUM in $O(n^2 / (\log n / \log \log n)^{2/3})$ time. [1] Additionally, Grønlund and Pettie showed that the 4-linear decision tree complexity of 3SUM is $O(n^{3/2} \sqrt{\log n})$. These bounds were subsequently improved. [2] [3] [4] The current best known algorithm for 3SUM runs in $O(n^2 (\log \log n)^{O(1)} / \log^2 n)$ time. [4] Kane, Lovett, and Moran showed that the 6-linear decision tree complexity of 3SUM is $O(n \log^2 n)$. [5] The latter bound is tight (up to a logarithmic factor). It is still conjectured that 3SUM is unsolvable in $O(n^{2-\Omega(1)})$ expected time. [6]
When the elements are integers in the range $[-N, \dots, N]$, 3SUM can be solved in $O(n + N \log N)$ time by representing the input set $S$ as a bit vector, computing the set $S + S$ of all pairwise sums as a discrete convolution using the fast Fourier transform, and finally comparing this set to $-S$. [7]
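A Python sketch of this method, assuming numpy for the FFT; the shift by N makes room for negative values, and the function name and bookkeeping are illustrative rather than taken from any reference implementation.

import numpy as np

def three_sum_fft(S, N):
    # Bit vector over the shifted value range [0, 2N]: bit (x + N) is set iff x is in S.
    v = np.zeros(2 * N + 1)
    for x in S:
        v[x + N] = 1.0
    m = 4 * N + 1                            # length of the full linear convolution
    f = np.fft.rfft(v, m)
    pair = np.rint(np.fft.irfft(f * f, m))   # pair[i] = number of (a, b) with a + b = i - 2N
    # a + b + c == 0 iff some pairwise sum equals -c for an element c of S.
    # (This sketch also counts an element paired with itself; exclude such
    # pairs if the three elements must be distinct.)
    return any(pair[2 * N - x] > 0 for x in S)

print(three_sum_fft([-5, 1, 4, 2], 5))   # True: -5 + 1 + 4 == 0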
Suppose the input array is $S[0..n-1]$. In integer (word RAM) models of computing, 3SUM can be solved in $O(n^2)$ time on average by inserting each number into a hash table, and then, for each index $i$ and $j$, checking whether the hash table contains the integer $-(S[i] + S[j])$.
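A minimal Python sketch of this approach, using a set as the hash table; the input values are assumed distinct, as for a set.

def three_sum_hash(S):
    # Average-case O(n^2): hash every element once, then test all pairs.
    seen = set(S)
    n = len(S)
    for i in range(n):
        for j in range(i + 1, n):
            x = -(S[i] + S[j])
            # The third element must be distinct from the pair.
            if x in seen and x != S[i] and x != S[j]:
                return (S[i], S[j], x)
    return None

print(three_sum_hash([-25, -10, -7, -3, 2, 4, 8, 10]))  # (-10, 2, 8)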
It is also possible to solve the problem in the same $O(n^2)$ time in a comparison-based model of computing or real RAM, for which hashing is not allowed. The algorithm below first sorts the input array and then tests all possible pairs in a careful order that avoids the need to binary search for the pairs in the sorted list, achieving worst-case $O(n^2)$ time, as follows. [8]
sort(S);
for i = 0 to n - 2 do
    a = S[i];
    start = i + 1;
    end = n - 1;
    while (start < end) do
        b = S[start];
        c = S[end];
        if (a + b + c == 0) then
            output a, b, c;
            // Continue search for all triplet combinations summing to zero.
            // We need to update both end and start together since the array values are distinct.
            start = start + 1;
            end = end - 1;
        else if (a + b + c > 0) then
            end = end - 1;
        else
            start = start + 1;
        end
    end
end
The following example shows this algorithm's execution on a small sorted array. The current value of a is shown in parentheses; the current values of b and c are shown in brackets.
(-25) [-10]  -7   -3    2    4    8  [10]   (a+b+c == -25)
(-25)  -10  [-7]  -3    2    4    8  [10]   (a+b+c == -22)
. . .
(-25)  -10   -7   -3    2    4   [8] [10]   (a+b+c == -7)
 -25  (-10) [-7]  -3    2    4    8  [10]   (a+b+c == -7)
 -25  (-10)  -7  [-3]   2    4    8  [10]   (a+b+c == -3)
 -25  (-10)  -7   -3   [2]   4    8  [10]   (a+b+c == 2)
 -25  (-10)  -7   -3   [2]   4   [8]  10    (a+b+c == 0)
The correctness of the algorithm can be seen as follows. Suppose there is a solution a + b + c = 0 with a = S[i], b = S[j], c = S[k] and i < j < k. When the outer loop reaches index i, the inner pointers start at positions i + 1 and n − 1, bracketing both j and k. Since each inner pointer moves only inward, run the algorithm until one of them first reaches b or c. If start reaches j first, the current sum a + b + S[end] is at least zero, so start stays put while end moves down until it reaches k; the case where end reaches k first is symmetric. Either way the triple is found, giving the affirmative answer.
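A direct Python transcription of the pseudocode above (a sketch; it reports every triple found rather than stopping at the first):

def three_sum_sorted(S):
    # Two-pointer scan over the sorted array; O(n^2) after sorting.
    S = sorted(S)
    n = len(S)
    triples = []
    for i in range(n - 2):
        a = S[i]
        start, end = i + 1, n - 1
        while start < end:
            b, c = S[start], S[end]
            if a + b + c == 0:
                triples.append((a, b, c))
                # Values are distinct, so both pointers must advance.
                start += 1
                end -= 1
            elif a + b + c > 0:
                end -= 1
            else:
                start += 1
    return triples

print(three_sum_sorted([-25, -10, -7, -3, 2, 4, 8, 10]))  # [(-10, 2, 8)]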
Instead of looking for numbers whose sum is 0, it is possible to look for numbers whose sum is any constant C. The simplest way would be to modify the original algorithm to search the hash table for the integer $C - (S[i] + S[j])$.
Another method is to shift the input. For example, if A = [1, 2, 3, 4] and you are asked to find 3SUM for C = 4, then subtract C/3 = 4/3 from all the elements of A and solve the result in the usual 3SUM way, since $(a - C/3) + (b - C/3) + (c - C/3) = a + b + c - C$, which is 0 exactly when $a + b + c = C$.
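Both variants can be sketched in Python (illustrative names). Scaling by 3 instead of subtracting C/3 is a small variation that keeps the values integral:

def three_sum_target(S, C):
    # Hash-table variant: search for C - (S[i] + S[j]).
    seen = set(S)
    n = len(S)
    for i in range(n):
        for j in range(i + 1, n):
            x = C - (S[i] + S[j])
            if x in seen and x != S[i] and x != S[j]:
                return (S[i], S[j], x)
    return None

def shift_to_zero(S, C):
    # a + b + c == C exactly when (3a - C) + (3b - C) + (3c - C) == 0.
    return [3 * x - C for x in S]

print(three_sum_target([1, 2, 3, 4], 9))   # (2, 3, 4)
# shift_to_zero([1, 2, 3, 4], 9) == [-6, -3, 0, 3]; the zero-sum triple
# (-3, 0, 3) there maps back to (2, 3, 4).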
Instead of searching for the 3 numbers in a single array, we can search for them in 3 different arrays. I.e., given three arrays X, Y and Z, find three numbers a∈X, b∈Y, c∈Z, such that $a + b + c = 0$. Call the 1-array variant 3SUM×1 and the 3-array variant 3SUM×3.
Given a solver for 3SUM×1, the 3SUM×3 problem can be solved in the following way (assuming all elements are integers): replace every element x∈X by 10x + 1, every element y∈Y by 10y + 2, and every element z∈Z by 10z − 3, then run the 3SUM×1 solver on the union of the three transformed arrays.
By the way we transformed the arrays, it is guaranteed that a∈X, b∈Y, c∈Z: the only way to pick three last digits from {1, 2, −3} that sum to 0 is 1 + 2 − 3, so a zero-sum triple takes exactly one element from each array, and the remaining high-order part then forces a + b + c = 0. [9]
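A Python sketch of this reduction; solve_3sum_x1 stands for any 3SUM×1 oracle that returns a zero-sum triple of values (or None), e.g. the hash-table sketch above. The digit bookkeeping follows the transformation just described.

def three_sum_x3(X, Y, Z, solve_3sum_x1):
    S = [10 * x + 1 for x in X] + [10 * y + 2 for y in Y] + [10 * z - 3 for z in Z]
    t = solve_3sum_x1(S)
    if t is None:
        return None
    # The only digit combination from {1, 2, -3} summing to 0 is 1 + 2 - 3,
    # so the triple has exactly one element from each transformed array.
    a = next(v for v in t if v % 10 == 1)   # came from X
    b = next(v for v in t if v % 10 == 2)   # came from Y
    c = next(v for v in t if v % 10 == 7)   # came from Z (10z - 3 ≡ 7 mod 10)
    return ((a - 1) // 10, (b - 2) // 10, (c + 3) // 10)

# Using the hash-table sketch above as the oracle:
print(three_sum_x3([5], [3], [-8], three_sum_hash))  # (5, 3, -8)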
Instead of looking for arbitrary elements of the array such that $S[i] + S[j] = S[k]$, the convolution 3sum problem (Conv3SUM) looks for elements in specific locations: indices $i$ and $j$ such that $S[i] + S[j] = S[i+j]$. [10]
Given a solver for 3SUM, the Conv3SUM problem can be solved in the following way. [10] Define a new array T with $T[i] = 2n \cdot S[i] + i$ for every index $i$, and run the 3SUM solver on T, searching for three elements with $T[i] + T[j] = T[k]$.

Correctness proof: $T[i] + T[j] - T[k] = 2n(S[i] + S[j] - S[k]) + (i + j - k)$. Since $|i + j - k| < 2n$, this expression is zero exactly when both $S[i] + S[j] = S[k]$ and $k = i + j$, which is precisely a Conv3SUM solution.
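A Python sketch of this reduction. To keep it self-contained, a simple quadratic hash lookup plays the role of the 3SUM solver; the names T and pos are illustrative.

def conv_three_sum(S):
    # Pack each index into the low-order part of its value, so that a sum
    # relation on T forces the index relation k = i + j as well.
    n = len(S)
    T = [2 * n * S[i] + i for i in range(n)]
    pos = {t: i for i, t in enumerate(T)}   # T values are always distinct
    for i in range(n):
        for j in range(n):
            k = pos.get(T[i] + T[j])
            if k is not None:
                # T[i] + T[j] == T[k] forces S[i] + S[j] == S[k] and
                # i + j == k, because |i + j - k| < 2n.
                return (i, j)
    return None

print(conv_three_sum([1, 3, 4, 7]))  # (1, 2): S[1] + S[2] == S[3]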
Given a solver for Conv3SUM, the 3SUM problem can be solved in the following way. [6] [10]
The reduction uses a hash function. As a first approximation, assume that we have a linear hash function, i.e. a function h such that $h(x + y) = h(x) + h(y)$.
Suppose that all elements are integers in the range $0 \dots N-1$, and that the function h maps each element to an element in the smaller range of indices $0 \dots n-1$. Create a new array T and send each element of S to its hash value in T, i.e., for every x in S, set $T[h(x)] = x$.
Initially, suppose that the mappings are unique (i.e. each cell in T accepts only a single element from S). Solve Conv3SUM on T. Now, if 3SUM has a solution $x + y = z$, then $T[h(x)] + T[h(y)] = T[h(z)]$ with $h(z) = h(x) + h(y)$ by linearity, so the Conv3SUM solver finds it; conversely, any Conv3SUM solution on T consists of three elements of S satisfying $x + y = z$.
This idealized solution doesn't work, because any hash function might map several distinct elements of S to the same cell of T. The trick is to create an array $T^*$ by selecting a single random element from each cell of T, and run Conv3SUM on $T^*$. If a solution is found, then it is a correct solution for 3SUM on S. If no solution is found, then create a different random $T^*$ and try again. Suppose there are at most R elements in each cell of T. Then the probability of finding a solution (if a solution exists) is the probability that the random selection will select the correct element from each of the three relevant cells, which is $(1/R)^3$. By running Conv3SUM $O(R^3)$ times, the solution will be found with a high probability.
Unfortunately, we do not have linear perfect hashing, so we have to use an almost linear hash function, i.e. a function h such that $h(x + y) = h(x) + h(y)$ or $h(x + y) = h(x) + h(y) + 1$.
This requires duplicating the elements of S when copying them into T, i.e., putting every element both in $T[h(x)]$ (as before) and in $T[h(x) - 1]$. So each cell will have 2R elements, and we will have to run Conv3SUM $O((2R)^3)$ times.
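The following Python sketch illustrates the whole reduction under simplifying assumptions: floor division serves as a toy almost-linear map (it satisfies exactly the two-case identity above, though it is not random; the real reduction draws h from a random almost-linear hash family), the Conv3SUM solver is assumed to return a pair (i, j) with T*[i] + T*[j] == T*[i+j] (the conv_three_sum sketch above has this form), and empty cells are padded with a sentinel that cannot participate in any sum.

import random

def three_sum_via_conv(S, n_cells, conv_solver, rounds):
    # Randomized reduction sketch for x + y == z with elements in 0..N-1.
    N = max(S) + 1
    width = -(-N // n_cells)                 # ceil(N / n_cells)
    h = lambda x: x // width                 # h(x+y) == h(x)+h(y) or h(x)+h(y)+1
    cells = [[] for _ in range(n_cells)]
    for x in S:
        cells[h(x)].append(x)                # as before ...
        if h(x) >= 1:
            cells[h(x) - 1].append(x)        # ... plus the duplicate copy
    EMPTY = -3 * N                           # sentinel: can never complete a sum
    for _ in range(rounds):
        T_star = [random.choice(c) if c else EMPTY for c in cells]
        hit = conv_solver(T_star)
        if hit is not None:
            i, j = hit
            x, y, z = T_star[i], T_star[j], T_star[i + j]
            if x + y == z and EMPTY not in (x, y, z):
                return (x, y, z)             # verified against original values
    return None

# Example, using conv_three_sum from the sketch above as the oracle:
print(three_sum_via_conv([5, 9, 14], 5, conv_three_sum, 50))  # (5, 9, 14)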
A problem is called 3SUM-hard if solving it in subquadratic time implies a subquadratic-time algorithm for 3SUM. The concept of 3SUM-hardness was introduced by Gajentaan & Overmars (1995). They proved that a large class of problems in computational geometry are 3SUM-hard. (The authors acknowledge that many of these problems were contributed by other researchers.)
By now there are a multitude of other problems that fall into this category. An example is the decision version of X + Y sorting: given sets of numbers X and Y of n elements each, are there n² distinct x + y for x ∈ X, y ∈ Y? [11]
In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.
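A standard iterative sketch of binary search in Python:

def binary_search(a, target):
    # Repeatedly halve the search interval of the sorted list a.
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1     # target cannot lie in the left half
        else:
            hi = mid - 1     # target cannot lie in the right half
    return -1                # remaining interval empty: target is absent

print(binary_search([-25, -10, -7, -3, 2, 4, 8, 10], 4))  # 5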
The knapsack problem is the following problem in combinatorial optimization: given a set of items, each with a weight and a value, determine which items to include in the collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.
The subset sum problem (SSP) is a decision problem in computer science. In its most general formulation, there is a multiset $S$ of integers and a target-sum $T$, and the question is to decide whether any subset of the integers sums to precisely $T$. The problem is known to be NP-complete. Moreover, some restricted variants of it are NP-complete too, for example the variant in which all inputs are positive, and the variant in which inputs may be positive or negative and $T = 0$.
The assignment problem is a fundamental combinatorial optimization problem. In its most general form, the problem is as follows: the instance has a number of agents and a number of tasks. Any agent can be assigned to perform any task, incurring some cost that may vary depending on the agent-task assignment. It is required to perform as many tasks as possible by assigning at most one agent to each task and at most one task to each agent, in such a way that the total cost of the assignment is minimized.
An integer programming problem is a mathematical optimization or feasibility program in which some or all of the variables are restricted to be integers. In many settings the term refers to integer linear programming (ILP), in which the objective function and the constraints are linear.
In computer science, a longest common substring of two or more strings is a longest string that is a substring of all of them. There may be more than one longest common substring. Applications include data deduplication and plagiarism detection.
Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961. It is still a commonly used algorithm for sorting. Overall, it is slightly faster than merge sort and heapsort for randomized data, particularly on larger distributions.
Semidefinite programming (SDP) is a subfield of mathematical programming concerned with the optimization of a linear objective function over the intersection of the cone of positive semidefinite matrices with an affine space, i.e., a spectrahedron.
The study of facility location problems (FLP), also known as location analysis, is a branch of operations research and computational geometry concerned with the optimal placement of facilities to minimize transportation costs while considering factors like avoiding placing hazardous materials near housing, and competitors' facilities. The techniques also apply to cluster analysis.
In computer science, streaming algorithms are algorithms for processing data streams in which the input is presented as a sequence of items and can be examined in only a few passes, typically just one. These algorithms are designed to operate with limited memory, generally logarithmic in the size of the stream and/or in the maximum value in the stream, and may also have limited processing time per item.
In mathematics, the Function Field Sieve is one of the most efficient algorithms to solve the Discrete Logarithm Problem (DLP) in a finite field. It has heuristic subexponential complexity. Leonard Adleman developed it in 1994 and then elaborated it together with M. D. Huang in 1999. Previous work includes the work of D. Coppersmith about the DLP in fields of characteristic two.
In cryptography, Very Smooth Hash (VSH) is a provably secure cryptographic hash function invented in 2005 by Scott Contini, Arjen Lenstra and Ron Steinfeld. Provably secure means that finding collisions is as difficult as some known hard mathematical problem. Unlike other provably secure collision-resistant hashes, VSH is efficient and usable in practice. Asymptotically, it only requires a single multiplication per log(n) message-bits and uses RSA-type arithmetic. Therefore, VSH can be useful in embedded environments where code space is limited.
In discrete mathematics, ideal lattices are a special class of lattices and a generalization of cyclic lattices. Ideal lattices naturally occur in many parts of number theory, but also in other areas. In particular, they have a significant place in cryptography. Micciancio defined a generalization of cyclic lattices as ideal lattices. They can be used in cryptosystems to decrease by a square root the number of parameters necessary to describe a lattice, making them more efficient. Ideal lattices are a new concept, but similar lattice classes have been used for a long time. For example, cyclic lattices, a special case of ideal lattices, are used in NTRUEncrypt and NTRUSign.
In computer science and data mining, MinHash is a technique for quickly estimating how similar two sets are. The scheme was invented by Andrei Broder, and initially used in the AltaVista search engine to detect duplicate web pages and eliminate them from search results. It has also been applied in large-scale clustering problems, such as clustering documents by the similarity of their sets of words.
In computer science, the range query problem consists of efficiently answering several queries regarding a given interval of elements within an array. For example, a common task, known as range minimum query, is finding the smallest value inside a given range within a list of numbers.
In computer science, X + Y sorting is the problem of sorting pairs of numbers by their sums. Applications of the problem include transit fare minimisation, VLSI design, and sparse polynomial multiplication. As with comparison sorting and integer sorting more generally, algorithms for this problem can be based only on comparisons of these sums, or on other operations that work only when the inputs are small integers.
In computer science, the count-distinct problem (also known in applied mathematics as the cardinality estimation problem) is the problem of finding the number of distinct elements in a data stream with repeated elements. This is a well-known problem with numerous applications. The elements might represent IP addresses of packets passing through a router, unique visitors to a web site, elements in a large database, motifs in a DNA sequence, or elements of RFID/sensor networks.
Quantum optimization algorithms are quantum algorithms that are used to solve optimization problems. Mathematical optimization deals with finding the best solution to a problem from a set of possible solutions. Mostly, the optimization problem is formulated as a minimization problem, where one tries to minimize an error which depends on the solution: the optimal solution has the minimal error. Different optimization techniques are applied in various fields such as mechanics, economics and engineering, and as the complexity and amount of data involved rise, more efficient ways of solving optimization problems are needed. Quantum computing may allow problems which are not practically feasible on classical computers to be solved, or suggest a considerable speed up with respect to the best known classical algorithm.
The configuration linear program (configuration-LP) is a linear programming technique used for solving combinatorial optimization problems. It was introduced in the context of the cutting stock problem. Later, it has been applied to the bin packing and job scheduling problems. In the configuration-LP, there is a variable for each possible configuration - each possible multiset of items that can fit in a single bin. Usually, the number of configurations is exponential in the problem size, but in some cases it is possible to attain approximate solutions using only a polynomial number of configurations.
The Karmarkar–Karp (KK) bin packing algorithms are several related approximation algorithms for the bin packing problem. The bin packing problem is a problem of packing items of different sizes into bins of identical capacity, such that the total number of bins is as small as possible. Finding the optimal solution is computationally hard. Karmarkar and Karp devised an algorithm that runs in polynomial time and finds a solution with at most $\mathrm{OPT} + O(\log^2 \mathrm{OPT})$ bins, where OPT is the number of bins in the optimal solution. They also devised several other algorithms with slightly different approximation guarantees and run-time bounds.