3SUM

Last updated
Unsolved problem in computer science:
Is there an algorithm to solve the 3SUM problem in time , for some ?

In computational complexity theory, the 3SUM problem asks if a given set of real numbers contains three elements that sum to zero. A generalized version, k-SUM, asks the same question on k elements, rather than simply 3. 3SUM can be easily solved in time, and matching lower bounds are known in some specialized models of computation ( Erickson 1999 ).

Contents

It was conjectured that any deterministic algorithm for the 3SUM requires time. In 2014, the original 3SUM conjecture was refuted by Allan Grønlund and Seth Pettie who gave a deterministic algorithm that solves 3SUM in time. [1] Additionally, Grønlund and Pettie showed that the 4-linear decision tree complexity of 3SUM is . These bounds were subsequently improved. [2] [3] [4] The current best known algorithm for 3SUM runs in time. [4] Kane, Lovett, and Moran showed that the 6-linear decision tree complexity of 3SUM is . [5] The latter bound is tight (up to a logarithmic factor). It is still conjectured that 3SUM is unsolvable in expected time. [6]

When the elements are integers in the range , 3SUM can be solved in time by representing the input set as a bit vector, computing the set of all pairwise sums as a discrete convolution using the fast Fourier transform, and finally comparing this set to . [7]

Quadratic algorithm

Suppose the input array is . In integer (word RAM) models of computing, 3SUM can be solved in time on average by inserting each number into a hash table, and then, for each index and , checking whether the hash table contains the integer .

It is also possible to solve the problem in the same time in a comparison-based model of computing or real RAM, for which hashing is not allowed. The algorithm below first sorts the input array and then tests all possible pairs in a careful order that avoids the need to binary search for the pairs in the sorted list, achieving worst-case time, as follows. [8]

sort(S); for i = 0 to n - 2 do     a = S[i];     start = i + 1;     end = n - 1;     while (start < end) do         b = S[start]         c = S[end];         if (a + b + c == 0) thenoutput a, b, c;             // Continue search for all triplet combinations summing to zero.             // We need to update both end and start together since the array values are distinct.             start = start + 1;             end = end - 1;         elseif (a + b + c > 0) then             end = end - 1;         else             start = start + 1;     endend

The following example shows this algorithm's execution on a small sorted array. Current values of a are shown in red, values of b and c are shown in magenta.

-25-10 -7 -3 2 4 8 10  (a+b+c==-25)  -25 -10 -7 -3 2 4 8 10  (a+b+c==-22)  . . .  -25 -10 -7 -3 2 4 810  (a+b+c==-7)  -25 -10-7 -3 2 4 8 10  (a+b+c==-7)  -25 -10 -7 -3 2 4 8 10  (a+b+c==-3)  -25 -10 -7 -3 2 4 8 10  (a+b+c==2)  -25 -10 -7 -3 2 4 8 10  (a+b+c==0)

The correctness of the algorithm can be seen as follows. Suppose we have a solution a + b + c = 0. Since the pointers only move in one direction, we can run the algorithm until the leftmost pointer points to a. Run the algorithm until either one of the remaining pointers points to b or c, whichever occurs first. Then the algorithm will run until the last pointer points to the remaining term, giving the affirmative solution.

Variants

Non-zero sum

Instead of looking for numbers whose sum is 0, it is possible to look for numbers whose sum is any constant C. The simplest way would be to modify the original algorithm to search the hash table for the integer .

Another method:

For example, if A=[1,2,3,4] and if you are asked to find 3SUM for C=4, then subtract 4/3 from all the elements of A, and solve it in the usual 3sum way, i.e., .

Three different arrays

Instead of searching for the 3 numbers in a single array, we can search for them in 3 different arrays. I.e., given three arrays X, Y and Z, find three numbers aX, bY, cZ, such that . Call the 1-array variant 3SUM×1 and the 3-array variant 3SUM×3.

Given a solver for 3SUM×1, the 3SUM×3 problem can be solved in the following way (assuming all elements are integers):

By the way we transformed the arrays, it is guaranteed that aX, bY, cZ. [9]

Convolution sum

Instead of looking for arbitrary elements of the array such that:

the convolution 3sum problem (Conv3SUM) looks for elements in specific locations: [10]

Reduction from Conv3SUM to 3SUM

Given a solver for 3SUM, the Conv3SUM problem can be solved in the following way. [10]

  • Define a new array T, such that for every index i: (where n is the number of elements in the array, and the indices run from 0 to n-1).
  • Solve 3SUM on the array T.

Correctness proof:

  • If in the original array there is a triple with , then , so this solution will be found by 3SUM on T.
  • Conversely, if in the new array there is a triple with , then . Because , necessarily and , so this is a valid solution for Conv3SUM on S.

Reduction from 3SUM to Conv3SUM

Given a solver for Conv3SUM, the 3SUM problem can be solved in the following way. [6] [10]

The reduction uses a hash function. As a first approximation, assume that we have a linear hash function, i.e. a function h such that:

Suppose that all elements are integers in the range: 0...N-1, and that the function h maps each element to an element in the smaller range of indices: 0...n-1. Create a new array T and send each element of S to its hash value in T, i.e., for every x in S():

Initially, suppose that the mappings are unique (i.e. each cell in T accepts only a single element from S). Solve Conv3SUM on T. Now:

  • If there is a solution for 3SUM: , then: and , so this solution will be found by the Conv3SUM solver on T.
  • Conversely, if a Conv3SUM is found on T, then obviously it corresponds to a 3SUM solution on S since T is just a permutation of S.

This idealized solution doesn't work, because any hash function might map several distinct elements of S to the same cell of T. The trick is to create an array by selecting a single random element from each cell of T, and run Conv3SUM on . If a solution is found, then it is a correct solution for 3SUM on S. If no solution is found, then create a different random and try again. Suppose there are at most R elements in each cell of T. Then the probability of finding a solution (if a solution exists) is the probability that the random selection will select the correct element from each cell, which is . By running Conv3SUM times, the solution will be found with a high probability.

Unfortunately, we do not have linear perfect hashing, so we have to use an almost linear hash function, i.e. a function h such that:

or

This requires to duplicate the elements of S when copying them into T, i.e., put every element both in (as before) and in . So each cell will have 2R elements, and we will have to run Conv3SUM times.

3SUM-hardness

A problem is called 3SUM-hard if solving it in subquadratic time implies a subquadratic-time algorithm for 3SUM. The concept of 3SUM-hardness was introduced by Gajentaan & Overmars (1995). They proved that a large class of problems in computational geometry are 3SUM-hard, including the following ones. (The authors acknowledge that many of these problems are contributed by other researchers.)

By now there are a multitude of other problems that fall into this category. An example is the decision version of X + Y sorting: given sets of numbers X and Y of n elements each, are there n² distinct x + y for xX, yY? [11]

See also

Notes

  1. Grønlund & Pettie 2014.
  2. Freund 2017.
  3. Gold & Sharir 2017.
  4. 1 2 Chan 2018.
  5. Kane, Lovett & Moran 2018.
  6. 1 2 Kopelowitz, Tsvi; Pettie, Seth; Porat, Ely (2014). "3SUM Hardness in (Dynamic) Data Structures". arXiv: 1407.6756 [cs.DS].
  7. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. ISBN   0-262-03384-4. Ex. 30.1–7, p. 906.
  8. Visibility Graphs and 3-Sum by Michael Hoffmann
  9. For a reduction in the other direction, see Variants of the 3-sum problem.
  10. 1 2 3 Patrascu, M. (2010). Towards polynomial lower bounds for dynamic problems. Proceedings of the 42nd ACM symposium on Theory of computing - STOC '10. p. 603. doi:10.1145/1806689.1806772. ISBN   9781450300506.
  11. Demaine, Erik; Erickson, Jeff; O'Rourke, Joseph (20 August 2006). "Problem 41: Sorting X + Y (Pairwise Sums)". The Open Problems Project. Retrieved 23 September 2014.

Related Research Articles

<span class="mw-page-title-main">Binary search</span> Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

<span class="mw-page-title-main">Knapsack problem</span> Problem in combinatorial optimization

The knapsack problem is the following problem in combinatorial optimization:

The subset sum problem (SSP) is a decision problem in computer science. In its most general formulation, there is a multiset of integers and a target-sum , and the question is to decide whether any subset of the integers sum to precisely . The problem is known to be NP-complete. Moreover, some restricted variants of it are NP-complete too, for example:

<span class="mw-page-title-main">Assignment problem</span> Combinatorial optimization problem

The assignment problem is a fundamental combinatorial optimization problem. In its most general form, the problem is as follows:

An integer programming problem is a mathematical optimization or feasibility program in which some or all of the variables are restricted to be integers. In many settings the term refers to integer linear programming (ILP), in which the objective function and the constraints are linear.

In computer science, a longest common substring of two or more strings is a longest string that is a substring of all of them. There may be more than one longest common substring. Applications include data deduplication and plagiarism detection.

<span class="mw-page-title-main">Quicksort</span> Divide and conquer sorting algorithm

Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961. It is still a commonly used algorithm for sorting. Overall, it is slightly faster than merge sort and heapsort for randomized data, particularly on larger distributions.

Semidefinite programming (SDP) is a subfield of mathematical programming concerned with the optimization of a linear objective function over the intersection of the cone of positive semidefinite matrices with an affine space, i.e., a spectrahedron.

The study of facility location problems (FLP), also known as location analysis, is a branch of operations research and computational geometry concerned with the optimal placement of facilities to minimize transportation costs while considering factors like avoiding placing hazardous materials near housing, and competitors' facilities. The techniques also apply to cluster analysis.

In computer science, streaming algorithms are algorithms for processing data streams in which the input is presented as a sequence of items and can be examined in only a few passes, typically just one. These algorithms are designed to operate with limited memory, generally logarithmic in the size of the stream and/or in the maximum value in the stream, and may also have limited processing time per item.

In mathematics the Function Field Sieve is one of the most efficient algorithms to solve the Discrete Logarithm Problem (DLP) in a finite field. It has heuristic subexponential complexity. Leonard Adleman developed it in 1994 and then elaborated it together with M. D. Huang in 1999. Previous work includes the work of D. Coppersmith about the DLP in fields of characteristic two.

In cryptography, Very Smooth Hash (VSH) is a provably secure cryptographic hash function invented in 2005 by Scott Contini, Arjen Lenstra and Ron Steinfeld. Provably secure means that finding collisions is as difficult as some known hard mathematical problem. Unlike other provably secure collision-resistant hashes, VSH is efficient and usable in practice. Asymptotically, it only requires a single multiplication per log(n) message-bits and uses RSA-type arithmetic. Therefore, VSH can be useful in embedded environments where code space is limited.

In discrete mathematics, ideal lattices are a special class of lattices and a generalization of cyclic lattices. Ideal lattices naturally occur in many parts of number theory, but also in other areas. In particular, they have a significant place in cryptography. Micciancio defined a generalization of cyclic lattices as ideal lattices. They can be used in cryptosystems to decrease by a square root the number of parameters necessary to describe a lattice, making them more efficient. Ideal lattices are a new concept, but similar lattice classes have been used for a long time. For example, cyclic lattices, a special case of ideal lattices, are used in NTRUEncrypt and NTRUSign.

In computer science and data mining, MinHash is a technique for quickly estimating how similar two sets are. The scheme was invented by Andrei Broder, and initially used in the AltaVista search engine to detect duplicate web pages and eliminate them from search results. It has also been applied in large-scale clustering problems, such as clustering documents by the similarity of their sets of words.

In computer science, the range query problem consists of efficiently answering several queries regarding a given interval of elements within an array. For example, a common task, known as range minimum query, is finding the smallest value inside a given range within a list of numbers.

<i>X</i> + <i>Y</i> sorting Problem of sorting pairs of numbers by their sum

In computer science, sorting is the problem of sorting pairs of numbers by their sums. Applications of the problem include transit fare minimisation, VLSI design, and sparse polynomial multiplication. As with comparison sorting and integer sorting more generally, algorithms for this problem can be based only on comparisons of these sums, or on other operations that work only when the inputs are small integers.

In computer science, the count-distinct problem (also known in applied mathematics as the cardinality estimation problem) is the problem of finding the number of distinct elements in a data stream with repeated elements. This is a well-known problem with numerous applications. The elements might represent IP addresses of packets passing through a router, unique visitors to a web site, elements in a large database, motifs in a DNA sequence, or elements of RFID/sensor networks.

Quantum optimization algorithms are quantum algorithms that are used to solve optimization problems. Mathematical optimization deals with finding the best solution to a problem from a set of possible solutions. Mostly, the optimization problem is formulated as a minimization problem, where one tries to minimize an error which depends on the solution: the optimal solution has the minimal error. Different optimization techniques are applied in various fields such as mechanics, economics and engineering, and as the complexity and amount of data involved rise, more efficient ways of solving optimization problems are needed. Quantum computing may allow problems which are not practically feasible on classical computers to be solved, or suggest a considerable speed up with respect to the best known classical algorithm.

The configuration linear program (configuration-LP) is a linear programming technique used for solving combinatorial optimization problems. It was introduced in the context of the cutting stock problem. Later, it has been applied to the bin packing and job scheduling problems. In the configuration-LP, there is a variable for each possible configuration - each possible multiset of items that can fit in a single bin. Usually, the number of configurations is exponential in the problem size, but in some cases it is possible to attain approximate solutions using only a polynomial number of configurations.

The Karmarkar–Karp (KK) bin packing algorithms are several related approximation algorithm for the bin packing problem. The bin packing problem is a problem of packing items of different sizes into bins of identical capacity, such that the total number of bins is as small as possible. Finding the optimal solution is computationally hard. Karmarkar and Karp devised an algorithm that runs in polynomial time and finds a solution with at most bins, where OPT is the number of bins in the optimal solution. They also devised several other algorithms with slightly different approximation guarantees and run-time bounds.

References