3SUM

Unsolved problem in computer science:

Is there an algorithm to solve the 3SUM problem in time O(n^(2−ε)), for some ε > 0?


In computational complexity theory, the 3SUM problem asks if a given set of real numbers contains three elements that sum to zero. A generalized version, k-SUM, asks the same question on k numbers. 3SUM can be easily solved in O(n²) time, and matching Ω(n²) lower bounds are known in some specialized models of computation (Erickson 1999).

It was conjectured that any deterministic algorithm for 3SUM requires Ω(n²) time. In 2014, the original 3SUM conjecture was refuted by Allan Grønlund and Seth Pettie, who gave a deterministic algorithm that solves 3SUM in O(n² / (log n / log log n)^(2/3)) time. [1] Additionally, Grønlund and Pettie showed that the 4-linear decision tree complexity of 3SUM is O(n^(3/2) · √(log n)). These bounds were subsequently improved. [2] [3] [4] The current best known algorithm for 3SUM runs in O(n² · (log log n)^O(1) / log² n) time. [4] Kane, Lovett, and Moran showed that the 6-linear decision tree complexity of 3SUM is O(n log² n). [5] The latter bound is tight (up to a logarithmic factor). It is still conjectured that 3SUM is unsolvable in O(n^(2−ε)) expected time, for any ε > 0. [6]

When the elements are integers in the range [−N, ..., N], 3SUM can be solved in O(n + N log N) time by representing the input set S as a bit vector, computing the set S + S of all pairwise sums as a discrete convolution using the fast Fourier transform, and finally comparing this set to −S. [7]
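As an illustration (not part of the original source), a minimal Python sketch of this FFT-based approach might look as follows; the function name fft_3sum and the use of NumPy are assumptions, and the sketch allows an element to be reused within a triple:

import numpy as np

def fft_3sum(values, N):
    """Decide whether some x, y, z in `values` (repetition allowed) satisfy
    x + y + z = 0, assuming all values are integers in the range [-N, N]."""
    bits = np.zeros(2 * N + 1)
    for x in values:
        bits[x + N] = 1.0                        # indicator vector, shifted by N

    # Self-convolution via the FFT; pad to avoid circular wrap-around.
    m = 1
    while m < 2 * (2 * N + 1):
        m *= 2
    f = np.fft.rfft(bits, m)
    pair_sums = np.rint(np.fft.irfft(f * f, m))  # entry k > 0 iff some x + y = k - 2N

    # x + y + z = 0 for some z in the set  <=>  some pairwise sum equals -z.
    return any(pair_sums[-z + 2 * N] > 0.5 for z in values)

print(fft_3sum([-25, -10, 2, 8], N=25))          # True: -10 + 2 + 8 = 0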

Quadratic algorithm

Suppose the input array is S[0..n−1]. In integer (word RAM) models of computing, 3SUM can be solved in O(n²) time on average by inserting each number into a hash table, and then, for each pair of indices i and j, checking whether the hash table contains the integer −(S[i] + S[j]).
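A minimal Python sketch of this hashing approach (the function name and the use of a value-to-indices dictionary as the hash table are illustrative, not from the source):

from collections import defaultdict

def three_sum_hashing(s):
    """Expected O(n^2) time on a word RAM: insert every number into a hash
    table, then for each pair of indices (i, j) look up -(S[i] + S[j])."""
    positions = defaultdict(list)
    for k, v in enumerate(s):
        positions[v].append(k)                   # hash table: value -> indices

    n = len(s)
    for i in range(n):
        for j in range(i + 1, n):
            target = -(s[i] + s[j])
            # Accept only an occurrence of `target` at a third index.
            if any(k != i and k != j for k in positions.get(target, ())):
                return s[i], s[j], target
    return None

print(three_sum_hashing([-25, -10, -7, -3, 2, 4, 8, 10]))   # (-10, 2, 8)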

It is also possible to solve the problem in the same time in a comparison-based model of computing or real RAM, for which hashing is not allowed. The algorithm below first sorts the input array and then tests all possible pairs in a careful order that avoids the need to binary search for the pairs in the sorted list, achieving worst-case O(n²) time, as follows. [8]

sort(S);
for i = 0 to n - 2 do
    a = S[i];
    start = i + 1;
    end = n - 1;
    while (start < end) do
        b = S[start];
        c = S[end];
        if (a + b + c == 0) then
            output a, b, c;
            // Continue search for all triplet combinations summing to zero.
            // We need to update both end and start together since the array values are distinct.
            start = start + 1;
            end = end - 1;
        else if (a + b + c > 0) then
            end = end - 1;
        else
            start = start + 1;
    end
end

The following example shows this algorithm's execution on a small sorted array. The current value of a is shown in square brackets; the current values of b and c are shown in parentheses.

[-25] (-10)  -7   -3    2    4    8  (10)    (a+b+c == -25)
[-25]  -10  (-7)  -3    2    4    8  (10)    (a+b+c == -22)
. . .
[-25]  -10   -7   -3    2    4   (8) (10)    (a+b+c == -7)
 -25  [-10] (-7)  -3    2    4    8  (10)    (a+b+c == -7)
 -25  [-10]  -7  (-3)   2    4    8  (10)    (a+b+c == -3)
 -25  [-10]  -7   -3   (2)   4    8  (10)    (a+b+c == 2)
 -25  [-10]  -7   -3   (2)   4   (8)  10     (a+b+c == 0)

The correctness of the algorithm can be seen as follows. Suppose there is a solution a + b + c = 0. Since the pointers only move in one direction, we can run the algorithm until the leftmost pointer points to a. Then run the algorithm until either one of the remaining pointers points to b or c, whichever occurs first. The algorithm will then run until the last pointer points to the remaining term, at which point the triple is reported.
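For reference, the same two-pointer scan as runnable Python (a direct transcription of the pseudocode above; the function name is illustrative):

def three_sum_sorted(s):
    """Worst-case O(n^2) time after an O(n log n) sort; assumes distinct values."""
    s = sorted(s)
    n = len(s)
    triples = []
    for i in range(n - 2):
        a = s[i]
        start, end = i + 1, n - 1
        while start < end:
            b, c = s[start], s[end]
            if a + b + c == 0:
                triples.append((a, b, c))
                # Keep searching; with distinct values both pointers must move.
                start += 1
                end -= 1
            elif a + b + c > 0:
                end -= 1
            else:
                start += 1
    return triples

print(three_sum_sorted([-25, -10, -7, -3, 2, 4, 8, 10]))
# [(-10, 2, 8), (-7, -3, 10)]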

Variants

Non-zero sum

Instead of looking for numbers whose sum is 0, it is possible to look for numbers whose sum is any constant C. The simplest way is to shift the input: subtract C/3 from every element and solve the resulting zero-sum instance, since a + b + c = C if and only if (a − C/3) + (b − C/3) + (c − C/3) = 0.

For example, if A = [1, 2, 3, 4] and C = 4, subtract 4/3 from every element of A and solve the shifted array in the usual 3SUM way.

Alternatively, the original hashing algorithm can simply be modified to search the hash table for the integer C − (S[i] + S[j]).
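A small Python sketch of the shifting reduction (reusing the three_sum_hashing sketch from above; the function name and the use of Fraction for the exact C/3 shift are implementation choices, not from the source):

from fractions import Fraction

def three_sum_to_target(s, c):
    """Reduce 'find a + b + x = C' to ordinary 3SUM by subtracting C/3
    from every element; exact rational arithmetic avoids rounding issues."""
    shift = Fraction(c, 3)
    hit = three_sum_hashing([Fraction(v) - shift for v in s])
    if hit is None:
        return None
    return tuple(v + shift for v in hit)

print(three_sum_to_target([1, 2, 3, 4], 9))
# (Fraction(2, 1), Fraction(3, 1), Fraction(4, 1)), i.e. 2 + 3 + 4 = 9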

3 different arrays

Instead of searching for the 3 numbers in a single array, we can search for them in 3 different arrays. That is, given three arrays X, Y and Z, find three numbers a ∈ X, b ∈ Y, c ∈ Z, such that a + b + c = 0. Call the 1-array variant 3SUM×1 and the 3-array variant 3SUM×3.

Given a solver for 3SUM×1, the 3SUM×3 problem can be solved by merging the three arrays into one, after transforming the elements so that membership in X, Y or Z is encoded in the values themselves (assuming all elements are integers). By the way the arrays are transformed, any zero-sum triple found by the 3SUM×1 solver is guaranteed to satisfy a ∈ X, b ∈ Y, c ∈ Z. [9] One such transformation is sketched below.
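The exact transformation used in the reference is not reproduced here; the following Python sketch shows one standard construction (an assumption, not necessarily the one in the source): each element is scaled by 10 and tagged with a final digit identifying its array, so the tags can only cancel when exactly one element is taken from each of X, Y and Z.

def three_sum_x3(x, y, z, three_sum_x1):
    """Reduce 3SUM x3 to 3SUM x1 (elements assumed to be integers).

    Tags: +1 for X, +2 for Y, -3 for Z.  The only way three tags can sum
    to a multiple of 10 is 1 + 2 - 3 = 0, so a zero-sum triple in the
    combined array must take one element from each array, and then
    10(a + b + c) + 0 = 0 forces a + b + c = 0."""
    combined = ([10 * a + 1 for a in x] +
                [10 * b + 2 for b in y] +
                [10 * c - 3 for c in z])
    return three_sum_x1(combined)

# Example with a brute-force 3SUM x1 solver:
brute = lambda s: any(a + b + c == 0
                      for i, a in enumerate(s)
                      for j, b in enumerate(s) if j != i
                      for k, c in enumerate(s) if k not in (i, j))
print(three_sum_x3([7], [-12], [5], brute))    # True: 7 + (-12) + 5 = 0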

Convolution sum

Instead of looking for arbitrary elements of the array such that:

S[k] = S[i] + S[j],

the convolution 3SUM problem (Conv3SUM) looks for elements in specific locations: [10]

S[i + j] = S[i] + S[j].

Reduction from Conv3SUM to 3SUM

Given a solver for 3SUM, the Conv3SUM problem can be solved in the following way. [10]

  • Define a new array T, such that for every index i: T[i] = 2n·S[i] + i (where n is the number of elements in the array, and the indices run from 0 to n−1).
  • Solve 3SUM on the array T.

Correctness proof:

  • If in the original array there is a triple with S[i] + S[j] = S[i+j], then T[i] + T[j] = 2n·(S[i] + S[j]) + (i + j) = 2n·S[i+j] + (i + j) = T[i+j], so this solution will be found by 3SUM on T.
  • Conversely, if in the new array there is a triple with T[i] + T[j] = T[k], then 2n·(S[i] + S[j]) + (i + j) = 2n·S[k] + k. Because 0 ≤ i + j < 2n and 0 ≤ k < n, necessarily i + j = k and S[i] + S[j] = S[k] = S[i+j], so this is a valid solution for Conv3SUM on S.
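A minimal Python sketch of this reduction (3SUM is used here in its equivalent "a + b = c" form, and a brute-force check stands in for a real 3SUM solver; the names are illustrative):

def conv3sum_via_3sum(s):
    """Conv3SUM: is there i, j with S[i] + S[j] == S[i + j]?

    Reduction: T[i] = 2*n*S[i] + i.  In T[i] + T[j] == T[k] the index
    parts cannot spill into the value parts, so i + j == k is forced,
    which in turn forces S[i] + S[j] == S[i + j]."""
    n = len(s)
    t = [2 * n * v + i for i, v in enumerate(s)]

    def three_sum_abc(arr):                       # stand-in "a + b = c" 3SUM solver
        vals = set(arr)
        return any(a + b in vals for a in arr for b in arr)

    return three_sum_abc(t)

print(conv3sum_via_3sum([1, 2, 3, 5]))    # True: S[1] + S[2] = 2 + 3 = 5 = S[3]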

Reduction from 3SUM to Conv3SUM

Given a solver for Conv3SUM, the 3SUM problem can be solved in the following way. [6] [10]

The reduction uses a hash function. As a first approximation, assume that we have a linear hash function, i.e. a function h such that:

h(x) + h(y) = h(x + y)

Suppose that all elements are integers in the range 0...N−1, and that the function h maps each element to an element in the smaller range of indices 0...n−1. Create a new array T and send each element of S to its hash value in T, i.e., for every x in S:

T[h(x)] = x

Initially, suppose that the mappings are unique (i.e. each cell in T accepts only a single element from S). Solve Conv3SUM on T. Now:

  • If there is a solution for 3SUM, x + y = z, then h(x) + h(y) = h(z) and T[h(x)] + T[h(y)] = x + y = z = T[h(z)] = T[h(x) + h(y)], so this solution will be found by the Conv3SUM solver on T.
  • Conversely, if a Conv3SUM is found on T, then obviously it corresponds to a 3SUM solution on S since T is just a permutation of S.

This idealized solution doesn't work, because any hash function might map several distinct elements of S to the same cell of T. The trick is to create an array T* by selecting a single random element from each cell of T, and run Conv3SUM on T*. If a solution is found, then it is a correct solution for 3SUM on S. If no solution is found, then create a different random T* and try again. Suppose there are at most R elements in each cell of T. Then the probability of finding a solution (if a solution exists) is the probability that the random selection will select the correct element from each of the three relevant cells, which is (1/R)³. By running Conv3SUM O(R³) times, the solution will be found with high probability.
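Purely as an illustration of this randomized step, here is an idealized Python sketch. It assumes a truly linear hash function h mapping values into cells 0..m-1 (which, as discussed next, does not actually exist for this setting) and uses a brute-force Conv3SUM check as a stand-in for a real solver; 3SUM is again taken in its "x + y = z" form.

import random

def conv3sum(t):
    """Brute-force stand-in: is there i, j with t[i] + t[j] == t[i + j]?
    Entries that are None (empty cells) are skipped."""
    m = len(t)
    return any(t[i] is not None and t[j] is not None and t[i + j] is not None
               and t[i] + t[j] == t[i + j]
               for i in range(m) for j in range(m - i))

def three_sum_via_conv3sum(s, h, m, rounds):
    """Idealized reduction, assuming h(x) + h(y) == h(x + y)."""
    cells = [[] for _ in range(m)]
    for x in s:
        cells[h(x)].append(x)                      # collisions stay in the same cell

    for _ in range(rounds):                        # roughly R**3 rounds for cells of size R
        t_star = [random.choice(cell) if cell else None for cell in cells]
        if conv3sum(t_star):                       # any hit gives x + y = z with x, y, z in S
            return True
    return False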

Unfortunately, we do not have linear perfect hashing, so we have to use an almost linear hash function, i.e. a function h such that:

h(x) + h(y) = h(x + y)

or

h(x) + h(y) = h(x + y) + 1

This requires duplicating the elements of S when copying them into T, i.e., placing every element x both in T[h(x)] (as before) and in T[h(x) + 1]. So each cell will have 2R elements, and we will have to run Conv3SUM O((2R)³) times.

3SUM-hardness

A problem is called 3SUM-hard if solving it in subquadratic time implies a subquadratic-time algorithm for 3SUM. The concept of 3SUM-hardness was introduced by Gajentaan & Overmars (1995). They proved that a large class of problems in computational geometry are 3SUM-hard; a canonical example is deciding whether a given set of points in the plane contains three collinear points. (The authors acknowledge that many of these problems are contributed by other researchers.)

By now there are a multitude of other problems that fall into this category. An example is the decision version of X + Y sorting: given sets of numbers X and Y of n elements each, are there n² distinct sums x + y for x ∈ X, y ∈ Y? [11]

Notes

  1. Grønlund & Pettie 2014.
  2. Freund 2017.
  3. Gold & Sharir 2017.
  4. Chan 2018.
  5. Kane, Lovett & Moran 2018.
  6. Kopelowitz, Tsvi; Pettie, Seth; Porat, Ely (2014). "3SUM Hardness in (Dynamic) Data Structures". arXiv:1407.6756 [cs.DS].
  7. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03384-4. Ex. 30.1–7, p. 906.
  8. Visibility Graphs and 3-Sum by Michael Hoffmann
  9. For a reduction in the other direction, see Variants of the 3-sum problem.
  10. Patrascu, M. (2010). "Towards polynomial lower bounds for dynamic problems". Proceedings of the 42nd ACM Symposium on Theory of Computing – STOC '10. p. 603. doi:10.1145/1806689.1806772. ISBN 9781450300506.
  11. Demaine, Erik; Erickson, Jeff; O'Rourke, Joseph (20 August 2006). "Problem 41: Sorting X + Y (Pairwise Sums)". The Open Problems Project. Retrieved 23 September 2014.
