In computer science, the longest increasing subsequence problem aims to find a subsequence of a given sequence in which the subsequence's elements are sorted in ascending order and in which the subsequence is as long as possible. This subsequence is not necessarily contiguous or unique. Longest increasing subsequences are studied in the context of various disciplines related to mathematics, including algorithmics, random matrix theory, representation theory, and physics. [1] [2] The longest increasing subsequence problem is solvable in time $O(n \log n)$, where $n$ denotes the length of the input sequence. [3]
In the first 16 terms of the binary Van der Corput sequence

0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15

one of the longest increasing subsequences is

0, 2, 6, 9, 11, 15.

This subsequence has length six; the input sequence has no seven-member increasing subsequences. The longest increasing subsequence in this example is not the only solution: for instance,

0, 4, 6, 9, 11, 15
0, 2, 6, 9, 13, 15
0, 4, 6, 9, 13, 15

are other increasing subsequences of equal length in the same input sequence.
The longest increasing subsequence problem is closely related to the longest common subsequence problem, which has a quadratic time dynamic programming solution: the longest increasing subsequence of a sequence $S$ is the longest common subsequence of $S$ and $T$, where $T$ is the result of sorting $S$. However, for the special case in which the input is a permutation of the integers $1, 2, \ldots, n$, this approach can be made much more efficient, leading to time bounds of the form $O(n \log \log n)$. [4]
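To make the reduction concrete, here is a brief Python sketch of the quadratic-time approach just described (an illustration only, not the faster algorithm given later; the function name lis_via_lcs is ours, and the reduction assumes the elements are distinct):

```python
def lis_via_lcs(seq):
    """Longest increasing subsequence of a sequence of distinct elements,
    computed as a longest common subsequence of seq and its sorted copy."""
    a, b = seq, sorted(seq)
    n, m = len(a), len(b)
    # dp[i][j] = length of an LCS of a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Trace back one LCS; it is a longest increasing subsequence of seq.
    out, i, j = [], n, m
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

# Prints one length-6 increasing subsequence of the Van der Corput example.
print(lis_via_lcs([0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]))
```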
The largest clique in a permutation graph corresponds to the longest decreasing subsequence of the permutation that defines the graph (assuming the original non-permuted sequence is sorted from lowest value to highest). Similarly, the maximum independent set in a permutation graph corresponds to the longest non-decreasing subsequence. Therefore, longest increasing subsequence algorithms can be used to solve the clique problem efficiently in permutation graphs. [5]
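As a small illustration of this correspondence, the following Python sketch computes the size of a maximum clique of a permutation graph as the length of a longest strictly decreasing subsequence of its defining permutation. The helper name max_clique_size and the representation of the graph by its permutation are our assumptions for the example:

```python
import bisect

def max_clique_size(perm):
    """Size of a maximum clique in the permutation graph defined by perm.

    A clique corresponds to a decreasing subsequence of the permutation, so
    we compute the longest strictly decreasing subsequence, i.e. the longest
    strictly increasing subsequence of the reversed permutation.
    """
    tails = []  # tails[l-1] = smallest tail of an increasing subsequence of length l
    for x in reversed(perm):
        pos = bisect.bisect_left(tails, x)  # leftmost position with tails[pos] >= x
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)

print(max_clique_size([4, 3, 5, 1, 2]))  # 3, e.g. the decreasing subsequence 4, 3, 1
```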
In the Robinson–Schensted correspondence between permutations and Young tableaux, the length of the first row of the tableau corresponding to a permutation equals the length of the longest increasing subsequence of the permutation, and the length of the first column equals the length of the longest decreasing subsequence. [3]
The algorithm outlined below solves the longest increasing subsequence problem efficiently with arrays and binary searching. It processes the sequence elements in order, maintaining the longest increasing subsequence found so far. Denote the sequence values as X[0], X[1], etc. Then, after processing X[i], the algorithm will have stored an integer L and values in two arrays:

- L — the length of the longest increasing subsequence found so far.
- M[l] — the index k of the smallest value X[k] such that there is an increasing subsequence of length l ending at X[k], among the elements processed so far (so k ≤ i).
- P[k] — the index of the predecessor of X[k] in a longest increasing subsequence ending at X[k].

Because the algorithm below uses zero-based numbering, for clarity M is padded with M[0], which goes unused, so that M[l] corresponds to a subsequence of length l. A real implementation can skip M[0] and adjust the indices accordingly.

Note that, at any point in the algorithm, the sequence

X[M[1]], X[M[2]], ..., X[M[L]]

is increasing. For, if there is an increasing subsequence of length l ≥ 2 ending at X[M[l]], then there is also a subsequence of length l − 1 ending at a smaller value: namely, the one ending at X[P[M[l]]]. Thus, we may do binary searches in this sequence in logarithmic time.
The algorithm, then, proceeds as follows:
```
P = array of length N
M = array of length N + 1
M[0] = -1  // undefined, so it can be set to any value

L = 0
for i in range 0 to N-1:  // N-1 included
    // Binary search for the smallest positive l ≤ L
    // such that X[M[l]] >= X[i]
    lo = 1
    hi = L + 1
    while lo < hi:
        mid = lo + floor((hi - lo) / 2)  // lo <= mid < hi
        if X[M[mid]] >= X[i]:
            hi = mid
        else:  // X[M[mid]] < X[i]
            lo = mid + 1

    // After searching, lo == hi, and it is 1 greater than the length
    // of the longest increasing subsequence that ends in a value
    // smaller than X[i]
    newL = lo

    // The predecessor of X[i] is the last index of
    // the subsequence of length newL - 1
    P[i] = M[newL - 1]
    M[newL] = i

    if newL > L:
        // If we found a subsequence longer than any we've
        // found yet, update L
        L = newL

// Reconstruct the longest increasing subsequence.
// It consists of the values of X at the L indices:
// ..., P[P[M[L]]], P[M[L]], M[L]
S = array of length L
k = M[L]
for j in range L-1 to 0:  // 0 included
    S[j] = X[k]
    k = P[k]

return S
```
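For reference, here is a minimal runnable Python translation of the pseudocode above, a sketch that follows the same variable names (a production implementation would drop the M[0] padding):

```python
def longest_increasing_subsequence(X):
    """Return one longest strictly increasing subsequence of X."""
    N = len(X)
    P = [0] * N        # P[k]: predecessor index of X[k] in an LIS ending at X[k]
    M = [0] * (N + 1)  # M[l]: index of the smallest tail of an increasing subsequence of length l
    M[0] = -1          # undefined; any value works
    L = 0
    for i in range(N):
        # Binary search for the smallest positive l <= L with X[M[l]] >= X[i]
        lo, hi = 1, L + 1
        while lo < hi:
            mid = lo + (hi - lo) // 2
            if X[M[mid]] >= X[i]:
                hi = mid
            else:
                lo = mid + 1
        newL = lo
        P[i] = M[newL - 1]
        M[newL] = i
        if newL > L:
            L = newL
    # Reconstruct by walking the predecessor links back from M[L]
    S = [0] * L
    k = M[L]
    for j in range(L - 1, -1, -1):
        S[j] = X[k]
        k = P[k]
    return S

print(longest_increasing_subsequence([2, 8, 9, 5, 6, 7, 1]))  # [2, 5, 6, 7]
```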
Because the algorithm performs a single binary search per sequence element, its total time can be expressed using big O notation as $O(n \log n)$. Fredman (1975) discusses a variant of this algorithm, which he credits to Donald Knuth; in the variant that he studies, the algorithm tests whether each value $X[i]$ can be used to extend the current longest increasing sequence, in constant time, prior to doing the binary search. With this modification, the algorithm uses at most $n \log_2 n - n \log_2 \log_2 n + O(n)$ comparisons in the worst case, which is optimal for a comparison-based algorithm up to the constant factor in the $O(n)$ term. [6]
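The constant-time test is easy to add in practice: before searching, compare the new value with the current largest tail. The compact sketch below is our illustration in the spirit of that variant, not Fredman's exact procedure; it tracks tail values directly rather than indices, so it returns only the length:

```python
import bisect

def lis_length(X):
    """Length of a longest strictly increasing subsequence of X."""
    tails = []  # tails[l-1] = smallest tail value of an increasing subsequence of length l
    for x in X:
        if not tails or tails[-1] < x:
            # Constant-time test: x extends the current longest subsequence,
            # so no binary search is needed.
            tails.append(x)
        else:
            # Otherwise, binary search for the leftmost tail >= x and replace it.
            tails[bisect.bisect_left(tails, x)] = x
    return len(tails)

print(lis_length([2, 8, 9, 5, 6, 7, 1]))  # 4
```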
Example run

The tables below trace the algorithm on the input X = [2, 8, 9, 5, 6, 7, 1].
Values stored in variables | X[i] | newL | P | M | X[M] | L |
---|---|---|---|---|---|---|
Before for i loop | | | P = [] | M = [-1] | X[M] = [N/A] | L = 0 |
At end of loop i = 0 | X[i] = 2 | newL = 1 | P = [-1] | M = [-1, 0] | X[M] = [N/A, 2] | L = 1 |
At end of loop i = 1 | X[i] = 8 | newL = 2 | P = [-1, 0] | M = [-1, 0, 1] | X[M] = [N/A, 2, 8] | L = 2 |
At end of loop i = 2 | X[i] = 9 | newL = 3 | P = [-1, 0, 1] | M = [-1, 0, 1, 2] | X[M] = [N/A, 2, 8, 9] | L = 3 |
At end of loop i = 3 | X[i] = 5 | newL = 2 | P = [-1, 0, 1, 0] | M = [-1, 0, 3, 2] | X[M] = [N/A, 2, 5, 9] | L = 3 |
At end of loop i = 4 | X[i] = 6 | newL = 3 | P = [-1, 0, 1, 0, 3] | M = [-1, 0, 3, 4] | X[M] = [N/A, 2, 5, 6] | L = 3 |
At end of loop i = 5 | X[i] = 7 | newL = 4 | P = [-1, 0, 1, 0, 3, 4] | M = [-1, 0, 3, 4, 5] | X[M] = [N/A, 2, 5, 6, 7] | L = 4 |
At end of loop i = 6 | X[i] = 1 | newL = 1 | P = [-1, 0, 1, 0, 3, 4, -1] | M = [-1, 6, 3, 4, 5] | X[M] = [N/A, 1, 5, 6, 7] | L = 4 |
Values stored in variables | S | k | X[k] |
---|---|---|---|
Before for j loop | S = [N/A, N/A, N/A, N/A] | k = M[4] = 5 | X[5] = 7 |
At end of loop j = 3 | S = [N/A, N/A, N/A, 7] | k = P[5] = 4 | X[4] = 6 |
At end of loop j = 2 | S = [N/A, N/A, 6, 7] | k = P[4] = 3 | X[3] = 5 |
At end of loop j = 1 | S = [N/A, 5, 6, 7] | k = P[3] = 0 | X[0] = 2 |
At end of loop j = 0 | S = [2, 5, 6, 7] | k = P[0] = -1 | X[-1] = N/A |
According to the Erdős–Szekeres theorem, any sequence of $n^2 + 1$ distinct integers has an increasing or a decreasing subsequence of length $n + 1$. [7] [8] For inputs in which each permutation of the input is equally likely, the expected length of the longest increasing subsequence is approximately $2\sqrt{n}$. [9] [2]
In the limit as $n$ approaches infinity, the Baik–Deift–Johansson theorem states that the length of the longest increasing subsequence of a randomly permuted sequence of $n$ items has a distribution approaching the Tracy–Widom distribution, the distribution of the largest eigenvalue of a random matrix in the Gaussian unitary ensemble. [10]
The longest increasing subsequence has also been studied in the setting of online algorithms, in which the elements of a sequence of independent random variables with continuous distribution – or alternatively the elements of a random permutation – are presented one at a time to an algorithm that must decide whether to include or exclude each element, without knowledge of the later elements. In this variant of the problem, which allows for interesting applications in several contexts, it is possible to devise an optimal selection procedure that, given a random sample of size $n$ as input, will generate an increasing sequence with maximal expected length of size approximately $\sqrt{2n}$. [11] The length of the increasing subsequence selected by this optimal procedure has variance approximately equal to $\sqrt{2n}/3$, and its limiting distribution is asymptotically normal after the usual centering and scaling. [12] The same asymptotic results hold with more precise bounds for the corresponding problem in the setting of a Poisson arrival process. [13] A further refinement in the Poisson process setting is given through the proof of a central limit theorem for the optimal selection process, which holds, with a suitable normalization, in a more complete sense than one would expect. The proof yields not only the "correct" functional limit theorem but also the (singular) covariance matrix of the three-dimensional process summarizing all interacting processes. [14]
See also

In algorithmic information theory, the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program that produces the object as output. It is a measure of the computational resources needed to specify the object, and is also known as algorithmic complexity, Solomonoff–Kolmogorov–Chaitin complexity, program-size complexity, descriptive complexity, or algorithmic entropy. It is named after Andrey Kolmogorov, who first published on the subject in 1963, and is a generalization of classical information theory.
In computer science, merge sort is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the relative order of equal elements is the same in the input and output. Merge sort is a divide-and-conquer algorithm that was invented by John von Neumann in 1945. A detailed description and analysis of bottom-up merge sort appeared in a report by Goldstine and von Neumann as early as 1948.
In mathematics, specifically in real analysis, the Bolzano–Weierstrass theorem, named after Bernard Bolzano and Karl Weierstrass, is a fundamental result about convergence in a finite-dimensional Euclidean space $\mathbb{R}^n$. The theorem states that each infinite bounded sequence in $\mathbb{R}^n$ has a convergent subsequence. An equivalent formulation is that a subset of $\mathbb{R}^n$ is sequentially compact if and only if it is closed and bounded. The theorem is sometimes called the sequential compactness theorem.
In mathematics, a permutation of a set can mean one of two different things: an arrangement of its members in a sequence or linear order, or the act or process of changing the linear order of an ordered set.
A longest common subsequence (LCS) is the longest subsequence common to all sequences in a set of sequences. It differs from the longest common substring: unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences. The problem of computing longest common subsequences is a classic computer science problem, the basis of data comparison programs such as the diff utility, and has applications in computational linguistics and bioinformatics. It is also widely used by revision control systems such as Git for reconciling multiple changes made to a revision-controlled collection of files.
In mathematics, a low-discrepancy sequence is a sequence with the property that for all values of $N$, its subsequence $x_1, \ldots, x_N$ has a low discrepancy.
In mathematics, specifically order theory, a well-quasi-ordering or wqo on a set $X$ is a quasi-ordering of $X$ for which every infinite sequence of elements $x_0, x_1, x_2, \ldots$ from $X$ contains an increasing pair $x_i \le x_j$ with $i < j$.
In mathematics, subadditivity is a property of a function that states, roughly, that evaluating the function for the sum of two elements of the domain always returns something less than or equal to the sum of the function's values at each element. There are numerous examples of subadditive functions in various areas of mathematics, particularly norms and square roots. Additive maps are special cases of subadditive functions.
In computer science, patience sorting is a sorting algorithm inspired by, and named after, the card game patience. A variant of the algorithm efficiently computes the length of a longest increasing subsequence in a given array.
In mathematics, the Robinson–Schensted correspondence is a bijective correspondence between permutations and pairs of standard Young tableaux of the same shape. It has various descriptions, all of which are of algorithmic nature, it has many remarkable properties, and it has applications in combinatorics and other areas such as representation theory. The correspondence has been generalized in numerous ways, notably by Knuth to what is known as the Robinson–Schensted–Knuth correspondence, and a further generalization to pictures by Zelevinsky.
In computer science, the shortest common supersequence of two sequences X and Y is the shortest sequence which has X and Y as subsequences. This is a problem closely related to the longest common subsequence problem. Given two sequences $X = \langle x_1, \ldots, x_m \rangle$ and $Y = \langle y_1, \ldots, y_n \rangle$, a sequence $U = \langle u_1, \ldots, u_k \rangle$ is a common supersequence of $X$ and $Y$ if items can be removed from $U$ to produce $X$ and $Y$.
In mathematics, the Erdős–Szekeres theorem asserts that, given $r$ and $s$, any sequence of distinct real numbers with length at least $(r - 1)(s - 1) + 1$ contains a monotonically increasing subsequence of length $r$ or a monotonically decreasing subsequence of length $s$. The proof appeared in the same 1935 paper that mentions the Happy Ending problem.
Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961. It is still a commonly used algorithm for sorting. Overall, it is slightly faster than merge sort and heapsort for randomized data, particularly on larger distributions.
Intuitively, an algorithmically random sequence is a sequence of binary digits that appears random to any algorithm running on a universal Turing machine. The notion can be applied analogously to sequences on any finite alphabet. Random sequences are key objects of study in algorithmic information theory.
In computer science, the Hunt–Szymanski algorithm, also known as the Hunt–McIlroy algorithm, is a solution to the longest common subsequence problem. It was one of the first non-heuristic algorithms used in diff, which compares a pair of files each represented as a sequence of lines. To this day, variations of this algorithm are found in incremental version control systems, wiki engines, and molecular phylogenetics research software.
In combinatorial mathematics, probability, and computer science, in the longest alternating subsequence problem, one wants to find a subsequence of a given sequence in which the elements are in alternating order, and in which the subsequence is as long as possible.
In mathematics, Kingman's subadditive ergodic theorem is one of several ergodic theorems. It can be seen as a generalization of Birkhoff's ergodic theorem. Intuitively, the subadditive ergodic theorem is a kind of random variable version of Fekete's lemma. As a result, it can be rephrased in the language of probability, e.g. using a sequence of random variables and expected values. The theorem is named after John Kingman.
In mathematics, the Chvátal–Sankoff constants are mathematical constants that describe the lengths of longest common subsequences of random strings. Although the existence of these constants has been proven, their exact values are unknown. They are named after Václav Chvátal and David Sankoff, who began investigating them in the mid-1970s.
In computer science, merge-insertion sort or the Ford–Johnson algorithm is a comparison sorting algorithm published in 1959 by L. R. Ford Jr. and Selmer M. Johnson. It uses fewer comparisons in the worst case than the best previously known algorithms, binary insertion sort and merge sort, and for 20 years it was the sorting algorithm with the fewest known comparisons. Although not of practical significance, it remains of theoretical interest in connection with the problem of sorting with a minimum number of comparisons. The same algorithm may have also been independently discovered by Stanisław Trybuła and Czen Ping.
The Baik–Deift–Johansson theorem is a result from probabilistic combinatorics. It deals with the subsequences of a permutation drawn uniformly at random from the set $\{1, 2, \ldots, N\}$. The theorem makes a statement about the distribution of the length of the longest increasing subsequence in the limit $N \to \infty$. The theorem was influential in probability theory since it connected KPZ universality with the theory of random matrices.