SMAWK algorithm

The SMAWK algorithm is an algorithm for finding the minimum value in each row of an implicitly-defined totally monotone matrix. It is named after the initials of its five inventors, Peter Shor, Shlomo Moran, Alok Aggarwal, Robert Wilber, and Maria Klawe.[1]

Input

For the purposes of this algorithm, a matrix is defined to be monotone if each row's minimum value occurs in a column which is equal to or greater than the column of the previous row's minimum. It is totally monotone if the same property is true for every submatrix (defined by an arbitrary subset of the rows and columns of the given matrix). Equivalently, a matrix is totally monotone if there does not exist a 2×2 submatrix whose row minima are in the top right and bottom left corners. Every Monge array is totally monotone, but not necessarily vice versa.
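
For intuition, the forbidden-submatrix characterization can be checked directly on a small explicit matrix. The brute-force sketch below is only illustrative (SMAWK itself never inspects all 2×2 submatrices, and the function name is ours); it uses one common formalization of the condition: for every pair of rows i &lt; k and columns j &lt; l, A[i][j] &gt; A[i][l] must imply A[k][j] &gt; A[k][l].

```python
from itertools import combinations

def is_totally_monotone(matrix):
    """Brute-force check of the 2x2 characterization of total monotonicity:
    in every 2x2 submatrix (rows i < k, columns j < l), if the top row
    strictly prefers the right column, the bottom row must too."""
    m, n = len(matrix), len(matrix[0])
    for i, k in combinations(range(m), 2):
        for j, l in combinations(range(n), 2):
            # Forbidden pattern: the top row's minimum is strictly on the
            # right while the bottom row's minimum is (weakly) on the left.
            if matrix[i][j] > matrix[i][l] and matrix[k][j] <= matrix[k][l]:
                return False
    return True
```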

For the SMAWK algorithm, the matrix to be searched should be defined as a function, and this function is given as input to the algorithm (together with the dimensions of the matrix). The algorithm then evaluates the function whenever it needs to know the value of a particular matrix cell. If each evaluation takes O(1) time then, for a matrix with r rows and c columns, the running time and the number of function evaluations are both O(c(1 + log(r/c))). This is much faster than the O(rc) time of a naive algorithm that evaluates every matrix cell.
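
To make this interface concrete, the sketch below defines a hypothetical cost function that serves as an implicit matrix (this particular cost happens to be Monge, and hence totally monotone), together with the naive baseline that evaluates all rc cells. Both names, cost and naive_row_minima, are our own illustrations, not part of any standard library.

```python
# A hypothetical implicitly defined matrix: entries are computed on demand,
# never stored.  This particular cost is Monge (and therefore totally
# monotone), but any O(1)-evaluable function fits the interface.
def cost(i, j):
    return abs(j - 1.7 * i) + 0.01 * j

def naive_row_minima(nrows, ncols, f):
    """Baseline row-minima search that evaluates all r*c cells."""
    return [min(range(ncols), key=lambda j: f(i, j)) for i in range(nrows)]
```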

Method

The basic idea of the algorithm is to follow a prune and search strategy in which the problem to be solved is reduced to a single recursive subproblem of the same type whose size is smaller by a constant factor. To do so, the algorithm first preprocesses the matrix to remove some of the columns that cannot contain any row minimum, using a stack-based procedure similar to the one in the Graham scan and all nearest smaller values algorithms. After this phase, the number of remaining columns is at most the number of rows. Next, the algorithm calls itself recursively to find the row minima of the even-numbered rows of the matrix. Finally, by searching the columns between the positions of consecutive even-row minima, the algorithm fills in the remaining minima of the odd-numbered rows.
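
The following Python sketch follows this reduce/recurse/interpolate outline. It assumes the matrix supplied through lookup is genuinely totally monotone (ties, if any, broken consistently); the function name, variable names, and dictionary-based interface are our own choices rather than anything fixed by the original paper.

```python
def smawk(rows, cols, lookup):
    """Row minima of an implicitly defined totally monotone matrix.

    rows, cols -- sequences of row and column identifiers
    lookup     -- function returning the matrix entry at (row, col)
    Returns a dict mapping each row to a column holding its minimum.
    """
    result = {}

    def solve(rows, cols):
        if not rows:
            return
        # Reduce: discard columns that cannot contain any row minimum,
        # leaving at most as many columns as rows (stack-based pruning).
        stack = []
        for col in cols:
            while stack:
                row = rows[len(stack) - 1]
                if lookup(row, stack[-1]) <= lookup(row, col):
                    break        # the stacked column is still a candidate
                stack.pop()      # otherwise it can hold no row minimum
            if len(stack) < len(rows):
                stack.append(col)
        cols = stack
        # Recurse on the even-numbered rows (the second, fourth, ...).
        solve(rows[1::2], cols)
        # Interpolate: each remaining (odd-numbered) row is scanned only
        # between the minima of its neighbouring even-numbered rows.
        start = 0
        for i in range(0, len(rows), 2):
            row = rows[i]
            stop = len(cols) - 1
            if i + 1 < len(rows):
                stop = cols.index(result[rows[i + 1]], start)
            best = cols[start]
            for col in cols[start + 1:stop + 1]:
                if lookup(row, col) < lookup(row, best):
                    best = col
            result[row] = best
            start = stop

    solve(list(rows), list(cols))
    return result
```

Under these assumptions, smawk(range(1000), range(50), cost), with cost and naive_row_minima as defined in the earlier sketch, should return the same minima columns as naive_row_minima(1000, 50, cost) while evaluating only a small fraction of the 50,000 cells.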

Applications

The main applications of this method presented in the original paper by Aggarwal et al. were in computational geometry, in finding the farthest point from each point of a convex polygon, and in finding optimal enclosing polygons. Subsequent research found applications of the same algorithm in breaking paragraphs into lines,[2] RNA secondary structure prediction,[3] DNA and protein sequence alignment,[4][5] the construction of prefix codes,[6] and image thresholding,[7] among others.

References

  1. Aggarwal, Alok; Klawe, Maria M.; Moran, Shlomo; Shor, Peter; Wilber, Robert (1987), "Geometric applications of a matrix-searching algorithm", Algorithmica, 2 (2): 195–208, doi:10.1007/BF01840359, MR 0895444.
  2. Wilber, Robert (1988), "The concave least-weight subsequence problem revisited", Journal of Algorithms, 9 (3): 418–425, doi:10.1016/0196-6774(88)90032-6, MR 0955150.
  3. Larmore, Lawrence L.; Schieber, Baruch (1991), "On-line dynamic programming with applications to the prediction of RNA secondary structure", Journal of Algorithms, 12 (3): 490–515, doi:10.1016/0196-6774(91)90016-R, MR 1114923.
  4. Russo, Luís M. S. (2012), "Monge properties of sequence alignment", Theoretical Computer Science, 423: 30–49, doi:10.1016/j.tcs.2011.12.068, MR 2887979.
  5. Crochemore, Maxime; Landau, Gad M.; Ziv-Ukelson, Michal (2003), "A subquadratic sequence alignment algorithm for unrestricted scoring matrices", SIAM Journal on Computing, 32 (6): 1654–1673 (electronic), CiteSeerX 10.1.1.57.8562, doi:10.1137/S0097539702402007, MR 2034254.
  6. Bradford, Phil; Golin, Mordecai J.; Larmore, Lawrence L.; Rytter, Wojciech (2002), "Optimal prefix-free codes for unequal letter costs: dynamic programming with the Monge property", Journal of Algorithms, 42 (2): 277–303, CiteSeerX 10.1.1.45.5501, doi:10.1006/jagm.2002.1213, MR 1895977.
  7. Luessi, M.; Eichmann, M.; Schuster, G. M.; Katsaggelos, A. K. (2006), "New results on efficient optimal multilevel image thresholding", IEEE International Conference on Image Processing, pp. 773–776, CiteSeerX 10.1.1.461.663, doi:10.1109/ICIP.2006.312426, ISBN 978-1-4244-0480-3.