SMAWK algorithm

The SMAWK algorithm is an algorithm for finding the minimum value in each row of an implicitly-defined totally monotone matrix. It is named after the initials of its five inventors, Peter Shor, Shlomo Moran, Alok Aggarwal, Robert Wilber, and Maria Klawe. [1]

Input

For the purposes of this algorithm, a matrix is defined to be monotone if each row's minimum value occurs in a column which is equal to or greater than the column of the previous row's minimum. It is totally monotone if the same property is true for every submatrix (defined by an arbitrary subset of the rows and columns of the given matrix). Equivalently, a matrix is totally monotone if there does not exist a 2×2 submatrix whose row minima are in the top right and bottom left corners. Every Monge array is totally monotone, but not necessarily vice versa.
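As a concrete, brute-force illustration of the forbidden-submatrix condition, the following Python sketch checks an explicitly given matrix. The helper name is_totally_monotone is ours, not from the source, and ties are assumed to break toward the leftmost column.

```python
from itertools import combinations

def is_totally_monotone(matrix):
    """Brute-force check (for small, explicit matrices) that no 2x2 submatrix
    has its row minima in the top-right and bottom-left corners.
    Ties are broken toward the leftmost column."""
    n, m = len(matrix), len(matrix[0])
    for i, k in combinations(range(n), 2):      # pair of rows, i above k
        for j, l in combinations(range(m), 2):  # pair of columns, j left of l
            # Forbidden pattern: the top row strictly prefers the right
            # column while the bottom row does not.
            if matrix[i][l] < matrix[i][j] and matrix[k][j] <= matrix[k][l]:
                return False
    return True

# M[i][j] = (i - j)**2 defines a Monge array, so the check passes.
assert is_totally_monotone([[(i - j) ** 2 for j in range(4)] for i in range(4)])
```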

For the SMAWK algorithm, the matrix to be searched should be defined as a function, and this function is given as input to the algorithm (together with the dimensions of the matrix). The algorithm then evaluates the function whenever it needs to know the value of a particular matrix cell. If each evaluation takes constant time, then, for a matrix with r rows and c columns, the running time and number of function evaluations are both O(c(1 + log(r/c))). This is much faster than the O(rc) time of a naive algorithm that evaluates all matrix cells.
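As a point of reference, the naive approach can be sketched as follows. The function name naive_row_minima and the example cost function are illustrative only; lookup(i, j) stands in for the matrix-defining function that SMAWK also receives as input.

```python
def naive_row_minima(nrows, ncols, lookup):
    """O(r*c) baseline: evaluate every cell of the implicit matrix.

    lookup(i, j) returns the value of the matrix at row i, column j."""
    return [min(range(ncols), key=lambda j: lookup(i, j)) for i in range(nrows)]

# Illustrative implicit matrix: M[i][j] = (2*i - j)**2, a Monge array.
print(naive_row_minima(4, 5, lambda i, j: (2 * i - j) ** 2))  # [0, 2, 4, 4]
```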

Method

The basic idea of the algorithm is to follow a prune and search strategy in which the problem to be solved is reduced to a single recursive subproblem of the same type whose size is smaller by a constant factor. To do so, the algorithm first preprocesses the matrix to remove some of its columns that cannot contain a row-minimum, using a stack-based algorithm similar to the one in the Graham scan and all nearest smaller values algorithms. After this phase of the algorithm, the number of remaining columns will at most equal the number of rows. Next, the algorithm calls itself recursively to find the row minima of the even-numbered rows of the matrix. Finally, by searching the columns between the positions of consecutive even-row minima, the algorithm fills out the remaining minima in the odd rows.
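A compact Python sketch of this outline (column reduction, recursion on every other row, then interpolation of the remaining rows) is given below. It is an illustration under stated conventions rather than the authors' original presentation: lookup(i, j) is assumed to return the value of cell (i, j) in constant time, and ties are broken toward the leftmost column.

```python
def smawk(rows, cols, lookup):
    """Return a dict mapping each row to the column of its leftmost minimum,
    for the implicit totally monotone matrix defined by lookup(row, col)."""
    result = {}

    def solve(rows, cols):
        if not rows:
            return

        # REDUCE: discard columns that cannot contain any row minimum, using
        # a stack as in Graham scan / all nearest smaller values.
        # At most len(rows) columns survive.
        stack = []
        for col in cols:
            while stack and (lookup(rows[len(stack) - 1], stack[-1])
                             > lookup(rows[len(stack) - 1], col)):
                stack.pop()          # the old top loses here and in all later rows
            if len(stack) < len(rows):
                stack.append(col)
        cols = stack

        # Recurse on every other row with the surviving columns.
        solve(rows[1::2], cols)

        # Interpolate the remaining rows: the minimum of rows[i] lies between
        # the columns already chosen for rows[i - 1] and rows[i + 1].
        c = 0
        for i in range(0, len(rows), 2):
            row = rows[i]
            stop = result[rows[i + 1]] if i + 1 < len(rows) else cols[-1]
            best, best_val = cols[c], lookup(row, cols[c])
            while cols[c] != stop:
                c += 1
                val = lookup(row, cols[c])
                if val < best_val:
                    best, best_val = cols[c], val
            result[row] = best

    solve(list(rows), list(cols))
    return result

# Example: M[i][j] = (j - 3*i)**2 is a Monge array, hence totally monotone.
print(sorted(smawk(range(5), range(12), lambda i, j: (j - 3 * i) ** 2).items()))
# [(0, 0), (1, 3), (2, 6), (3, 9), (4, 11)]  -- (row, column of its minimum)
```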

Applications

The main applications of this method presented in the original paper by Aggarwal et al. were in computational geometry, in finding the farthest point from each point of a convex polygon, and in finding optimal enclosing polygons. Subsequent research found applications of the same algorithm in breaking paragraphs into lines, [2] RNA secondary structure prediction, [3] DNA and protein sequence alignment, [4] [5] the construction of prefix codes, [6] and image thresholding, [7] among others.

References

  1. Aggarwal, Alok; Klawe, Maria M.; Moran, Shlomo; Shor, Peter; Wilber, Robert (1987), "Geometric applications of a matrix-searching algorithm", Algorithmica, 2 (1–4): 195–208, doi:10.1007/BF01840359, MR 0895444.
  2. Wilber, Robert (1988), "The concave least-weight subsequence problem revisited", Journal of Algorithms, 9 (3): 418–425, doi:10.1016/0196-6774(88)90032-6, MR 0955150.
  3. Larmore, Lawrence L.; Schieber, Baruch (1991), "On-line dynamic programming with applications to the prediction of RNA secondary structure", Journal of Algorithms, 12 (3): 490–515, doi:10.1016/0196-6774(91)90016-R, MR 1114923.
  4. Russo, Luís M. S. (2012), "Monge properties of sequence alignment", Theoretical Computer Science, 423: 30–49, doi:10.1016/j.tcs.2011.12.068, MR 2887979.
  5. Crochemore, Maxime; Landau, Gad M.; Ziv-Ukelson, Michal (2003), "A subquadratic sequence alignment algorithm for unrestricted scoring matrices", SIAM Journal on Computing, 32 (6): 1654–1673 (electronic), CiteSeerX 10.1.1.57.8562, doi:10.1137/S0097539702402007, MR 2034254.
  6. Bradford, Phil; Golin, Mordecai J.; Larmore, Lawrence L.; Rytter, Wojciech (2002), "Optimal prefix-free codes for unequal letter costs: dynamic programming with the Monge property", Journal of Algorithms, 42 (2): 277–303, CiteSeerX 10.1.1.45.5501, doi:10.1006/jagm.2002.1213, MR 1895977.
  7. Luessi, M.; Eichmann, M.; Schuster, G. M.; Katsaggelos, A. K. (2006), "New results on efficient optimal multilevel image thresholding", IEEE International Conference on Image Processing, pp. 773–776, CiteSeerX 10.1.1.461.663, doi:10.1109/ICIP.2006.312426, ISBN 978-1-4244-0480-3.