Matching pursuit

Last updated
A signal and its wavelet representation. Each pixel in the heat map (top) represents an atom (a wavelet centered in time according to the horizontal position and with frequency corresponding to height). The color of the pixel gives the inner product of the corresponding wavelet atom with the signal (bottom). Matching pursuit should represent the signal by just a few atoms, such as the three at the centers of the clearly visible ellipses. Matching pursuit.png
A signal and its wavelet representation. Each pixel in the heat map (top) represents an atom (a wavelet centered in time according to the horizontal position and with frequency corresponding to height). The color of the pixel gives the inner product of the corresponding wavelet atom with the signal (bottom). Matching pursuit should represent the signal by just a few atoms, such as the three at the centers of the clearly visible ellipses.

Matching pursuit (MP) is a sparse approximation algorithm which finds the "best matching" projections of multidimensional data onto the span of an over-complete (i.e., redundant) dictionary . The basic idea is to approximately represent a signal from Hilbert space as a weighted sum of finitely many functions (called atoms) taken from . An approximation with atoms has the form

Contents

where is the th column of the matrix and is the scalar weighting factor (amplitude) for the atom . Normally, not every atom in will be used in this sum. Instead, matching pursuit chooses the atoms one at a time in order to maximally (greedily) reduce the approximation error. This is achieved by finding the atom that has the highest inner product with the signal (assuming the atoms are normalized), subtracting from the signal an approximation that uses only that one atom, and repeating the process until the signal is satisfactorily decomposed, i.e., the norm of the residual is small, where the residual after calculating and is denoted by

.

If converges quickly to zero, then only a few atoms are needed to get a good approximation to . Such sparse representations are desirable for signal coding and compression. More precisely, the sparsity problem that matching pursuit is intended to approximately solve is

where is the pseudo-norm (i.e. the number of nonzero elements of ). In the previous notation, the nonzero entries of are . Solving the sparsity problem exactly is NP-hard, which is why approximation methods like MP are used.

For comparison, consider the Fourier transform representation of a signal - this can be described using the terms given above, where the dictionary is built from sinusoidal basis functions (the smallest possible complete dictionary). The main disadvantage of Fourier analysis in signal processing is that it extracts only the global features of the signals and does not adapt to the analysed signals . By taking an extremely redundant dictionary, we can look in it for atoms (functions) that best match a signal .

The algorithm

Example of the retrieval of an unknown signal (gray line) from few measurements (black dots) using a orthogonal matching pursuit algorithm (purple dots show the retrieved coefficients). Orthogonal Matching Pursuit.gif
Example of the retrieval of an unknown signal (gray line) from few measurements (black dots) using a orthogonal matching pursuit algorithm (purple dots show the retrieved coefficients).

If contains a large number of vectors, searching for the most sparse representation of is computationally unacceptable for practical applications. In 1993, Mallat and Zhang [1] proposed a greedy solution that they named "Matching Pursuit." For any signal and any dictionary , the algorithm iteratively generates a sorted list of atom indices and weighting scalars, which form the sub-optimal solution to the problem of sparse signal representation.

Algorithm Matching Pursuit  Input: Signal: , dictionary  with normalized columns .  Output: List of coefficients  and indices for corresponding atoms .  Initialization:    ;    ;  Repeat:    Find  with maximum inner product ;    ;    ;    ;  Until stop condition (for example: )  return

In signal processing, the concept of matching pursuit is related to statistical projection pursuit, in which "interesting" projections are found; ones that deviate more from a normal distribution are considered to be more interesting.

Properties

.

Applications

Matching pursuit has been applied to signal, image [2] and video coding, [3] [4] shape representation and recognition, [5] 3D objects coding, [6] and in interdisciplinary applications like structural health monitoring. [7] It has been shown that it performs better than DCT based coding for low bit rates in both efficiency of coding and quality of image. [8] The main problem with matching pursuit is the computational complexity of the encoder. In the basic version of an algorithm, the large dictionary needs to be searched at each iteration. Improvements include the use of approximate dictionary representations and suboptimal ways of choosing the best match at each iteration (atom extraction). [9] The matching pursuit algorithm is used in MP/SOFT, a method of simulating quantum dynamics. [10]

MP is also used in dictionary learning. [11] [12] In this algorithm, atoms are learned from a database (in general, natural scenes such as usual images) and not chosen from generic dictionaries.

A very recent application of MP is its use in linear computation coding [13] to speed-up the computation of matrix-vector products.

Extensions

A popular extension of Matching Pursuit (MP) is its orthogonal version: Orthogonal Matching Pursuit [14] [15] (OMP). The main difference from MP is that after every step, all the coefficients extracted so far are updated, by computing the orthogonal projection of the signal onto the subspace spanned by the set of atoms selected so far. This can lead to results better than standard MP, but requires more computation. OMP was shown to have stability and performance guarantees under certain restricted isometry conditions. [16] The incremental multi-parameter algorithm (IMP), published three years before MP, works in the same way as OMP. [17]

Extensions such as Multichannel MP [18] and Multichannel OMP [19] allow one to process multicomponent signals. An obvious extension of Matching Pursuit is over multiple positions and scales, by augmenting the dictionary to be that of a wavelet basis. This can be done efficiently using the convolution operator without changing the core algorithm. [20]

Matching pursuit is related to the field of compressed sensing and has been extended by researchers in that community. Notable extensions are Orthogonal Matching Pursuit (OMP), [21] Stagewise OMP (StOMP), [22] compressive sampling matching pursuit (CoSaMP), [23] Generalized OMP (gOMP), [24] and Multipath Matching Pursuit (MMP). [25]

See also

Related Research Articles

<span class="mw-page-title-main">Dynamic time warping</span> An algorithm for measuring similarity between two temporal sequences, which may vary in speed

In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences, which may vary in speed. For instance, similarities in walking could be detected using DTW, even if one person was walking faster than the other, or if there were accelerations and decelerations during the course of an observation. DTW has been applied to temporal sequences of video, audio, and graphics data — indeed, any data that can be turned into a one-dimensional sequence can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. Other applications include speaker recognition and online signature recognition. It can also be used in partial shape matching applications.

<span class="mw-page-title-main">Discrete wavelet transform</span> Transform in numerical harmonic analysis

In numerical analysis and functional analysis, a discrete wavelet transform (DWT) is any wavelet transform for which the wavelets are discretely sampled. As with other wavelet transforms, a key advantage it has over Fourier transforms is temporal resolution: it captures both frequency and location information.

<span class="mw-page-title-main">CORDIC</span> Algorithm for computing trigonometric, hyperbolic, logarithmic and exponential functions

CORDIC, also known as Volder's algorithm, or: Digit-by-digit methodCircular CORDIC, Linear CORDIC, Hyperbolic CORDIC, and Generalized Hyperbolic CORDIC, is a simple and efficient algorithm to calculate trigonometric functions, hyperbolic functions, square roots, multiplications, divisions, and exponentials and logarithms with arbitrary base, typically converging with one digit per iteration. CORDIC is therefore also an example of digit-by-digit algorithms. CORDIC and closely related methods known as pseudo-multiplication and pseudo-division or factor combining are commonly used when no hardware multiplier is available, as the only operations they require are additions, subtractions, bitshift and lookup tables. As such, they all belong to the class of shift-and-add algorithms. In computer science, CORDIC is often used to implement floating-point arithmetic when the target platform lacks hardware multiply for cost or space reasons.

<span class="mw-page-title-main">Gabor filter</span> Linear filter used for texture analysis

In image processing, a Gabor filter, named after Dennis Gabor, who first proposed it as a 1D filter. The Gabor filter was first generalized to 2D by Gösta Granlund, by adding a reference direction. The Gabor filter is a linear filter used for texture analysis, which essentially means that it analyzes whether there is any specific frequency content in the image in specific directions in a localized region around the point or region of analysis. Frequency and orientation representations of Gabor filters are claimed by many contemporary vision scientists to be similar to those of the human visual system. They have been found to be particularly appropriate for texture representation and discrimination. In the spatial domain, a 2D Gabor filter is a Gaussian kernel function modulated by a sinusoidal plane wave.

<span class="mw-page-title-main">Dominating set</span> Subset of a graphs nodes such that all other nodes link to at least one

In graph theory, a dominating set for a graph G is a subset D of its vertices, such that any vertex of G is in D, or has a neighbor in D. The domination numberγ(G) is the number of vertices in a smallest dominating set for G.

Non-negative matrix factorization, also non-negative matrix approximation is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect. Also, in applications such as processing of audio spectrograms or muscular activity, non-negativity is inherent to the data being considered. Since the problem is not exactly solvable in general, it is commonly approximated numerically.

In mathematics, Fourier–Bessel series is a particular kind of generalized Fourier series based on Bessel functions.

<span class="mw-page-title-main">MUSIC (algorithm)</span> Algorithm used for frequency estimation and radio direction finding

MUSIC is an algorithm used for frequency estimation and radio direction finding.

Compressed sensing is a signal processing technique for efficiently acquiring and reconstructing a signal, by finding solutions to underdetermined linear systems. This is based on the principle that, through optimization, the sparsity of a signal can be exploited to recover it from far fewer samples than required by the Nyquist–Shannon sampling theorem. There are two conditions under which recovery is possible. The first one is sparsity, which requires the signal to be sparse in some domain. The second one is incoherence, which is applied through the isometric property, which is sufficient for sparse signals. Compressed sensing has applications in, for example, MRI where the incoherence condition is typically satisfied.

In mathematics, the Johnson–Lindenstrauss lemma is a result named after William B. Johnson and Joram Lindenstrauss concerning low-distortion embeddings of points from high-dimensional into low-dimensional Euclidean space. The lemma states that a set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved. In the classical proof of the lemma, the embedding is a random orthogonal projection.

Sparse approximation theory deals with sparse solutions for systems of linear equations. Techniques for finding these solutions and exploiting them in applications have found wide use in image processing, signal processing, machine learning, medical imaging, and more.

In computer science, lattice problems are a class of optimization problems related to mathematical objects called lattices. The conjectured intractability of such problems is central to the construction of secure lattice-based cryptosystems: lattice problems are an example of NP-hard problems which have been shown to be average-case hard, providing a test case for the security of cryptographic algorithms. In addition, some lattice problems which are worst-case hard can be used as a basis for extremely secure cryptographic schemes. The use of worst-case hardness in such schemes makes them among the very few schemes that are very likely secure even against quantum computers. For applications in such cryptosystems, lattices over vector space or free modules are generally considered.

In linear algebra, the coherence or mutual coherence of a matrix A is defined as the maximum absolute value of the cross-correlations between the columns of A.

In applied mathematics and statistics, basis pursuit denoising (BPDN) refers to a mathematical optimization problem of the form

In data mining and machine learning, kq-flats algorithm is an iterative method which aims to partition m observations into k clusters where each cluster is close to a q-flat, where q is a given integer.

In applied mathematics, k-SVD is a dictionary learning algorithm for creating a dictionary for sparse representations, via a singular value decomposition approach. k-SVD is a generalization of the k-means clustering method, and it works by iteratively alternating between sparse coding the input data based on the current dictionary, and updating the atoms in the dictionary to better fit the data. It is structurally related to the expectation maximization (EM) algorithm. k-SVD can be found widely in use in applications such as image processing, audio processing, biology, and document analysis.

Proximal gradientmethods for learning is an area of research in optimization and statistical learning theory which studies algorithms for a general class of convex regularization problems where the regularization penalty may not be differentiable. One such example is regularization of the form

Sparse dictionary learning is a representation learning method which aims at finding a sparse representation of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. These elements are called atoms and they compose a dictionary. Atoms in the dictionary are not required to be orthogonal, and they may be an over-complete spanning set. This problem setup also allows the dimensionality of the signals being represented to be higher than the one of the signals being observed. The above two properties lead to having seemingly redundant atoms that allow multiple representations of the same signal but also provide an improvement in sparsity and flexibility of the representation.

<span class="mw-page-title-main">L1-norm principal component analysis</span>

L1-norm principal component analysis (L1-PCA) is a general method for multivariate data analysis. L1-PCA is often preferred over standard L2-norm principal component analysis (PCA) when the analyzed data may contain outliers.

The convolutional sparse coding paradigm is an extension of the global sparse coding model, in which a redundant dictionary is modeled as a concatenation of circulant matrices. While the global sparsity constraint describes signal as a linear combination of a few atoms in the redundant dictionary , usually expressed as for a sparse vector , the alternative dictionary structure adopted by the convolutional sparse coding model allows the sparsity prior to be applied locally instead of globally: independent patches of are generated by "local" dictionaries operating over stripes of .

References

  1. Mallat, S. G.; Zhang, Z. (1993). "Matching Pursuits with Time-Frequency Dictionaries". IEEE Transactions on Signal Processing. 1993 (12): 3397–3415. Bibcode:1993ITSP...41.3397M. doi:10.1109/78.258082. S2CID   14427335.
  2. Perrinet, L. (2015). "Sparse models for Computer Vision". Biologically Inspired Computer Vision. 14: 319–346. arXiv: 1701.06859 . doi:10.1002/9783527680863.ch14. ISBN   9783527680863. S2CID   2085413.
  3. Bergeaud, F.; Mallat, S. (1995). "Matching pursuit of images". Proceedings., International Conference on Image Processing. Vol. 1. pp. 53–56. doi:10.1109/ICIP.1995.529037. ISBN   978-0-7803-3122-8. S2CID   721789.
  4. Neff, R.; Zakhor, A. (1997). "Very low bit-rate video coding based on matching pursuits". IEEE Transactions on Circuits and Systems for Video Technology. 7 (1): 158–171. doi:10.1109/76.554427. S2CID   15317511.
  5. Mendels, F.; Vandergheynst, P.; Thiran, J.P. (2006). "Matching pursuit-based shape representation and recognition using scale-space". International Journal of Imaging Systems and Technology. 16 (5): 162–180. doi:10.1002/ima.20078. S2CID   5132416.
  6. Tosic, I.; Frossard, P.; Vandergheynst, P. (2005). "Progressive coding of 3D objects based on over-complete decompositions". IEEE Transactions on Circuits and Systems for Video Technology. 16 (11): 1338–1349. doi:10.1109/tcsvt.2006.883502. S2CID   3031513.
  7. Chakraborty, Debejyo; Kovvali, Narayan; Wei, Jun; Papandreou-Suppappola, Antonia; Cochran, Douglas; Chattopadhyay, Aditi (2009). "Damage Classification Structural Health Monitoring in Bolted Structures Using Time-frequency Techniques". Journal of Intelligent Material Systems and Structures. 20 (11): 1289–1305. doi:10.1177/1045389X08100044. S2CID   109511712.
  8. Perrinet, L. U.; Samuelides, M.; Thorpe, S. (2002). "Sparse spike coding in an asynchronous feed-forward multi-layer neural network using Matching Pursuit". Neurocomputing. 57C: 125–34. doi:10.1016/j.neucom.2004.01.010.[ permanent dead link ]
  9. Lin, Jian-Liang; Hwang, Wen-Liang; Pei, Soo-Chang (2007). "Fast matching pursuit video coding by combining dictionary approximation and atom extraction". IEEE Transactions on Circuits and Systems for Video Technology. 17 (12): 1679–1689. CiteSeerX   10.1.1.671.9670 . doi:10.1109/tcsvt.2007.903120. S2CID   8315216.
  10. Wu, Yinghua; Batista, Victor S. (2003). "Matching-pursuit for simulations of quantum processes". J. Chem. Phys. 118 (15): 6720–6724. Bibcode:2003JChPh.118.6720W. doi:10.1063/1.1560636. S2CID   37544146.
  11. Perrinet, L. P. (2010). "Role of homeostasis in learning sparse representations". Neural Computation. 22 (7): 1812–1836. arXiv: 0706.3177 . doi:10.1162/neco.2010.05-08-795. PMC   2929690 . PMID   20235818.
  12. Aharon, M.; Elad, M.; Bruckstein, A.M. (2006). "The K-SVD: An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation". IEEE Transactions on Signal Processing. 54 (11): 4311–4322. Bibcode:2006ITSP...54.4311A. doi:10.1109/tsp.2006.881199. S2CID   7477309.
  13. Müller, Ralf R.; Gäde, Bernhard; Bereyhi, Ali (2021). "Linear computation coding". arXiv: 2102.00398 .{{cite journal}}: Cite journal requires |journal= (help)
  14. Pati, Y.; Rezaiifar, R.; Krishnaprasad, P. (1993). "Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition". Proceedings of 27th Asilomar Conference on Signals, Systems and Computers. pp. 40–44. CiteSeerX   10.1.1.348.5735 . doi:10.1109/acssc.1993.342465. ISBN   978-0-8186-4120-6. S2CID   16513805.{{cite book}}: |journal= ignored (help)
  15. Davis, G.; Mallat, S.; Zhang, Z. (1994). "Adaptive time-frequency decompositions with matching pursuits". Optical Engineering. 33 (7): 2183. Bibcode:1994OptEn..33.2183D. doi:10.1117/12.173207.
  16. Ding, J.; Chen, L.; Gu, Y. (2013). "Perturbation Analysis of Orthogonal Matching Pursuit". IEEE Transactions on Signal Processing. 61 (2): 398–410. arXiv: 1106.3373 . Bibcode:2013ITSP...61..398D. doi:10.1109/TSP.2012.2222377. ISSN   1941-0476. S2CID   17166658.
  17. Mather, John (1990). "The Incremental Multi-Parameter Algorithm". 1990 Conference Record Twenty-Fourth Asilomar Conference on Signals, Systems and Computers, 1990. Vol. 1. p. 368. doi:10.1109/ACSSC.1990.523362. ISBN   0-8186-2180-X. ISSN   1058-6393. S2CID   61327933.
  18. "Piecewise linear source separation", R. Gribonval, Proc. SPIE '03, 2003
  19. Tropp, Joel; Gilbert, A.; Strauss, M. (2006). "Algorithms for simultaneous sparse approximations; Part I : Greedy pursuit". Signal Proc. – Sparse Approximations in Signal and Image Processing. 86 (3): 572–588. doi:10.1016/j.sigpro.2005.05.030.
  20. Perrinet, Laurent U. (2015). "Sparse models for Computer Vision". Biologically Inspired Computer Vision. pp. 319–346. arXiv: 1701.06859 . doi:10.1002/9783527680863.ch14. ISBN   9783527680863. S2CID   2085413.
  21. Tropp, Joel A.; Gilbert, Anna C. (2007). "Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit" (PDF). IEEE Transactions on Information Theory. 53 (12): 4655–4666. doi:10.1109/tit.2007.909108. S2CID   6261304.
  22. Donoho, David L.; Tsaig, Yaakov; Drori, Iddo; Jean-luc, Starck (2006). "Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit". IEEE Transactions on Information Theory. 58 (2): 1094–1121. doi:10.1109/tit.2011.2173241. S2CID   7923170.
  23. Needell, D.; Tropp, J.A. (2009). "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples". Applied and Computational Harmonic Analysis. 26 (3): 301–321. arXiv: 0803.2392 . doi:10.1016/j.acha.2008.07.002. S2CID   1642637.
  24. Wang, J.; Kwon, S.; Shim, B. (2012). "Generalized Orthogonal Matching Pursuit". IEEE Transactions on Signal Processing. 60 (12): 6202–6216. arXiv: 1111.6664 . Bibcode:2012ITSP...60.6202J. doi:10.1109/TSP.2012.2218810. S2CID   2585677.
  25. Kwon, S.; Wang, J.; Shim, B. (2014). "Multipath Matching Pursuit". IEEE Transactions on Information Theory. 60 (5): 2986–3001. arXiv: 1308.4791 . doi:10.1109/TIT.2014.2310482. S2CID   15134308.