Compressed sensing (also known as compressive sensing, compressive sampling, or sparse sampling) is a signal processing technique for efficiently acquiring and reconstructing a signal, by finding solutions to underdetermined linear systems. This is based on the principle that, through optimization, the sparsity of a signal can be exploited to recover it from far fewer samples than required by the Nyquist–Shannon sampling theorem. There are two conditions under which recovery is possible. The first one is sparsity, which requires the signal to be sparse in some domain. The second one is incoherence, which is applied through the isometric property, which is sufficient for sparse signals. [1] [2] Compressed sensing has applications in, for example, MRI where the incoherence condition is typically satisfied. [3]
A common goal of the engineering field of signal processing is to reconstruct a signal from a series of sampling measurements. In general, this task is impossible because there is no way to reconstruct a signal during the times that the signal is not measured. Nevertheless, with prior knowledge or assumptions about the signal, it turns out to be possible to perfectly reconstruct a signal from a series of measurements (acquiring this series of measurements is called sampling). Over time, engineers have improved their understanding of which assumptions are practical and how they can be generalized.
An early breakthrough in signal processing was the Nyquist–Shannon sampling theorem. It states that if a real signal's highest frequency is less than half of the sampling rate, then the signal can be reconstructed perfectly by means of sinc interpolation. The main idea is that with prior knowledge about constraints on the signal's frequencies, fewer samples are needed to reconstruct the signal.
Around 2004, Emmanuel Candès, Justin Romberg, Terence Tao, and David Donoho proved that given knowledge about a signal's sparsity, the signal may be reconstructed with even fewer samples than the sampling theorem requires. [4] [5] This idea is the basis of compressed sensing.
Compressed sensing relies on techniques, which several other scientific fields have used historically. [6] In statistics, the least squares method was complemented by the -norm, which was introduced by Laplace. Following the introduction of linear programming and Dantzig's simplex algorithm, the -norm was used in computational statistics. In statistical theory, the -norm was used by George W. Brown and later writers on median-unbiased estimators. It was used by Peter J. Huber and others working on robust statistics. The -norm was also used in signal processing, for example, in the 1970s, when seismologists constructed images of reflective layers within the earth based on data that did not seem to satisfy the Nyquist–Shannon criterion. [7] It was used in matching pursuit in 1993, the LASSO estimator by Robert Tibshirani in 1996 [8] and basis pursuit in 1998. [9]
At first glance, compressed sensing might seem to violate the sampling theorem, because compressed sensing depends on the sparsity of the signal in question and not its highest frequency. This is a misconception, because the sampling theorem guarantees perfect reconstruction given sufficient, not necessary, conditions. A sampling method fundamentally different from classical fixed-rate sampling cannot "violate" the sampling theorem. Sparse signals with high frequency components can be highly under-sampled using compressed sensing compared to classical fixed-rate sampling. [10]
An underdetermined system of linear equations has more unknowns than equations and generally has an infinite number of solutions. The figure below shows such an equation system where we want to find a solution for .
In order to choose a solution to such a system, one must impose extra constraints or conditions (such as smoothness) as appropriate. In compressed sensing, one adds the constraint of sparsity, allowing only solutions which have a small number of nonzero coefficients. Not all underdetermined systems of linear equations have a sparse solution. However, if there is a unique sparse solution to the underdetermined system, then the compressed sensing framework allows the recovery of that solution.
Compressed sensing takes advantage of the redundancy in many interesting signals—they are not pure noise. In particular, many signals are sparse, that is, they contain many coefficients close to or equal to zero, when represented in some domain. [11] This is the same insight used in many forms of lossy compression.
Compressed sensing typically starts with taking a weighted linear combination of samples also called compressive measurements in a basis different from the basis in which the signal is known to be sparse. The results found by Emmanuel Candès, Justin Romberg, Terence Tao, and David Donoho showed that the number of these compressive measurements can be small and still contain nearly all the useful information. Therefore, the task of converting the image back into the intended domain involves solving an underdetermined matrix equation since the number of compressive measurements taken is smaller than the number of pixels in the full image. However, adding the constraint that the initial signal is sparse enables one to solve this underdetermined system of linear equations.
The least-squares solution to such problems is to minimize the norm—that is, minimize the amount of energy in the system. This is usually simple mathematically (involving only a matrix multiplication by the pseudo-inverse of the basis sampled in). However, this leads to poor results for many practical applications, for which the unknown coefficients have nonzero energy.
To enforce the sparsity constraint when solving for the underdetermined system of linear equations, one can minimize the number of nonzero components of the solution. The function counting the number of non-zero components of a vector was called the "norm" by David Donoho. [note 1]
Candès et al. proved that for many problems it is probable that the norm is equivalent to the norm, in a technical sense: This equivalence result allows one to solve the problem, which is easier than the problem. Finding the candidate with the smallest norm can be expressed relatively easily as a linear program, for which efficient solution methods already exist. [13] When measurements may contain a finite amount of noise, basis pursuit denoising is preferred over linear programming, since it preserves sparsity in the face of noise and can be solved faster than an exact linear program.
Total variation can be seen as a non-negative real-valued functional defined on the space of real-valued functions (for the case of functions of one variable) or on the space of integrable functions (for the case of functions of several variables). For signals, especially, total variation refers to the integral of the absolute gradient of the signal. In signal and image reconstruction, it is applied as total variation regularization where the underlying principle is that signals with excessive details have high total variation and that removing these details, while retaining important information such as edges, would reduce the total variation of the signal and make the signal subject closer to the original signal in the problem.
For the purpose of signal and image reconstruction, minimization models are used. Other approaches also include the least-squares as has been discussed before in this article. These methods are extremely slow and return a not-so-perfect reconstruction of the signal. The current CS Regularization models attempt to address this problem by incorporating sparsity priors of the original image, one of which is the total variation (TV). Conventional TV approaches are designed to give piece-wise constant solutions. Some of these include (as discussed ahead) – constrained -minimization which uses an iterative scheme. This method, though fast, subsequently leads to over-smoothing of edges resulting in blurred image edges. [14] TV methods with iterative re-weighting have been implemented to reduce the influence of large gradient value magnitudes in the images. This has been used in computed tomography (CT) reconstruction as a method known as edge-preserving total variation. However, as gradient magnitudes are used for estimation of relative penalty weights between the data fidelity and regularization terms, this method is not robust to noise and artifacts and accurate enough for CS image/signal reconstruction and, therefore, fails to preserve smaller structures.
Recent progress on this problem involves using an iteratively directional TV refinement for CS reconstruction. [15] This method would have 2 stages: the first stage would estimate and refine the initial orientation field – which is defined as a noisy point-wise initial estimate, through edge-detection, of the given image. In the second stage, the CS reconstruction model is presented by utilizing directional TV regularizer. More details about these TV-based approaches – iteratively reweighted l1 minimization, edge-preserving TV and iterative model using directional orientation field and TV- are provided below.
In the CS reconstruction models using constrained minimization, [16] larger coefficients are penalized heavily in the norm. It was proposed to have a weighted formulation of minimization designed to more democratically penalize nonzero coefficients. An iterative algorithm is used for constructing the appropriate weights. [17] Each iteration requires solving one minimization problem by finding the local minimum of a concave penalty function that more closely resembles the norm. An additional parameter, usually to avoid any sharp transitions in the penalty function curve, is introduced into the iterative equation to ensure stability and so that a zero estimate in one iteration does not necessarily lead to a zero estimate in the next iteration. The method essentially involves using the current solution for computing the weights to be used in the next iteration.
Early iterations may find inaccurate sample estimates, however this method will down-sample these at a later stage to give more weight to the smaller non-zero signal estimates. One of the disadvantages is the need for defining a valid starting point as a global minimum might not be obtained every time due to the concavity of the function. Another disadvantage is that this method tends to uniformly penalize the image gradient irrespective of the underlying image structures. This causes over-smoothing of edges, especially those of low contrast regions, subsequently leading to loss of low contrast information. The advantages of this method include: reduction of the sampling rate for sparse signals; reconstruction of the image while being robust to the removal of noise and other artifacts; and use of very few iterations. This can also help in recovering images with sparse gradients.
In the figure shown below, P1 refers to the first-step of the iterative reconstruction process, of the projection matrix P of the fan-beam geometry, which is constrained by the data fidelity term. This may contain noise and artifacts as no regularization is performed. The minimization of P1 is solved through the conjugate gradient least squares method. P2 refers to the second step of the iterative reconstruction process wherein it utilizes the edge-preserving total variation regularization term to remove noise and artifacts, and thus improve the quality of the reconstructed image/signal. The minimization of P2 is done through a simple gradient descent method. Convergence is determined by testing, after each iteration, for image positivity, by checking if for the case when (Note that refers to the different x-ray linear attenuation coefficients at different voxels of the patient image).
This is an iterative CT reconstruction algorithm with edge-preserving TV regularization to reconstruct CT images from highly undersampled data obtained at low dose CT through low current levels (milliampere). In order to reduce the imaging dose, one of the approaches used is to reduce the number of x-ray projections acquired by the scanner detectors. However, this insufficient projection data which is used to reconstruct the CT image can cause streaking artifacts. Furthermore, using these insufficient projections in standard TV algorithms end up making the problem under-determined and thus leading to infinitely many possible solutions. In this method, an additional penalty weighted function is assigned to the original TV norm. This allows for easier detection of sharp discontinuities in intensity in the images and thereby adapt the weight to store the recovered edge information during the process of signal/image reconstruction. The parameter controls the amount of smoothing applied to the pixels at the edges to differentiate them from the non-edge pixels. The value of is changed adaptively based on the values of the histogram of the gradient magnitude so that a certain percentage of pixels have gradient values larger than . The edge-preserving total variation term, thus, becomes sparser and this speeds up the implementation. A two-step iteration process known as forward–backward splitting algorithm is used. [18] The optimization problem is split into two sub-problems which are then solved with the conjugate gradient least squares method [19] and the simple gradient descent method respectively. The method is stopped when the desired convergence has been achieved or if the maximum number of iterations is reached. [14]
Some of the disadvantages of this method are the absence of smaller structures in the reconstructed image and degradation of image resolution. This edge preserving TV algorithm, however, requires fewer iterations than the conventional TV algorithm. [14] Analyzing the horizontal and vertical intensity profiles of the reconstructed images, it can be seen that there are sharp jumps at edge points and negligible, minor fluctuation at non-edge points. Thus, this method leads to low relative error and higher correlation as compared to the TV method. It also effectively suppresses and removes any form of image noise and image artifacts such as streaking.
To prevent over-smoothing of edges and texture details and to obtain a reconstructed CS image which is accurate and robust to noise and artifacts, this method is used. First, an initial estimate of the noisy point-wise orientation field of the image , , is obtained. This noisy orientation field is defined so that it can be refined at a later stage to reduce the noise influences in orientation field estimation. A coarse orientation field estimation is then introduced based on structure tensor, which is formulated as: [20] . Here, refers to the structure tensor related with the image pixel point (i,j) having standard deviation . refers to the Gaussian kernel with standard deviation . refers to the manually defined parameter for the image below which the edge detection is insensitive to noise. refers to the gradient of the image and refers to the tensor product obtained by using this gradient. [15]
The structure tensor obtained is convolved with a Gaussian kernel to improve the accuracy of the orientation estimate with being set to high values to account for the unknown noise levels. For every pixel (i,j) in the image, the structure tensor J is a symmetric and positive semi-definite matrix. Convolving all the pixels in the image with , gives orthonormal eigen vectors ω and υ of the matrix. ω points in the direction of the dominant orientation having the largest contrast and υ points in the direction of the structure orientation having the smallest contrast. The orientation field coarse initial estimation is defined as = υ. This estimate is accurate at strong edges. However, at weak edges or on regions with noise, its reliability decreases.
To overcome this drawback, a refined orientation model is defined in which the data term reduces the effect of noise and improves accuracy while the second penalty term with the L2-norm is a fidelity term which ensures accuracy of initial coarse estimation.
This orientation field is introduced into the directional total variation optimization model for CS reconstruction through the equation: . is the objective signal which needs to be recovered. Y is the corresponding measurement vector, d is the iterative refined orientation field and is the CS measurement matrix. This method undergoes a few iterations ultimately leading to convergence. is the orientation field approximate estimation of the reconstructed image from the previous iteration (in order to check for convergence and the subsequent optical performance, the previous iteration is used). For the two vector fields represented by and , refers to the multiplication of respective horizontal and vertical vector elements of and followed by their subsequent addition. These equations are reduced to a series of convex minimization problems which are then solved with a combination of variable splitting and augmented Lagrangian (FFT-based fast solver with a closed form solution) methods. [15] It (Augmented Lagrangian) is considered equivalent to the split Bregman iteration which ensures convergence of this method. The orientation field, d is defined as being equal to , where define the horizontal and vertical estimates of .
The Augmented Lagrangian method for the orientation field, , involves initializing and then finding the approximate minimizer of with respect to these variables. The Lagrangian multipliers are then updated and the iterative process is stopped when convergence is achieved. For the iterative directional total variation refinement model, the augmented lagrangian method involves initializing . [21]
Here, are newly introduced variables where = , = , = , and = . are the Lagrangian multipliers for . For each iteration, the approximate minimizer of with respect to variables () is calculated. And as in the field refinement model, the lagrangian multipliers are updated and the iterative process is stopped when convergence is achieved.
For the orientation field refinement model, the Lagrangian multipliers are updated in the iterative process as follows:
For the iterative directional total variation refinement model, the Lagrangian multipliers are updated as follows:
Here, are positive constants.
Based on peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) metrics and known ground-truth images for testing performance, it is concluded that iterative directional total variation has a better reconstructed performance than the non-iterative methods in preserving edge and texture areas. The orientation field refinement model plays a major role in this improvement in performance as it increases the number of directionless pixels in the flat area while enhancing the orientation field consistency in the regions with edges.
The field of compressive sensing is related to several topics in signal processing and computational mathematics, such as underdetermined linear systems, group testing, heavy hitters, sparse coding, multiplexing, sparse sampling, and finite rate of innovation. Its broad scope and generality has enabled several innovative CS-enhanced approaches in signal processing and compression, solution of inverse problems, design of radiating systems, radar and through-the-wall imaging, and antenna characterization. [22] Imaging techniques having a strong affinity with compressive sensing include coded aperture and computational photography.
Conventional CS reconstruction uses sparse signals (usually sampled at a rate less than the Nyquist sampling rate) for reconstruction through constrained minimization. One of the earliest applications of such an approach was in reflection seismology which used sparse reflected signals from band-limited data for tracking changes between sub-surface layers. [23] When the LASSO model came into prominence in the 1990s as a statistical method for selection of sparse models, [24] this method was further used in computational harmonic analysis for sparse signal representation from over-complete dictionaries. Some of the other applications include incoherent sampling of radar pulses. The work by Boyd et al. [16] has applied the LASSO model- for selection of sparse models- towards analog to digital converters (the current ones use a sampling rate higher than the Nyquist rate along with the quantized Shannon representation). This would involve a parallel architecture in which the polarity of the analog signal changes at a high rate followed by digitizing the integral at the end of each time-interval to obtain the converted digital signal.
Compressed sensing has been used in an experimental mobile phone camera sensor. The approach allows a reduction in image acquisition energy per image by as much as a factor of 15 at the cost of complex decompression algorithms; the computation may require an off-device implementation. [25]
Compressed sensing is used in single-pixel cameras from Rice University. [26] Bell Labs employed the technique in a lensless single-pixel camera that takes stills using repeated snapshots of randomly chosen apertures from a grid. Image quality improves with the number of snapshots, and generally requires a small fraction of the data of conventional imaging, while eliminating lens/focus-related aberrations. [27] [28]
Compressed sensing can be used to improve image reconstruction in holography by increasing the number of voxels one can infer from a single hologram. [29] [30] [31] It is also used for image retrieval from undersampled measurements in optical [32] [33] and millimeter-wave [34] holography.
Compressed sensing has been used in facial recognition applications. [35]
Compressed sensing has been used [36] [37] to shorten magnetic resonance imaging scanning sessions on conventional hardware. [38] Reconstruction methods include
Compressed sensing addresses the issue of high scan time by enabling faster acquisition by measuring fewer Fourier coefficients. This produces a high-quality image with relatively lower scan time. Another application (also discussed ahead) is for CT reconstruction with fewer X-ray projections. Compressed sensing, in this case, removes the high spatial gradient parts – mainly, image noise and artifacts. This holds tremendous potential as one can obtain high-resolution CT images at low radiation doses (through lower current-mA settings). [42]
Compressed sensing has showed outstanding results in the application of network tomography to network management. Network delay estimation and network congestion detection can both be modeled as underdetermined systems of linear equations where the coefficient matrix is the network routing matrix. Moreover, in the Internet, network routing matrices usually satisfy the criterion for using compressed sensing. [43]
In 2013 one company announced shortwave-infrared cameras which utilize compressed sensing. [44] These cameras have light sensitivity from 0.9 μm to 1.7 μm, wavelengths invisible to the human eye.
In radio astronomy and optical astronomical interferometry, full coverage of the Fourier plane is usually absent and phase information is not obtained in most hardware configurations. In order to obtain aperture synthesis images, various compressed sensing algorithms are employed. [45] The Högbom CLEAN algorithm has been in use since 1974 for the reconstruction of images obtained from radio interferometers, which is similar to the matching pursuit algorithm mentioned above.
Compressed sensing combined with a moving aperture has been used to increase the acquisition rate of images in a transmission electron microscope. [46] In scanning mode, compressive sensing combined with random scanning of the electron beam has enabled both faster acquisition and less electron dose, which allows for imaging of electron beam sensitive materials. [47]
In numerical analysis, inverse iteration is an iterative eigenvalue algorithm. It allows one to find an approximate eigenvector when an approximation to a corresponding eigenvalue is already known. The method is conceptually similar to the power method. It appears to have originally been developed to compute resonance frequencies in the field of structural mechanics.
Stochastic gradient descent is an iterative method for optimizing an objective function with suitable smoothness properties. It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient by an estimate thereof. Especially in high-dimensional optimization problems this reduces the very high computational burden, achieving faster iterations in exchange for a lower convergence rate.
In mathematics, statistics, finance, and computer science, particularly in machine learning and inverse problems, regularization is a process that changes the result answer to be "simpler". It is often used to obtain results for ill-posed problems or to prevent overfitting.
Least mean squares (LMS) algorithms are a class of adaptive filter used to mimic a desired filter by finding the filter coefficients that relate to producing the least mean square of the error signal. It is a stochastic gradient descent method in that the filter is only adapted based on the error at the current time. It was invented in 1960 by Stanford University professor Bernard Widrow and his first Ph.D. student, Ted Hoff, based on their research in single-layer neural networks (ADALINE). Specifically, they used gradient descent to train ADALINE to recognize patterns, and called the algorithm "delta rule". They then applied the rule to filters, resulting in the LMS algorithm.
Geometry processing is an area of research that uses concepts from applied mathematics, computer science and engineering to design efficient algorithms for the acquisition, reconstruction, analysis, manipulation, simulation and transmission of complex 3D models. As the name implies, many of the concepts, data structures, and algorithms are directly analogous to signal processing and image processing. For example, where image smoothing might convolve an intensity signal with a blur kernel formed using the Laplace operator, geometric smoothing might be achieved by convolving a surface geometry with a blur kernel formed using the Laplace-Beltrami operator.
The Frank–Wolfe algorithm is an iterative first-order optimization algorithm for constrained convex optimization. Also known as the conditional gradient method, reduced gradient algorithm and the convex combination algorithm, the method was originally proposed by Marguerite Frank and Philip Wolfe in 1956. In each iteration, the Frank–Wolfe algorithm considers a linear approximation of the objective function, and moves towards a minimizer of this linear function.
Corner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. Corner detection is frequently used in motion detection, image registration, video tracking, image mosaicing, panorama stitching, 3D reconstruction and object recognition. Corner detection overlaps with the topic of interest point detection.
In mathematics, preconditioning is the application of a transformation, called the preconditioner, that conditions a given problem into a form that is more suitable for numerical solving methods. Preconditioning is typically related to reducing a condition number of the problem. The preconditioned problem is then usually solved by an iterative method.
Sequential quadratic programming (SQP) is an iterative method for constrained nonlinear optimization which may be considered a quasi-Newton method. SQP methods are used on mathematical problems for which the objective function and the constraints are twice continuously differentiable, but not necessarily convex.
Modified Richardson iteration is an iterative method for solving a system of linear equations. Richardson iteration was proposed by Lewis Fry Richardson in his work dated 1910. It is similar to the Jacobi and Gauss–Seidel method.
Sparse approximation theory deals with sparse solutions for systems of linear equations. Techniques for finding these solutions and exploiting them in applications have found wide use in image processing, signal processing, machine learning, medical imaging, and more.
The Bregman method is an iterative algorithm to solve certain convex optimization problems involving regularization. The original version is due to Lev M. Bregman, who published it in 1967.
In signal processing, particularly image processing, total variation denoising, also known as total variation regularization or total variation filtering, is a noise removal process (filter). It is based on the principle that signals with excessive and possibly spurious detail have high total variation, that is, the integral of the image gradient magnitude is high. According to this principle, reducing the total variation of the signal—subject to it being a close match to the original signal—removes unwanted detail whilst preserving important details such as edges. The concept was pioneered by L. I. Rudin, S. Osher, and E. Fatemi in 1992 and so is today known as the ROF model.
In applied mathematics and statistics, basis pursuit denoising (BPDN) refers to a mathematical optimization problem of the form
Augmented Lagrangian methods are a certain class of algorithms for solving constrained optimization problems. They have similarities to penalty methods in that they replace a constrained optimization problem by a series of unconstrained problems and add a penalty term to the objective, but the augmented Lagrangian method adds yet another term designed to mimic a Lagrange multiplier. The augmented Lagrangian is related to, but not identical with, the method of Lagrange multipliers.
Natural evolution strategies (NES) are a family of numerical optimization algorithms for black box problems. Similar in spirit to evolution strategies, they iteratively update the (continuous) parameters of a search distribution by following the natural gradient towards higher expected fitness.
The Landweber iteration or Landweber algorithm is an algorithm to solve ill-posed linear inverse problems, and it has been extended to solve non-linear problems that involve constraints. The method was first proposed in the 1950s by Louis Landweber, and it can be now viewed as a special case of many other more general methods.
Proximal gradientmethods for learning is an area of research in optimization and statistical learning theory which studies algorithms for a general class of convex regularization problems where the regularization penalty may not be differentiable. One such example is regularization of the form
Sparse dictionary learning is a representation learning method which aims at finding a sparse representation of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. These elements are called atoms and they compose a dictionary. Atoms in the dictionary are not required to be orthogonal, and they may be an over-complete spanning set. This problem setup also allows the dimensionality of the signals being represented to be higher than the one of the signals being observed. The above two properties lead to having seemingly redundant atoms that allow multiple representations of the same signal but also provide an improvement in sparsity and flexibility of the representation.
Structured sparsity regularization is a class of methods, and an area of research in statistical learning theory, that extend and generalize sparsity regularization learning methods. Both sparsity and structured sparsity regularization methods seek to exploit the assumption that the output variable to be learned can be described by a reduced number of variables in the input space . Sparsity regularization methods focus on selecting the input variables that best describe the output. Structured sparsity regularization methods generalize and extend sparsity regularization methods, by allowing for optimal selection over structures like groups or networks of input variables in .