Verification-based message-passing algorithms in compressed sensing


Verification-based message-passing algorithms (VB-MPAs) in compressed sensing (CS), a branch of digital signal processing that deals with measuring sparse signals, are methods for efficiently solving the recovery problem in compressed sensing. One of the main goals in compressed sensing is the recovery process. Generally speaking, the recovery process in compressed sensing is a method by which the original signal is estimated using the knowledge of the compressed signal and the measurement matrix. [1] Mathematically, the recovery process in compressed sensing is finding the sparsest possible solution of an under-determined system of linear equations. Based on the nature of the measurement matrix, one can employ different reconstruction methods. If the measurement matrix is also sparse, one efficient way is to use message passing algorithms for signal recovery. Although there are message passing approaches that deal with dense matrices, the nature of those algorithms is to some extent different from that of the algorithms working on sparse matrices. [1] [2]


Overview

The main problem in the recovery process in CS is to find the sparsest possible solution to the under-determined system of linear equations $y = Ax$, where $A \in \mathbb{R}^{m \times n}$ is the measurement matrix, $x \in \mathbb{R}^{n}$ is the original signal to be recovered, and $y \in \mathbb{R}^{m}$ is the compressed, known signal. When the matrix $A$ is sparse, one can represent it by a bipartite graph for better understanding. [2] [3] [4] [5] $V = \{v_1, \dots, v_n\}$ is the set of variable nodes, which represents the set of elements of $x$, and $C = \{c_1, \dots, c_m\}$ is the set of check nodes corresponding to the set of elements of $y$. There is an edge between $v_j$ and $c_i$ if the corresponding element of $A$ is non-zero, i.e. $a_{ij} \neq 0$; moreover, the weight of that edge is $a_{ij}$. [6] Here is an example of a binary sparse measurement matrix where the weights of the edges are either zero or one.

Figure: bi-regular bipartite graph corresponding to the measurement matrix A.
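For illustration, the following Python sketch (not taken from the cited sources; the toy matrix and variable names are assumptions, and the matrix is not the one shown in the figure) builds the neighbor lists and edge weights of the bipartite graph from a small sparse measurement matrix:

    import numpy as np

    # Toy sparse binary measurement matrix A: rows are check nodes, columns are variable nodes.
    A = np.array([[1, 0, 1, 0, 0],
                  [0, 1, 0, 0, 1],
                  [1, 1, 0, 1, 0]])

    m, n = A.shape  # m check nodes (elements of y), n variable nodes (elements of x)

    # An edge (c, v) exists iff A[c, v] != 0; its weight is A[c, v].
    check_neighbors = {c: [v for v in range(n) if A[c, v] != 0] for c in range(m)}
    var_neighbors = {v: [c for c in range(m) if A[c, v] != 0] for v in range(n)}
    edge_weight = {(c, v): A[c, v] for c in range(m) for v in range(n) if A[c, v] != 0}

    print(check_neighbors)  # {0: [0, 2], 1: [1, 4], 2: [0, 1, 3]}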

The basic idea behind message passing algorithms in CS is to transmit appropriate messages between variable nodes and check nodes in an iterative manner in order to efficiently find the signal $x$. These messages are different for variable nodes and check nodes, but their basic nature is the same across all of the verification-based message passing algorithms. [6] The messages emanating from a variable node contain the current value of the variable node and an indicator showing whether the variable node is verified or not. The messages emanating from a check node contain the value of the check node and the remaining degree of the check node in the graph. [6] [7]
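As a minimal sketch of these message contents (the class and field names are assumptions for illustration, not notation from the sources):

    from dataclasses import dataclass

    @dataclass
    class VariableToCheckMessage:
        value: float        # current estimate of the variable node's value
        verified: bool      # indicator: has this variable node been verified?

    @dataclass
    class CheckToVariableMessage:
        value: float            # current (residual) value of the check node
        remaining_degree: int   # number of unverified variable nodes still attached to the check node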

In each iteration, every variable node and check node produces new messages to be transmitted to all of its neighbors, based on the messages it has received from its own neighbors. This local property of message passing algorithms enables them to be implemented as parallel processing algorithms and makes their time complexity very attractive. [8]

Message passing rules [9]

The rule common to all verification-based message passing algorithms is that once a variable node becomes verified, it can be removed from the graph and the algorithm can be executed to solve the rest of the graph. Different verification-based message passing algorithms use different combinations of the verification rules. [6]

The verification rules are as follows:

  - ZCN (zero check node): if a check node has value zero, all of its unverified neighboring variable nodes are verified with value zero.
  - D1CN (degree-one check node): if a check node has only one unverified neighboring variable node, that variable node is verified with the value of the check node divided by the weight of the connecting edge.
  - ECN (equal check node): if two or more check nodes with the same non-zero value share exactly one unverified variable node, that variable node is verified with the common value, and the remaining unverified neighbors of those check nodes are verified with value zero.
The message passing rules given above are the basic and only rules used in any verification-based message passing algorithm. It has been shown that these simple rules can efficiently recover the original signal provided that certain conditions are satisfied. [8] [6]
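The rules can be viewed as local tests on a check node's residual value and its list of unverified neighbors. The following Python sketch is one possible rendering of that view (the function names and data layout are assumptions; it assumes the contributions of already-verified variable nodes have been subtracted from each check value, and the ECN helper assumes unit edge weights for simplicity):

    def apply_zcn(check_value, unverified_neighbors):
        # ZCN: a zero-valued check node verifies all of its unverified neighbors as zero.
        if check_value == 0:
            return {v: 0.0 for v in unverified_neighbors}
        return {}

    def apply_d1cn(check_value, unverified_neighbors, edge_weight):
        # D1CN: a check node with exactly one unverified neighbor verifies it directly.
        # edge_weight maps a neighboring variable node to the weight of the connecting edge.
        if len(unverified_neighbors) == 1:
            v = unverified_neighbors[0]
            return {v: check_value / edge_weight[v]}
        return {}

    def apply_ecn(checks):
        # ECN: if two or more equal-valued, non-zero check nodes share exactly one
        # unverified variable node, that node gets the common value and the other
        # unverified neighbors of those check nodes are verified as zero.
        # checks is a list of (check_value, unverified_neighbors) pairs.
        if len(checks) < 2:
            return {}
        values = {value for value, _ in checks}
        if len(values) != 1 or 0 in values:
            return {}
        common = set.intersection(*(set(nbrs) for _, nbrs in checks))
        if len(common) != 1:
            return {}
        v_common = common.pop()
        verified = {v_common: checks[0][0]}   # unit edge weights assumed
        for _, nbrs in checks:
            for v in nbrs:
                if v != v_common:
                    verified[v] = 0.0
        return verified

For example, apply_d1cn(3.0, [2], {2: 1}) returns {2: 3.0}, and apply_zcn(0.0, [1, 4]) returns {1: 0.0, 4: 0.0}.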

Algorithms

There are four algorithms known as VB-MPAs, namely Genie, LM, XH, and SBB. [6] All of these algorithms use the same strategy for recovery of the original signal; however, they use different combinations of the message passing rules to verify variable nodes.

Genie Algorithm [6]

The Genie algorithm is the benchmark in this topic. It is assumed to have knowledge of the support set of the signal, i.e. the set of non-zero elements of the original signal. Using this knowledge, Genie does not need to care about the zero variable nodes in the graph, and its only task is to recover the values of the non-zero elements of the original signal. Although Genie does not have any practical aspect, it can be regarded as the benchmark of the problem, especially in the sense that this algorithm outperforms the other algorithms in this category, and one can measure how successful an algorithm is by comparing it to the Genie algorithm.

Since Genie only wants to find the values of the non-zero elements of the signal, it is not necessary to employ rules that are responsible for zero-valued variable nodes. Therefore, Genie only uses D1CN as the verification rule.

LM algorithm [6] [8]

Unlike the Genie algorithm, this algorithm does not have any knowledge of the support set of the signal, and it uses D1CN and ZCN together to solve the recovery problem in CS. In fact, ZCN is the rule that attempts to verify the zero-valued variable nodes, and D1CN is responsible for the non-zero-valued variable nodes. This algorithm is used when the measurement matrix is binary; in such cases, employing the third rule (ECN) would violate the local nature of the algorithms. This issue is considered in the SBB algorithm. [6]

XH algorithm [6]

This algorithm is the same as LM, but it only uses ECN instead of D1CN for the verification of the non-zero variable nodes. If the non-zero elements of the measurement matrix are binary, then this algorithm cannot be implemented efficiently and the locality of the algorithm will be violated.

SBB algorithm [6] [9]

The most powerful practical algorithm among all of the verification-based message passing algorithms is the SBB algorithm, which employs all of the verification rules for the recovery of the original signal. In this algorithm, D1CN and ECN are responsible for the verification of the non-zero elements of the signal, while ZCN and ECN verify the zero-valued variable nodes.

The pseudo code of the VB-MPAs is as follows. In the following algorithm, the messages emanating from variable and check nodes carry the contents described above, VN is a variable that keeps the labels of the verified variable nodes, and VN′ keeps the set of variable nodes that were verified up to the previous iteration. By comparing these two variables one can see whether there is any progress in the number of verified variable nodes; if there is no progress, the algorithm terminates. [6] [9]

function VB_MPA(Measurement Matrix A, Compressed Vector y): [7]
    construct the bipartite graph of A                            // Initializations
    mark every variable node as unverified                        // Initializations
    set the value of every check node to the corresponding entry of y   // Initializations
    VN ← ∅, VN′ ← ∅                                               // Initializations
    while VN ≠ VN′ or this is the first iteration                 // Main Loop
        VN′ ← VN
        /* ===================== Half round 1 of round 1 ===================== */
        for every check node c
            compute the value of c and its remaining degree from the messages of its neighbors
            send this value and remaining degree to every unverified neighboring variable node
        end for
        /* ===================== Half round 2 of round 1 ===================== */
        for every variable node v
            update_rule(v, Algorithm)            // D1CN and/or ECN, depending on the algorithm
            if variable node v is verified then
                add v to VN
            end if
        end for
        /* ===================== Half round 1 of round 2 ===================== */
        for every check node c
            subtract the contributions of the newly verified variable nodes from the value of c
            decrease the remaining degree of c accordingly
            send the updated value and remaining degree to every unverified neighboring variable node
        end for
        /* ===================== Half round 2 of round 2 ===================== */
        for every variable node v
            if a neighboring check node of v has value zero then   // ZCN
                verify v with value zero
                add v to VN
            end if
        end for
    end while
    return the values of the verified variable nodes as the estimate of x
Figure: output of the SBB algorithm.

In all of the algorithms the messages emanating from check nodes are the same; however, since the verification rules differ between algorithms, the messages produced by variable nodes differ in each algorithm. [6] The algorithm given above works for all of the VB-MPAs; different algorithms simply use different rules in half round 2 of rounds 1 and 2. For instance, the Genie algorithm uses the D1CN rule in half round 2 of round 1, and half round 2 of round 2, which uses the ZCN rule, is useless in the Genie algorithm. The LM algorithm uses D1CN in half round 2 of round 1, whereas the XH algorithm uses the ECN rule in this stage instead of D1CN. The SBB algorithm uses both the D1CN and ECN rules in the second half round of round 1. All of these rules can be efficiently implemented in the update_rule function in the second half round of round 1.
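As a rough, non-authoritative Python rendering of this round structure (all names are assumptions; the ECN pass is omitted for brevity and would follow the sketch given after the message passing rules), the per-algorithm behaviour can be captured by selecting which rules run in the two half rounds:

    ROUND1_RULES = {            # rules applied in half round 2 of round 1
        "Genie": ["D1CN"],
        "LM": ["D1CN"],
        "XH": ["ECN"],
        "SBB": ["D1CN", "ECN"],
    }

    def vb_mpa(A, y, algorithm="SBB", max_iters=1000):
        # A is a numpy array (rows = check nodes, columns = variable nodes), y = A @ x.
        # Note: the Genie algorithm's knowledge of the support set is not modeled here.
        m, n = A.shape
        x_hat = [None] * n                       # None means "not yet verified"
        residual = [float(val) for val in y]     # check values minus verified contributions
        unverified = {c: [v for v in range(n) if A[c, v] != 0] for c in range(m)}

        for _ in range(max_iters):
            newly_verified = {}

            # Half round 2 of round 1: D1CN (ECN omitted in this sketch).
            if "D1CN" in ROUND1_RULES[algorithm]:
                for c in range(m):
                    if len(unverified[c]) == 1:
                        v = unverified[c][0]
                        newly_verified[v] = residual[c] / A[c, v]

            # Half round 2 of round 2: ZCN (unused by Genie).
            if algorithm != "Genie":
                for c in range(m):
                    if residual[c] == 0:
                        for v in unverified[c]:
                            newly_verified[v] = 0.0

            if not newly_verified:               # no progress: terminate
                break

            # Half rounds 1: check nodes update their residual values and remaining degrees.
            for v, val in newly_verified.items():
                x_hat[v] = val
                for c in range(m):
                    if v in unverified[c]:
                        residual[c] -= A[c, v] * val
                        unverified[c].remove(v)

        return x_hat

With the toy matrix defined earlier and a sparse x, for example, vb_mpa(A, A @ x, "LM") returns the verified estimates; entries left as None were never verified.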

Proof of correctness

Although there is no guarantee that these algorithms succeed in all cases, it can be guaranteed that if some variable nodes become verified during these algorithms, then the values of those variable nodes are correct almost surely. In order to show this, it is enough to show that all of the verification rules work perfectly and without false verification. [6] [8]

Correctness of ZCN

The algebraic point of view of the ZCN rule is that if the right-hand side of an equation in a system of linear equations is zero, then almost surely all of the unknowns in that equation are zero. This is due to the fact that the original signal is assumed to be sparse and, in addition, that the non-zero elements of the signal are chosen from a continuous distribution. Suppose some of the variables in that equation are non-zero; then, for the right-hand side to be zero, the remaining variables would have to sum to exactly the negative of the summation of those non-zero variables. If the non-zero elements of the original signal are chosen from a continuous distribution, the probability of this occurring is zero. Therefore, the ZCN rule works perfectly. [6] [8]
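In the notation of the Overview, the argument can be summarized as follows (a restatement, not a formula from the cited papers):

\[
y_c \;=\; \sum_{v \in N(c)} a_{cv}\, x_v \;=\; 0
\quad\Longrightarrow\quad
x_v = 0 \ \text{ for all } v \in N(c) \ \text{ almost surely},
\]

because any non-zero $x_v$ would have to be cancelled exactly by the sum of the other terms, an event of probability zero when the non-zero entries of $x$ are drawn from a continuous distribution.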

Correctness of D1CN

D1CN says that if a variable node is the only unknown variable in an equation, then the value of that variable equals the right-hand side of that equation. In fact, an equation with just one unknown variable corresponds to a check node of degree one, i.e. a check node with just one unverified variable node in its neighborhood. [6] [8]
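In symbols, if check node $c$ has the single remaining unverified neighbor $v$, then (a restatement in the notation used above):

\[
y_c \;=\; a_{cv}\, x_v
\quad\Longrightarrow\quad
\hat{x}_v \;=\; \frac{y_c}{a_{cv}},
\]

where $y_c$ denotes the residual value of the check node after the contributions of already-verified variable nodes have been removed.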

Correctness of ECN

This rule has two parts: the first part deals with the non-zero elements of the signal, while the second is responsible for the zero elements of the original signal. The first part says that if we have two or more equations with the same right-hand side, and exactly one unknown variable, say $x_v$, is common to all of those equations, then the value of this common variable should be the value of the right-hand side of those equations; furthermore, all other variables in those equations should be zero. To see why, suppose one of those other variables, say $x_u$, is not zero (for simplicity assume that the edge weights are all zero or one). The right-hand side of the equation containing both $x_v$ and $x_u$ is then $r = x_v + x_u + \dots$. Moreover, since $x_v$ is the only variable common to all of these equations, there is an equation in which $x_v$ appears but $x_u$ does not; because the right-hand sides of these equations are the same, the right-hand side of that equation is also $r$. If we remove $x_v$ from that equation, the summation of the remaining unknown variables must equal the non-zero value $r - x_v$. Since the non-zero elements of $x$ are chosen randomly from a continuous distribution, the probability that this summation equals exactly $r - x_v$ is zero. Therefore, almost surely the value of $x_u$ is zero, and all variables other than the common one in these equations have value zero. [6] [8] [7]

Only one scenario remains for the second part of the ECN rule, as most of it has been covered in the first part. This is the scenario in which we have some equations with the same right-hand side but two or more variable nodes common to all of those equations. In this case, we can say nothing about those common variable nodes; however, we can say that all the other variable nodes in those equations are zero. The proof of this claim can be achieved by a change of variables in those equations. Suppose that $x_{v_1}, \dots, x_{v_k}$ are the common variable nodes in those equations. If we set $x' = x_{v_1} + \dots + x_{v_k}$, then the problem reduces to the first part, where there is only one common variable node in all of those equations. Therefore, by the same reasoning as in the first part, all variable nodes that are not common to all of those equations can be verified with value zero almost surely. [6] [8] [7]

When the non-zero elements of the measurement matrix are chosen randomly from a continuous distribution, it can be shown that if a variable node receives equal messages (after dividing each by the corresponding edge weight) from two or more of its neighbors, then this variable node is the single unique variable connected to all of those check nodes. Therefore, the rule can be applied using a local decision, and the variable node can verify itself without further knowledge about the other connections of those check nodes. Moreover, the second part of the ECN rule need not be implemented, since the non-zero variable node verified by the ECN rule will be removed from the bipartite graph in the next iteration, and the ZCN rule will then be enough to verify all the zero-valued variable nodes remaining in those equations with the same right-hand side. All in all, when the non-zero elements of the measurement matrix are chosen from a continuous distribution, the SBB and XH algorithms, which use the ECN rule, can be implemented efficiently. [6]
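A minimal sketch of this local ECN decision at a variable node (the function name and data layout are assumptions; exact equality is used because, as argued above, agreement among continuously distributed values identifies the unique common variable):

    def local_ecn_decision(incoming):
        # incoming maps a neighboring check node id to (check_value, edge_weight).
        # The variable node verifies itself if two or more neighbors agree on the
        # same non-zero value after dividing each check value by its edge weight.
        counts = {}
        for value, weight in incoming.values():
            if value != 0:
                estimate = value / weight
                counts[estimate] = counts.get(estimate, 0) + 1
        for estimate, count in counts.items():
            if count >= 2:
                return estimate    # verified value for this variable node
        return None                # ECN does not apply; stay unverified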

Run-time analysis

Every minor loop in the main loop of the algorithm can be executed on parallel processors, if we consider each variable node and check node as a separate processor. Therefore, every minor loop in the algorithm can be executed in constant time $O(1)$. Moreover, since the algorithm terminates when there is no progress in the verification of the variable nodes, in the worst case each iteration of the main loop verifies only one variable node, so the maximum number of times that the main loop is executed is $O(n)$. Therefore, the whole algorithm runs in $O(n)$ time. [7]


References

  1. D. L. Donoho, A. Javanmard, and A. Montanari, "Information-theoretically optimal compressed sensing via spatial coupling and approximate message passing," in Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, 2012, pp. 1231–1235.
  2. V. Chandar, D. Shah, and G. W. Wornell, "A simple message-passing algorithm for compressed sensing," in Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on, 2010.
  3. P. Indyk, "Explicit constructions for compressed sensing of sparse signals," in Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2008.
  4. A. C. Gilbert et al., "One sketch for all: fast algorithms for compressed sensing," in Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, 2007.
  5. S. Sarvotham, D. Baron, and R. G. Baraniuk, "Sudocodes-Fast Measurement and Reconstruction of Sparse Signals," in Information Theory, 2006 IEEE International Symposium on, 2006.
  6. Y. Eftekhari, A. Heidarzadeh, A. H. Banihashemi, and I. Lambadaris, "Density evolution analysis of node-based verification-based algorithms in compressed sensing," IEEE Transactions on Information Theory, vol. 58, no. 10, pp. 6616–6645, 2012.
  7. S. M. E. Farhangdoust, "Message Passing Algorithms" (PDF).
  8. F. Zhang and H. D. Pfister, "On the iterative decoding of high-rate LDPC codes with applications in compressed sensing," arXiv preprint arXiv:0903.2232, 2009.
  9. M. G. Luby and M. Mitzenmacher, "Verification-based decoding for packet-based low-density parity-check codes," IEEE Transactions on Information Theory, vol. 51, no. 1, pp. 120–127, 2005.