Babel function


The Babel function (also known as cumulative coherence) measures the maximum total coherence between a fixed atom and a collection of other atoms in a dictionary. It was introduced in the context of signals that admit a sparse representation in terms of the atoms, or columns, of a redundant dictionary matrix A.


Definition and formulation

The Babel function of a dictionary $A$ with normalized columns $a_1, \dots, a_N$ is a real-valued function that is defined as

$$\mu_1(p) \;=\; \max_{\substack{\Lambda \subset \{1,\dots,N\} \\ |\Lambda| = p}} \;\max_{j \notin \Lambda} \;\sum_{i \in \Lambda} \left|\langle a_j, a_i \rangle\right| ,$$

where $a_1, \dots, a_N$ are the columns (atoms) of the dictionary $A$. [1] [2]
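Because the inner maximum reduces, for each fixed atom, to summing the p largest absolute inner products with the remaining atoms, the quantity can be evaluated directly from the Gram matrix without enumerating index sets. The following is a minimal NumPy sketch; the function name `babel` and the random 16×32 example dictionary are illustrative choices, not taken from the cited sources:

```python
import numpy as np

def babel(A, p):
    """Babel function mu_1(p) of a dictionary A whose columns are the atoms."""
    A = A / np.linalg.norm(A, axis=0)     # normalize the atoms
    G = np.abs(A.T @ A)                   # absolute Gram matrix |<a_i, a_j>|
    np.fill_diagonal(G, 0.0)              # an atom is never compared with itself
    # For a fixed atom a_j, the worst index set Lambda of size p consists of the
    # p atoms most correlated with a_j, so sum the p largest entries of row j.
    G.sort(axis=1)                        # sort each row in ascending order
    return G[:, -p:].sum(axis=1).max()    # maximize over the fixed atom j

# Illustrative redundant dictionary: 16-dimensional signals, 32 atoms.
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 32))
print([round(babel(A, p), 3) for p in (1, 2, 3)])   # mu_1 is non-decreasing in p
```

The sort-and-sum step is the standard way to take the maximum over all index sets Λ of size p, since the worst set for a given atom is simply the p atoms most correlated with it.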

Special case

When p = 1, the Babel function reduces to the mutual coherence of the dictionary.
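A short numerical check of this identity on an arbitrary random dictionary (the sizes below are illustrative): μ₁(1) picks, for each atom, its single most correlated partner and then maximizes over atoms, which is exactly the largest off-diagonal entry of the absolute Gram matrix, i.e. the mutual coherence.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 20))
A /= np.linalg.norm(A, axis=0)                 # unit-norm atoms

G = np.abs(A.T @ A)
np.fill_diagonal(G, 0.0)                       # ignore self-correlations

mutual_coherence = G.max()                     # largest off-diagonal |<a_i, a_j>|
babel_1 = G.max(axis=1).max()                  # mu_1(1): best single partner per atom, then max
print(np.isclose(mutual_coherence, babel_1))   # True
```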

Practical applications

Li and Lin used the Babel function to construct incoherent dictionaries for machine learning applications. [3]

Related Research Articles

In physics, coherence length is the propagation distance over which a coherent wave maintains a specified degree of coherence. Wave interference is strong when the paths taken by all of the interfering waves differ by less than the coherence length. A wave with a longer coherence length is closer to a perfect sinusoidal wave. Coherence length is important in holography and telecommunications engineering.

In physics, the CHSH inequality can be used in the proof of Bell's theorem, which states that certain consequences of entanglement in quantum mechanics cannot be reproduced by local hidden-variable theories. Experimental verification of the inequality being violated is seen as confirmation that nature cannot be described by such theories. CHSH stands for John Clauser, Michael Horne, Abner Shimony, and Richard Holt, who described it in a much-cited paper published in 1969. They derived the CHSH inequality, which, as with John Stewart Bell's original inequality, is a constraint on the statistical occurrence of "coincidences" in a Bell test which is necessarily true if there exist underlying local hidden variables, an assumption that is sometimes termed local realism. In practice, the inequality is routinely violated by modern experiments in quantum mechanics.

In mathematics and computing, the Levenberg–Marquardt algorithm, also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems. These minimization problems arise especially in least squares curve fitting. The LMA interpolates between the Gauss–Newton algorithm (GNA) and the method of gradient descent. The LMA is more robust than the GNA, which means that in many cases it finds a solution even if it starts very far off the final minimum. For well-behaved functions and reasonable starting parameters, the LMA tends to be slower than the GNA. LMA can also be viewed as Gauss–Newton using a trust region approach.

Gauss–Newton algorithm

The Gauss–Newton algorithm is used to solve non-linear least squares problems, which is equivalent to minimizing a sum of squared function values. It is an extension of Newton's method for finding a minimum of a non-linear function. Since a sum of squares must be nonnegative, the algorithm can be viewed as using Newton's method to iteratively approximate zeroes of the components of the sum, and thus minimizing the sum. In this sense, the algorithm is also an effective method for solving overdetermined systems of equations. It has the advantage that second derivatives, which can be challenging to compute, are not required.

In statistics and information theory, a maximum entropy probability distribution has entropy that is at least as great as that of all other members of a specified class of probability distributions. According to the principle of maximum entropy, if nothing is known about a distribution except that it belongs to a certain class, then the distribution with the largest entropy should be chosen as the least-informative default. The motivation is twofold: first, maximizing entropy minimizes the amount of prior information built into the distribution; second, many physical systems tend to move towards maximal entropy configurations over time.

In probability theory the hypoexponential distribution or the generalized Erlang distribution is a continuous distribution that has found use in the same fields as the Erlang distribution, such as queueing theory, teletraffic engineering and more generally in stochastic processes. It is called the hypoexponential distribution as it has a coefficient of variation less than one, compared to the hyper-exponential distribution which has coefficient of variation greater than one and the exponential distribution which has coefficient of variation of one.

In mathematics, the Bauer–Fike theorem is a standard result in the perturbation theory of the eigenvalue of a complex-valued diagonalizable matrix. In its substance, it states an absolute upper bound for the deviation of one perturbed matrix eigenvalue from a properly chosen eigenvalue of the exact matrix. Informally speaking, what it says is that the sensitivity of the eigenvalues is estimated by the condition number of the matrix of eigenvectors.

A phase-type distribution is a probability distribution constructed by a convolution or mixture of exponential distributions. It results from a system of one or more inter-related Poisson processes occurring in sequence, or phases. The sequence in which each of the phases occurs may itself be a stochastic process. The distribution can be represented by a random variable describing the time until absorption of a Markov process with one absorbing state. Each of the states of the Markov process represents one of the phases.

Hyperelastic material

A hyperelastic or Green elastic material is a type of constitutive model for ideally elastic material for which the stress–strain relationship derives from a strain energy density function. The hyperelastic material is a special case of a Cauchy elastic material.

Bayesian linear regression is a type of conditional modeling in which the mean of one variable is described by a linear combination of other variables, with the goal of obtaining the posterior probability of the regression coefficients and ultimately allowing the out-of-sample prediction of the regressand conditional on observed values of the regressors. The simplest and most widely used version of this model is the normal linear model, in which the regressand, given the regressors, is distributed Gaussian. In this model, and under a particular choice of prior probabilities for the parameters—so-called conjugate priors—the posterior can be found analytically. With more arbitrarily chosen priors, the posteriors generally have to be approximated.

Linear Programming Boosting (LPBoost) is a supervised classifier from the boosting family of classifiers. LPBoost maximizes a margin between training samples of different classes and hence also belongs to the class of margin-maximizing supervised classification algorithms.

Compressed sensing is a signal processing technique for efficiently acquiring and reconstructing a signal by finding solutions to underdetermined linear systems. This is based on the principle that, through optimization, the sparsity of a signal can be exploited to recover it from far fewer samples than required by the Nyquist–Shannon sampling theorem. There are two conditions under which recovery is possible. The first is sparsity, which requires the signal to be sparse in some domain. The second is incoherence, which is applied through the restricted isometry property and is sufficient for sparse signals.

Sparse approximation theory deals with sparse solutions for systems of linear equations. Techniques for finding these solutions and exploiting them in applications have found wide use in image processing, signal processing, machine learning, medical imaging, and more.

In statistics, principal component regression (PCR) is a regression analysis technique that is based on principal component analysis (PCA). More specifically, PCR is used for estimating the unknown regression coefficients in a standard linear regression model.

In linear algebra, the coherence or mutual coherence of a matrix A is defined as the maximum absolute value of the cross-correlations between the columns of A.

In mathematics, the method of steepest descent or saddle-point method is an extension of Laplace's method for approximating an integral, where one deforms a contour integral in the complex plane to pass near a stationary point, in roughly the direction of steepest descent or stationary phase. The saddle-point approximation is used with integrals in the complex plane, whereas Laplace’s method is used with real integrals.

The acoustoelastic effect describes how the sound velocities of an elastic material change when the material is subjected to an initial static stress field. This is a non-linear effect of the constitutive relation between mechanical stress and finite strain in a material of continuous mass. In classical linear elasticity theory small deformations of most elastic materials can be described by a linear relation between the applied stress and the resulting strain. This relationship is commonly known as the generalised Hooke's law. The linear elastic theory involves second order elastic constants and yields constant longitudinal and shear sound velocities in an elastic material, not affected by an applied stress. The acoustoelastic effect, on the other hand, includes a higher-order expansion of the constitutive relation between the applied stress and the resulting strain, which yields longitudinal and shear sound velocities that depend on the stress state of the material. In the limit of an unstressed material the sound velocities of the linear elastic theory are reproduced.

Sparse dictionary learning

Sparse coding is a representation learning method which aims at finding a sparse representation of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. These elements are called atoms and they compose a dictionary. Atoms in the dictionary are not required to be orthogonal, and they may be an over-complete spanning set. This problem setup also allows the dimensionality of the signals being represented to be higher than that of the signals being observed. The above two properties lead to having seemingly redundant atoms that allow multiple representations of the same signal but also provide an improvement in sparsity and flexibility of the representation.
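As a rough illustration of this setup, the sketch below builds a synthetic signal that has an exactly 3-sparse representation in a random overcomplete dictionary and recovers it with a plain greedy orthogonal matching pursuit loop; the dictionary, the chosen support, and the helper `omp` are made up for the example and are not tied to any particular library.

```python
import numpy as np

def omp(D, x, k):
    """Greedy orthogonal matching pursuit: select k atoms of D to approximate x."""
    residual, support = x.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))    # most correlated atom
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)  # refit on the support
        residual = x - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

rng = np.random.default_rng(2)
D = rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms
alpha_true = np.zeros(64)
alpha_true[[3, 17, 40]] = [1.0, -2.0, 0.5]
x = D @ alpha_true                        # signal with a 3-sparse representation
print(np.nonzero(omp(D, x, 3))[0])        # expected to recover atoms 3, 17, 40
```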

Structured sparsity regularization is a class of methods, and an area of research in statistical learning theory, that extend and generalize sparsity regularization learning methods. Both sparsity and structured sparsity regularization methods seek to exploit the assumption that the output variable to be learned can be described by a reduced number of variables in the input space. Sparsity regularization methods focus on selecting the input variables that best describe the output. Structured sparsity regularization methods generalize and extend sparsity regularization methods by allowing for optimal selection over structures like groups or networks of input variables.

The convolutional sparse coding paradigm is an extension of the global sparse coding model, in which a redundant dictionary is modeled as a concatenation of circulant matrices. While the global sparsity constraint describes a signal x as a linear combination of a few atoms in the redundant dictionary D, usually expressed as x = Dα for a sparse vector α, the alternative dictionary structure adopted by the convolutional sparse coding model allows the sparsity prior to be applied locally instead of globally: independent patches of x are generated by "local" dictionaries operating over stripes of the sparse vector α.
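As a sketch of this dictionary structure (the two short filters and the signal length are arbitrary examples, assuming SciPy is available for `scipy.linalg.circulant`), a convolutional dictionary can be assembled by stacking one circulant matrix per local filter:

```python
import numpy as np
from scipy.linalg import circulant

n = 16                                     # assumed signal length
filters = [np.array([1.0, -1.0]),          # difference-like local filter
           np.array([0.25, 0.5, 0.25])]    # smoothing local filter

blocks = []
for f in filters:
    padded = np.zeros(n)
    padded[:len(f)] = f                    # zero-pad the filter to length n
    blocks.append(circulant(padded))       # columns are all circular shifts of the filter

D = np.hstack(blocks)                      # redundant dictionary: n x (n * number of filters)
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
print(D.shape)                             # (16, 32)
```

Each column of such a dictionary is a shifted copy of one of the filters, which is what allows the sparsity prior to act on local patches of the signal.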

References

  1. Joel A. Tropp (2004). "Greed is good: Algorithmic results for sparse approximation" (PDF). IEEE Trans. Inform. Theory. 50 (10): 2231–2242. CiteSeerX 10.1.1.84.5256. doi:10.1109/TIT.2004.834793. S2CID 675692.
  2. Joel A. Tropp (2006). "Just Relax: Convex Programming Methods for Identifying Sparse Signals in Noise". IEEE Trans. Inform. Theory.
  3. Huan Li and Zhouchen Lin. "Construction of Incoherent Dictionaries via Direct Babel Function Minimization" (PDF).
