Similarities between Wiener and LMS

The least mean squares (LMS) filter solution converges to the Wiener filter solution, assuming that the unknown system is LTI and the noise is stationary. Both filters can be used to identify the impulse response of an unknown system, knowing only the original input signal and the output of the unknown system. By relaxing the error criterion to reduce the error of the current sample only, instead of minimizing the total error over all of n, the LMS algorithm can be derived from the Wiener filter.

Least mean squares (LMS) algorithms are a class of adaptive filters used to mimic a desired filter by finding the filter coefficients that produce the least mean square of the error signal. LMS is a stochastic gradient descent method, in that the filter is adapted based only on the error at the current time. It was invented in 1960 by Stanford University professor Bernard Widrow and his first Ph.D. student, Ted Hoff.

In signal processing, the Wiener filter is a filter used to produce an estimate of a desired or target random process by linear time-invariant (LTI) filtering of an observed noisy process, assuming known stationary signal and noise spectra, and additive noise. The Wiener filter minimizes the mean square error between the estimated random process and the desired process.

In mathematics and statistics, a stationary process is a stochastic process whose unconditional joint probability distribution does not change when shifted in time. Consequently, parameters such as mean and variance also do not change over time.

Derivation of the Wiener filter for system identification

Given a known input signal $s[n]$, the output $x[n]$ of an unknown LTI system can be expressed as:

$$x[n] = \sum_{k=0}^{N-1} h_k s[n-k] + w[n]$$

where $h_k$ are the unknown filter tap coefficients and $w[n]$ is noise.
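As a concrete illustration, the following sketch simulates such a system in NumPy; the particular tap values, signal length, and noise level are arbitrary choices made for this example, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "unknown" LTI system: a short FIR filter with taps h_k (illustrative values).
h = np.array([0.7, -0.3, 0.2, 0.1])
N = len(h)

# Known input signal s[n] and unobserved additive noise w[n].
n_samples = 1000
s = rng.standard_normal(n_samples)
w = 0.05 * rng.standard_normal(n_samples)

# Output of the unknown system: x[n] = sum_k h_k s[n-k] + w[n].
x = np.convolve(s, h)[:n_samples] + w
```

The later snippets reuse `s`, `x`, `h`, and `N` from this setup.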

The model system $\hat{x}[n]$, using a Wiener filter solution of order $N$, can be expressed as:

$$\hat{x}[n] = \sum_{k=0}^{N-1} \hat{h}_k s[n-k]$$

where $\hat{h}_k$ are the filter tap coefficients to be determined.

The error between the model and the unknown system can be expressed as:

$$e[n] = x[n] - \hat{x}[n]$$

The total squared error $E$ can be expressed as:

$$E = \sum_{n=-\infty}^{\infty} e[n]^2 = \sum_{n=-\infty}^{\infty} \left(x[n] - \hat{x}[n]\right)^2 = \sum_{n=-\infty}^{\infty} \left(x[n]^2 - 2 x[n]\hat{x}[n] + \hat{x}[n]^2\right)$$
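Continuing the simulated example above, a minimal sketch of this objective evaluates $E$ for a candidate set of taps; `total_squared_error` is an illustrative helper, and the guess values below are arbitrary.

```python
def total_squared_error(h_hat, s, x):
    """Total squared error E = sum_n (x[n] - x_hat[n])^2 for candidate taps h_hat."""
    x_hat = np.convolve(s, h_hat)[:len(x)]  # model output using the candidate coefficients
    e = x - x_hat                           # error between unknown system and model
    return np.sum(e ** 2)

print(total_squared_error(np.array([0.5, 0.0, 0.0, 0.0]), s, x))  # rough guess: large error
print(total_squared_error(h, s, x))                               # true taps: near the noise floor
```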

Use the minimum mean-square error criterion over all of $n$ by setting the gradient of $E$ to zero:

$$\nabla E = 0$$

which is

$$\frac{\partial E}{\partial \hat{h}_i} = 0$$

for all $i = 0, 1, 2, \dots, N-1$:

$$\frac{\partial E}{\partial \hat{h}_i} = \frac{\partial}{\partial \hat{h}_i} \sum_{n=-\infty}^{\infty} \left[x[n]^2 - 2 x[n]\hat{x}[n] + \hat{x}[n]^2\right]$$

Substitute the definition of $\hat{x}[n]$:

$$\frac{\partial E}{\partial \hat{h}_i} = \frac{\partial}{\partial \hat{h}_i} \sum_{n=-\infty}^{\infty} \left[x[n]^2 - 2 x[n] \sum_{k=0}^{N-1} \hat{h}_k s[n-k] + \left(\sum_{k=0}^{N-1} \hat{h}_k s[n-k]\right)^2\right]$$

Distribute the partial derivative:

$$\frac{\partial E}{\partial \hat{h}_i} = \sum_{n=-\infty}^{\infty} \left[-2 x[n] s[n-i] + 2 \left(\sum_{k=0}^{N-1} \hat{h}_k s[n-k]\right) s[n-i]\right]$$

Using the definition of the discrete cross-correlation,

$$R_{xy}(i) = \sum_{n=-\infty}^{\infty} x[n]\, y[n-i],$$

this becomes:

$$\frac{\partial E}{\partial \hat{h}_i} = -2 R_{xs}(i) + 2 \sum_{k=0}^{N-1} \hat{h}_k R_{ss}(i-k) = 0$$

Rearrange the terms:

$$R_{xs}(i) = \sum_{k=0}^{N-1} \hat{h}_k R_{ss}(i-k)$$

for all $i = 0, 1, 2, \dots, N-1$.

This is a system of $N$ equations in the $N$ unknowns $\hat{h}_0, \dots, \hat{h}_{N-1}$, which can be solved for the filter coefficients.

The resulting coefficients of the Wiener filter are given by $\hat{\mathbf{h}} = \mathbf{R}_{ss}^{-1}\, \mathbf{R}_{xs}$, where $\mathbf{R}_{xs}$ is the cross-correlation vector between $x[n]$ and $s[n]$ and $\mathbf{R}_{ss}$ is the autocorrelation matrix of $s[n]$.
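In practice the correlations are estimated from the observed data. Continuing the simulated example, a minimal sketch follows; `wiener_taps` is an illustrative helper, and the simple sample-correlation estimator used here is one of several reasonable choices rather than something mandated by the derivation.

```python
def wiener_taps(s, x, N):
    """Solve the normal equations R_ss h_hat = R_xs for the Wiener filter taps.

    The lags of R_ss and R_xs are estimated with plain sample correlations.
    """
    M = len(s)
    # Sample autocorrelation of the input for lags 0 .. N-1.
    r_ss = np.array([np.dot(s[lag:], s[:M - lag]) for lag in range(N)])
    # Sample cross-correlation between the unknown-system output and the input.
    r_xs = np.array([np.dot(x[lag:], s[:M - lag]) for lag in range(N)])

    # Symmetric Toeplitz autocorrelation matrix: R[i, k] = R_ss(|i - k|).
    R = np.array([[r_ss[abs(i - k)] for k in range(N)] for i in range(N)])
    return np.linalg.solve(R, r_xs)

h_hat = wiener_taps(s, x, N)  # should be close to the true taps h above
```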

Derivation of the LMS algorithm

By relaxing the infinite sum of the Wiener filter to just the error at time $n$, the LMS algorithm can be derived.

The squared error at time $n$ can be expressed as:

$$E = \left(x[n] - y[n]\right)^2 = x[n]^2 - 2 x[n] y[n] + y[n]^2$$

where $y[n] = \sum_{k=0}^{N-1} \hat{h}_k[n]\, s[n-k]$ is the output of the adaptive filter using the current coefficient estimates $\hat{h}_k[n]$.

Using the minimum mean-square error criterion, take the gradient:

$$\frac{\partial E}{\partial \hat{h}_i} = \frac{\partial}{\partial \hat{h}_i} \left[x[n]^2 - 2 x[n] y[n] + y[n]^2\right]$$

Apply the chain rule and substitute the definition of $y[n]$:

$$\frac{\partial E}{\partial \hat{h}_i} = 2 \left(x[n] - y[n]\right) \frac{\partial}{\partial \hat{h}_i} \left(x[n] - \sum_{k=0}^{N-1} \hat{h}_k s[n-k]\right) = -2\, e[n]\, s[n-i]$$

where $e[n] = x[n] - y[n]$.

Using gradient descent with a step size $\mu$:

$$\hat{h}_i[n+1] = \hat{h}_i[n] - \mu \frac{\partial E}{\partial \hat{h}_i}$$

which becomes, for $i = 0, 1, \dots, N-1$,

$$\hat{h}_i[n+1] = \hat{h}_i[n] + 2 \mu\, e[n]\, s[n-i]$$

This is the LMS update equation.
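A minimal sketch of this update loop, again reusing the simulated `s` and `x` from the first snippet; `lms_identify` is an illustrative helper, the step size is an arbitrary value chosen for the example, and for simplicity the loop starts once a full window of input samples is available.

```python
def lms_identify(s, x, N, mu=0.01):
    """Adapt N taps with the LMS rule h_hat_i <- h_hat_i + 2*mu*e[n]*s[n-i]."""
    h_hat = np.zeros(N)
    for n in range(N - 1, len(s)):
        s_window = s[n - N + 1:n + 1][::-1]    # [s[n], s[n-1], ..., s[n-N+1]]
        y = np.dot(h_hat, s_window)            # adaptive filter output y[n]
        e = x[n] - y                           # instantaneous error e[n]
        h_hat = h_hat + 2 * mu * e * s_window  # update all taps at once
    return h_hat

h_lms = lms_identify(s, x, N)  # converges toward the Wiener solution above
```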
