Volterra series


The Volterra series is a model for non-linear behavior similar to the Taylor series. It differs from the Taylor series in its ability to capture "memory" effects. The Taylor series can be used to approximate the response of a nonlinear system to a given input if the output of the system depends strictly on the input at that particular time. In the Volterra series, the output of the nonlinear system depends on the input to the system at all other times as well. This provides the ability to capture the "memory" effect of devices like capacitors and inductors.


It has been applied in the fields of medicine (biomedical engineering) and biology, especially neuroscience. It is also used in electrical engineering to model intermodulation distortion in many devices, including power amplifiers and frequency mixers. Its main advantage lies in its generalizability: it can represent a wide range of systems. Thus, it is sometimes considered a non-parametric model.

In mathematics, a Volterra series denotes a functional expansion of a dynamic, nonlinear, time-invariant functional. Volterra series are frequently used in system identification. The Volterra series, which is used to prove the Volterra theorem, is an infinite sum of multidimensional convolution integrals.

History

The Volterra series is a modernized version of the theory of analytic functionals due to the Italian mathematician Vito Volterra, in work dating from 1887. [1] [2] Norbert Wiener became interested in this theory in the 1920s through his contact with Volterra's student Paul Lévy. Wiener applied his theory of Brownian motion to the integration of Volterra analytic functionals. The use of the Volterra series for system analysis originated from a restricted 1942 wartime report [3] by Wiener, then a professor of mathematics at MIT. He used the series to make an approximate analysis of the effect of radar noise in a nonlinear receiver circuit. The report became public after the war. [4] As a general method of analysis of nonlinear systems, the Volterra series came into use after about 1957 as the result of a series of reports, at first privately circulated, from MIT and elsewhere. [5] The name itself, Volterra series, came into use a few years later.

Mathematical theory

The theory of the Volterra series can be viewed from two different perspectives:

  1. as an operator mapping between two real (or complex) function spaces; or
  2. as a functional mapping from a real (or complex) function space into the real (or complex) numbers.

The latter functional perspective is more frequently used, due to the assumed time-invariance of the system.

Continuous time

A continuous time-invariant system with x(t) as input and y(t) as output can be expanded in the Volterra series as

  $$y(t) = h_0 + \sum_{n=1}^{N} \int_{a}^{b} \cdots \int_{a}^{b} h_n(\tau_1, \ldots, \tau_n) \prod_{j=1}^{n} x(t - \tau_j) \, d\tau_j .$$

Here the constant term $h_0$ on the right side is usually taken to be zero by a suitable choice of output level. The function $h_n(\tau_1, \ldots, \tau_n)$ is called the n-th-order Volterra kernel. It can be regarded as a higher-order impulse response of the system. For the representation to be unique, the kernels must be symmetrical in the n variables $\tau_1, \ldots, \tau_n$. If a kernel is not symmetrical, it can be replaced by the symmetrized kernel, which is the average over the n! permutations of these n variables.

If N is finite, the series is said to be truncated. If a, b, and N are finite, the series is called doubly finite.

Sometimes the n-th-order term is divided by n!, a convention which is convenient when taking the output of one Volterra system as the input of another ("cascading").

The causality condition: Since in any physically realizable system the output can only depend on previous values of the input, the kernels will be zero if any of the variables $\tau_1, \ldots, \tau_n$ are negative. The integrals may then be written over the half range from zero to infinity. So if the operator is causal, $h_n(\tau_1, \ldots, \tau_n) = 0$ whenever any $\tau_j < 0$.

Fréchet's approximation theorem: The use of the Volterra series to represent a time-invariant functional relation is often justified by appealing to a theorem due to Fréchet. This theorem states that a time-invariant functional relation (satisfying certain very general conditions) can be approximated uniformly and to an arbitrary degree of precision by a sufficiently high finite-order Volterra series. Among other conditions, the set of admissible input functions for which the approximation will hold is required to be compact. It is usually taken to be an equicontinuous, uniformly bounded set of functions, which is compact by the Arzelà–Ascoli theorem. In many physical situations, this assumption about the input set is a reasonable one. The theorem, however, gives no indication as to how many terms are needed for a good approximation, which is an essential question in applications.

Discrete time

The discrete-time case is similar to the continuous-time case, except that the integrals are replaced by summations:

  $$y(n) = h_0 + \sum_{p=1}^{P} \sum_{\tau_1 = a}^{b} \cdots \sum_{\tau_p = a}^{b} h_p(\tau_1, \ldots, \tau_p) \prod_{j=1}^{p} x(n - \tau_j),$$

where each function $h_p(\tau_1, \ldots, \tau_p)$ is called a discrete-time Volterra kernel. If P is finite, the series operator is said to be truncated. If a, b and P are finite, the series operator is called a doubly finite Volterra series. If the kernels vanish whenever any $\tau_j < 0$ (equivalently, if $a \geq 0$), the operator is said to be causal.

We can always consider, without loss of generality, the kernel as symmetrical. In fact, by the commutativity of multiplication it is always possible to symmetrize it by forming a new kernel taken as the average of the kernels for all permutations of the variables $\tau_1, \ldots, \tau_p$.
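In code, the symmetrization is simply an average of the kernel array over all permutations of its axes. A minimal Python sketch, assuming numpy (the function name symmetrize is illustrative):

    import numpy as np
    from itertools import permutations

    def symmetrize(h):
        """Return the symmetric part of an n-dimensional kernel:
        the average of h over all permutations of its axes."""
        perms = list(permutations(range(h.ndim)))
        return sum(np.transpose(h, p) for p in perms) / len(perms)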

For a causal system with symmetrical kernels, the n-th-order term can be rewritten in triangular form, in which each sum runs only over ordered lags $\tau_1 \leq \tau_2 \leq \cdots \leq \tau_n$ and each kernel value is weighted by the number of distinct permutations of its arguments.
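As a concrete illustration, the following minimal Python sketch (assuming numpy; the function name and toy kernels are illustrative, not from the literature) evaluates a causal, doubly finite Volterra series truncated at the second order:

    import numpy as np

    def volterra2_output(x, h0, h1, h2):
        """Evaluate a causal, truncated second-order discrete Volterra series.

        x  : 1-D input signal
        h0 : constant (zeroth-order) term
        h1 : first-order kernel, shape (M,)
        h2 : symmetric second-order kernel, shape (M, M)
        """
        M = len(h1)
        y = np.full(len(x), float(h0))
        for n in range(len(x)):
            # window of past samples x(n), x(n-1), ..., zero-padded at the start
            w = np.zeros(M)
            k = min(M, n + 1)
            w[:k] = x[n::-1][:k]
            y[n] += h1 @ w          # first-order (linear memory) term
            y[n] += w @ h2 @ w      # second-order term over the full square grid
        return y

    # toy usage: a mild quadratic system with two samples of memory
    rng = np.random.default_rng(0)
    x = rng.standard_normal(200)
    y = volterra2_output(x, 0.0, np.array([1.0, 0.5]), 0.1 * np.ones((2, 2)))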

Methods to estimate the kernel coefficients

Estimating the Volterra coefficients individually is complicated, since the basis functionals of the Volterra series are correlated. This leads to the problem of simultaneously solving a set of integral equations for the coefficients. Hence, estimation of Volterra coefficients is generally performed by estimating the coefficients of an orthogonalized series, e.g. the Wiener series, and then recomputing the coefficients of the original Volterra series. The Volterra series' main appeal over the orthogonalized series lies in its intuitive, canonical structure, i.e. all interactions of the input have one fixed degree. The orthogonalized basis functionals will generally be quite complicated.

An important aspect, with respect to which the following methods differ, is whether the orthogonalization of the basis functionals is to be performed over the idealized specification of the input signal (e.g. Gaussian white noise) or over the actual realization of the input (i.e. the pseudo-random, bounded, almost-white version of Gaussian white noise, or any other stimulus). The latter methods, despite their lack of mathematical elegance, have been shown to be more flexible (as arbitrary inputs can be easily accommodated) and precise (because the idealized version of the input signal is not always realizable).

Crosscorrelation method

This method, developed by Lee and Schetzen, orthogonalizes with respect to the actual mathematical description of the signal, i.e. the projection onto the new basis functionals is based on the knowledge of the moments of the random signal.

We can write the Volterra series in terms of homogeneous operators, as

  $$y(n) = h_0 + \sum_{p=1}^{P} H_p[x(n)],$$

where

  $$H_p[x(n)] = \sum_{\tau_1 = a}^{b} \cdots \sum_{\tau_p = a}^{b} h_p(\tau_1, \ldots, \tau_p) \prod_{j=1}^{p} x(n - \tau_j).$$

To allow orthogonal identification, the Volterra series must be rearranged in terms of orthogonal non-homogeneous G operators (the Wiener series):

  $$y(n) = \sum_{p} G_p[x(n)].$$

The G operators can be defined by the following orthogonality condition:

  $$E\{ H_i[x(n)] \, G_p[x(n)] \} = 0 \quad \text{for } i < p,$$

whenever $H_i[x(n)]$ is an arbitrary homogeneous Volterra operator and x(n) is stationary white noise (SWN) with zero mean and variance A.

Recalling that every Volterra functional is orthogonal to all Wiener functionals of greater order, and considering the Volterra functional

  $$H_p^{*}[x(n)] = x(n - \tau_1) \cdots x(n - \tau_p),$$

we can write

  $$E\{ y(n) \, x(n - \tau_1) \cdots x(n - \tau_p) \} = \sum_{m=0}^{p} E\{ G_m[x(n)] \, x(n - \tau_1) \cdots x(n - \tau_p) \}.$$

If x is SWN and the lags $\tau_1, \ldots, \tau_p$ are all distinct, the terms with $m < p$ vanish and we have

  $$E\{ y(n) \, x(n - \tau_1) \cdots x(n - \tau_p) \} = p! \, A^p \, k_p(\tau_1, \ldots, \tau_p).$$

So, if we exclude the diagonal elements ($\tau_i \neq \tau_j$ for all $i \neq j$), it is

  $$k_p(\tau_1, \ldots, \tau_p) = \frac{E\{ y(n) \, x(n - \tau_1) \cdots x(n - \tau_p) \}}{p! \, A^p}.$$

If we want to consider the diagonal elements, the solution proposed by Lee and Schetzen is

  $$k_p(\tau_1, \ldots, \tau_p) = \frac{E\left\{ \left( y(n) - \sum_{m=0}^{p-1} G_m[x(n)] \right) x(n - \tau_1) \cdots x(n - \tau_p) \right\}}{p! \, A^p}.$$

The main drawback of this technique is that the estimation errors made on all elements of lower-order kernels will affect each diagonal element of order p by means of the summation $\sum_{m=0}^{p-1} G_m[x(n)]$, conceived as the solution for the estimation of the diagonal elements themselves. Efficient formulas that avoid this drawback, and references for diagonal kernel element estimation, exist. [6] [7]
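A minimal Python sketch of this estimator, assuming numpy and covering kernels up to the second order at off-diagonal points only (function and variable names are illustrative):

    import numpy as np

    def lee_schetzen(x, y, M, A):
        """Estimate zeroth-, first- and (off-diagonal) second-order Wiener
        kernels by crosscorrelation (Lee-Schetzen method).

        x, y : zero-mean white-noise input and measured output (1-D arrays)
        M    : memory length (number of lags)
        A    : variance of the white-noise input
        """
        N = len(x)
        k0 = y.mean()                       # zeroth-order kernel
        k1 = np.zeros(M)
        for tau in range(M):                # k1(tau) = E{y(n) x(n-tau)} / A
            k1[tau] = np.mean(y[tau:] * x[:N - tau]) / A
        k2 = np.zeros((M, M))
        for t1 in range(M):
            for t2 in range(M):
                if t1 == t2:
                    continue                # diagonal needs the residual formula
                tmax = max(t1, t2)
                k2[t1, t2] = np.mean(
                    y[tmax:] * x[tmax - t1:N - t1] * x[tmax - t2:N - t2]
                ) / (2 * A**2)
        return k0, k1, k2

For the diagonal points, the Lee–Schetzen formula above would replace y(n) with the residual obtained by subtracting the lower-order Wiener operators.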

Once the Wiener kernels have been identified, the Volterra kernels can be obtained by using Wiener-to-Volterra conversion formulas, which have been reported explicitly for a fifth-order Volterra series.

Multiple-variance method

In the traditional orthogonal algorithm, using inputs with high variance A has the advantage of stimulating high-order nonlinearity, so as to achieve more accurate high-order kernel identification. As a drawback, the use of high variance values causes high identification errors in lower-order kernels, [8] mainly due to nonideality of the input and truncation errors.

On the contrary, the use of a lower variance in the identification process can lead to a better estimation of the lower-order kernels, but can be insufficient to stimulate high-order nonlinearity.

This phenomenon, which can be called locality of the truncated Volterra series, can be revealed by calculating the output error of a series as a function of the input variance. This test can be repeated with series identified with different input variances, obtaining different curves, each with a minimum at the variance used in the identification.

To overcome this limitation, a low variance should be used for the lower-order kernels and gradually increased for higher-order kernels. This is not a theoretical problem in Wiener kernel identification, since the Wiener functionals are orthogonal to each other, but an appropriate normalization is needed in the Wiener-to-Volterra conversion formulas to take into account the use of different variances. Furthermore, new Wiener-to-Volterra conversion formulas are needed.

The traditional Wiener kernel identification formulas must therefore be modified so that each kernel order is identified with its own input variance. [8]

In the modified formulas, impulse functions are introduced for the identification of diagonal kernel points. If the Wiener kernels are extracted with the new formulas, the corresponding Wiener-to-Volterra formulas (made explicit up to the fifth order) are also needed.

As can be seen, the drawback with respect to the previous formulas [7] is that, for the identification of the n-th-order kernel, all lower-order kernels must be identified again with the higher variance. However, an outstanding improvement in the output MSE is obtained if the Wiener and Volterra kernels are obtained with the new formulas. [8]

Feedforward network

This method was developed by Wray and Green (1994) and exploits the fact that a simple two-layer fully connected neural network (i.e., a multilayer perceptron) is computationally equivalent to the Volterra series and therefore contains the kernels hidden in its architecture. After such a network has been trained to successfully predict the output based on the current state and memory of the system, the kernels can be computed from the weights and biases of that network.

The general notation for the n-th-order Volterra kernel is given by

  $$h_n(\tau_1, \ldots, \tau_n) = \sum_{i} c_i \, a_{n,i} \, w_{i\tau_1} \cdots w_{i\tau_n},$$

where n is the order, $c_i$ are the weights to the linear output node, $a_{n,i}$ are the coefficients of the polynomial expansion of the output function of the hidden nodes, and $w_{ij}$ are the weights from the input layer to the non-linear hidden layer. It is important to note that this method allows kernel extraction only up to the number of input delays in the architecture of the network. Furthermore, it is vital to carefully construct the size of the network input layer so that it represents the effective memory of the system.
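As an illustration, for a network with a single tanh hidden layer acting on a window of delayed inputs, the polynomial-expansion coefficients are the Taylor coefficients of tanh around each hidden unit's bias, and kernels up to the second order follow directly from the weights. A minimal Python sketch under that assumption (assuming numpy; names are illustrative, and this is not Wray and Green's original code):

    import numpy as np

    def kernels_from_mlp(W, b, c, c0=0.0):
        """Extract Volterra kernels up to order 2 from a trained one-hidden-layer
        tanh network  y = c0 + sum_i c[i] * tanh(W[i] @ x_window + b[i]),
        where x_window = [x(n), x(n-1), ..., x(n-M+1)].

        Taylor-expanding tanh around each bias b[i] gives
          h1(t)      = sum_i c[i] * tanh'(b[i])    * W[i, t]
          h2(t1, t2) = sum_i c[i] * tanh''(b[i])/2 * W[i, t1] * W[i, t2]
        """
        s = np.tanh(b)
        d1 = 1.0 - s**2                 # tanh'(b)
        d2 = -2.0 * s * (1.0 - s**2)    # tanh''(b)
        h0 = c0 + c @ s
        h1 = (c * d1) @ W
        h2 = 0.5 * np.einsum("i,it,iu->tu", c * d2, W, W)
        return h0, h1, h2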

Exact orthogonal algorithm

This method and its more efficient version (fast orthogonal algorithm) were invented by Korenberg. [9] In this method the orthogonalization is performed empirically over the actual input. It has been shown to perform more precisely than the crosscorrelation method. Another advantage is that arbitrary inputs can be used for the orthogonalization and that fewer data points suffice to reach a desired level of accuracy. Also, estimation can be performed incrementally until some criterion is fulfilled.

Linear regression

Linear regression is a standard tool from linear analysis. Hence, one of its main advantages is the widespread existence of standard tools for solving linear regressions efficiently. It has some educational value, since it highlights the basic property of the Volterra series: a linear combination of non-linear basis functionals. For estimation, the order of the original system should be known, since the Volterra basis functionals are not orthogonal and estimation cannot be performed incrementally.
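For example, a least-squares estimate can be obtained by regressing the output on the triangular-form monomials of delayed inputs discussed above. A minimal Python sketch (assuming numpy; names are illustrative):

    import numpy as np
    from itertools import combinations_with_replacement

    def volterra_regression(x, y, M, order=2):
        """Least-squares estimation of Volterra kernel coefficients.

        Builds a design matrix whose columns are the basis functionals
        (1, delayed inputs, products of delayed inputs up to `order`)
        and solves the resulting linear regression."""
        N = len(x)
        # matrix of delayed inputs: lags[n, t] = x(n - t), zero-padded
        lags = np.zeros((N, M))
        for t in range(M):
            lags[t:, t] = x[:N - t]
        # one column per monomial x(n-t1)*...*x(n-tp) with t1 <= ... <= tp,
        # i.e. the triangular form of the series
        cols, terms = [np.ones(N)], [()]
        for p in range(1, order + 1):
            for idx in combinations_with_replacement(range(M), p):
                cols.append(np.prod(lags[:, list(idx)], axis=1))
                terms.append(idx)
        Phi = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        return dict(zip(terms, coef))   # triangular-form kernel value per lag tuple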

Kernel method

This method was invented by Franz and Schölkopf [10] and is based on statistical learning theory. Consequently, this approach is also based on minimizing the empirical error (often called empirical risk minimization). Franz and Schölkopf proposed that the kernel method could essentially replace the Volterra series representation, although noting that the latter is more intuitive. [11]
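As a sketch of the idea (assuming scikit-learn; names are illustrative): an inhomogeneous polynomial kernel of degree p on windows of delayed inputs implicitly spans all Volterra monomials up to order p, so fitting a kernel ridge regression amounts to estimating a truncated Volterra model without representing the kernels explicitly.

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    def fit_volterra_krr(x, y, M, degree=3, alpha=1e-3):
        # stack windows of delayed inputs: X[n] = [x(n), x(n-1), ..., x(n-M+1)]
        N = len(x)
        X = np.zeros((N, M))
        for t in range(M):
            X[t:, t] = x[:N - t]
        # the inhomogeneous polynomial kernel (gamma*<u,v> + 1)^degree spans
        # all input monomials up to the given degree
        model = KernelRidge(kernel="polynomial", degree=degree, coef0=1.0, alpha=alpha)
        model.fit(X, y)
        return model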

Differential sampling

This method was developed by van Hemmen and coworkers [12] and utilizes Dirac delta functions to sample the Volterra coefficients.


References

  1. Volterra, Vito (1887). Sopra le funzioni che dipendono da altre funzioni. Vol. III. Italy: R. Accademia dei Lincei. pp. 97–105.
  2. Vito Volterra. Theory of Functionals and of Integrals and Integro-Differential Equations. Madrid 1927 (Spanish), translated version reprinted New York: Dover Publications, 1959.
  3. Wiener N: Response of a nonlinear device to noise. Radiation Lab MIT 1942, restricted report V-16, no. 129 (112 pp.). Declassified Jul 1946, published as rep. no. PB-1-58087, U.S. Dept. Commerce. URL: http://www.dtic.mil/dtic/tr/fulltext/u2/a800212.pdf
  4. Ikehara S: A method of Wiener in a nonlinear circuit. MIT Dec 10 1951, tech. rep. no. 217, Res. Lab. Electron.
  5. Early MIT reports by Brilliant, Zames, George, Hause, and Chesler can be found on dspace.mit.edu.
  6. M. Pirani; S. Orcioni; C. Turchetti (Sep 2004). "Diagonal kernel point estimation of n-th order discrete Volterra-Wiener systems". EURASIP Journal on Applied Signal Processing. 2004 (12): 1807–1816.
  7. S. Orcioni; M. Pirani; C. Turchetti (2005). "Advances in Lee–Schetzen method for Volterra filter identification". Multidimensional Systems and Signal Processing. 16 (3): 265–284. doi:10.1007/s11045-004-1677-7. S2CID 57663554.
  8. Orcioni, Simone (2014). "Improving the approximation ability of Volterra series identified with a cross-correlation method". Nonlinear Dynamics. 78 (4): 2861–2869. doi:10.1007/s11071-014-1631-7.
  9. Korenberg, M. J.; Bruder, S. B.; McIlroy, P. J. (1988). "Exact orthogonal kernel estimation from finite data records: extending Wiener's identification of nonlinear systems". Ann. Biomed. Eng. 16 (2): 201–214. doi:10.1007/BF02364581. PMID 3382067. S2CID 31320729.
  10. Franz, Matthias O.; Bernhard Schölkopf (2006). "A unifying view of Wiener and Volterra theory and polynomial kernel regression". Neural Computation. 18 (12): 3097–3118. doi:10.1162/neco.2006.18.12.3097. PMID 17052160. S2CID 9268156.
  11. Siamack Ghadimi (2019-09-12), Determination of Volterra kernels for nonlinear RF amplifiers, Microwaves&RF.
  12. J. L. van Hemmen; W. M. Kistler; E. G. F. Thomas (2000). "Calculation of Volterra Kernels for Solutions of Nonlinear Differential Equations". SIAM Journal on Applied Mathematics. 61 (1): 1–21. doi:10.1137/S0036139999336037. hdl:11370/eda737ae-40d1-4ff3-93d7-6b2434d23d52.
