Filtering problem (stochastic processes)

Last updated December 02, 2024

In the theory of stochastic processes, filtering describes the problem of determining the state of a system from an incomplete and potentially noisy set of observations. While originally motivated by problems in engineering, filtering has found applications in many fields, from signal processing to finance.

The problem of optimal non-linear filtering (even for the non-stationary case) was solved by Ruslan L. Stratonovich (1959,^[1] 1960^[2]); see also the work of Harold J. Kushner^[3] and of Moshe Zakai, who introduced a simplified dynamics for the unnormalized conditional law of the filter,^[4] known as the Zakai equation. The solution, however, is infinite-dimensional in the general case.^[5] Certain approximations and special cases are well understood: for example, linear filters are optimal for Gaussian random variables, and are known as the Wiener filter and the Kalman-Bucy filter. More generally, because the solution is infinite-dimensional, it must be approximated in finite dimensions to be implemented on a computer with finite memory. A finite-dimensional approximate nonlinear filter may be based more on heuristics, such as the extended Kalman filter or the assumed density filters,^[6] or be more methodologically oriented, such as the projection filters,^[7] some sub-families of which are shown to coincide with the assumed density filters.^[8] Particle filters^[9] are another option for attacking the infinite-dimensional filtering problem and are based on sequential Monte Carlo methods.

In general, if the separation principle applies, then filtering also arises as part of the solution of an optimal control problem. For example, the Kalman filter is the estimation part of the optimal control solution to the linear-quadratic-Gaussian control problem.

The mathematical formalism

Consider a probability space (Ω, Σ, P) and suppose that the (random) state Y_t in n-dimensional Euclidean space Rⁿ of a system of interest at time t is a random variable Y_t : Ω → Rⁿ given by the solution to an Itō stochastic differential equation of the form

\mathrm {d} Y_{t}=b(t,Y_{t})\,\mathrm {d} t+\sigma (t,Y_{t})\,\mathrm {d} B_{t},

where B denotes standard p-dimensional Brownian motion, b : [0, +∞) × Rⁿ → Rⁿ is the drift field, and σ : [0, +∞) × Rⁿ → R^n×p is the diffusion field. It is assumed that observations H_t in R^m (note that m and n may, in general, be unequal) are taken for each time t according to

H_{t}=c(t,Y_{t})+\gamma (t,Y_{t})\cdot {\mbox{noise}}.

Adopting the Itō interpretation of the stochastic differential and setting

Z_{t}=\int _{0}^{t}H_{s}\,\mathrm {d} s,

this gives the following stochastic integral representation for the observations Z_t:

\mathrm {d} Z_{t}=c(t,Y_{t})\,\mathrm {d} t+\gamma (t,Y_{t})\,\mathrm {d} W_{t},

where W denotes standard r-dimensional Brownian motion, independent of B and the initial condition Y₀, and c : [0, +∞) × Rⁿ → R^m and γ : [0, +∞) × Rⁿ → R^m×r satisfy

{\big |}c(t,x){\big |}+{\big |}\gamma (t,x){\big |}\leq C{\big (}1+|x|{\big )}

for all t and x and some constant C.
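
As an illustration of the signal and observation equations above, the following is a minimal sketch that simulates one path of a scalar signal Y and its observation process Z with the Euler-Maruyama scheme. The function name `simulate_signal_and_observation`, the step count, and the example coefficients (a mean-reverting drift with a linear sensor) are assumptions chosen for illustration and do not come from the text.

```python
import numpy as np

def simulate_signal_and_observation(b, sigma, c, gamma, y0,
                                    T=1.0, n_steps=1000, seed=0):
    """Euler-Maruyama discretization (scalar case, n = m = p = r = 1) of
         dY_t = b(t, Y_t) dt + sigma(t, Y_t) dB_t,
         dZ_t = c(t, Y_t) dt + gamma(t, Y_t) dW_t,   Z_0 = 0.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    Y = np.empty(n_steps + 1)
    Z = np.empty(n_steps + 1)
    Y[0], Z[0] = y0, 0.0
    for k in range(n_steps):
        # Independent Brownian increments for signal and observation noise
        dB = rng.normal(0.0, np.sqrt(dt))
        dW = rng.normal(0.0, np.sqrt(dt))
        Y[k + 1] = Y[k] + b(t[k], Y[k]) * dt + sigma(t[k], Y[k]) * dB
        Z[k + 1] = Z[k] + c(t[k], Y[k]) * dt + gamma(t[k], Y[k]) * dW
    return t, Y, Z

# Hypothetical coefficients: mean-reverting signal, linear sensor, unit noise
t, Y, Z = simulate_signal_and_observation(
    b=lambda t, y: -y,
    sigma=lambda t, y: 0.5,
    c=lambda t, y: y,
    gamma=lambda t, y: 1.0,
    y0=1.0,
)
```

The filter only has access to the path `Z`; the path `Y` is kept here so that an estimate can later be compared with the true state.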

The filtering problem is the following: given observations Z_s for 0 ≤ s ≤ t, what is the best estimate Ŷ_t of the true state Y_t of the system based on those observations?

By "based on those observations" it is meant that Ŷ_t is measurable with respect to the σ-algebra G_t generated by the observations Z_s, 0 ≤ s ≤ t. Denote by K = K(Z, t) the collection of all Rⁿ-valued random variables Y that are square-integrable and G_t-measurable:

K=K(Z,t)=L^{2}(\Omega ,G_{t},\mathbf {P} ;\mathbf {R} ^{n}).

By "best estimate", it is meant that Ŷ_t minimizes the mean-square distance between Y_t and all candidates in K:

\mathbf {E} \left[{\big |}Y_{t}-{\hat {Y}}_{t}{\big |}^{2}\right]=\inf _{Y\in K}\mathbf {E} \left[{\big |}Y_{t}-Y{\big |}^{2}\right].\qquad {\mbox{(M)}}

Basic result: orthogonal projection

The space K(Z, t) of candidates is a Hilbert space, and the general theory of Hilbert spaces implies that the solution Ŷ_t of the minimization problem (M) is given by

{\hat {Y}}_{t}=P_{K(Z,t)}{\big (}Y_{t}{\big )},

where P_K(Z,t) denotes the orthogonal projection of L²(Ω, Σ, P; Rⁿ) onto the linear subspace K(Z, t) = L²(Ω, G_t, P; Rⁿ). Furthermore, it is a general fact about conditional expectations that if F is any sub-σ-algebra of Σ then the orthogonal projection

P_{K}:L^{2}(\Omega ,\Sigma ,\mathbf {P} ;\mathbf {R} ^{n})\to L^{2}(\Omega ,F,\mathbf {P} ;\mathbf {R} ^{n})

is exactly the conditional expectation operator E[·|F], i.e.,

P_{K}(X)=\mathbf {E} {\big [}X{\big |}F{\big ]}.

Hence,

{\hat {Y}}_{t}=P_{K(Z,t)}{\big (}Y_{t}{\big )}=\mathbf {E} {\big [}Y_{t}{\big |}G_{t}{\big ]}.
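
The projection property can be checked numerically in a static analogue of the filtering problem. The sketch below assumes a toy model that is not part of the text: Y is standard normal and a single observation Z = Y + noise is available, with independent standard normal noise. In that model the conditional expectation is E[Y | Z] = Z/2, and any other function of Z should give a larger mean-square error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Toy model (an assumption for illustration): Y ~ N(0, 1), Z = Y + N(0, 1)
Y = rng.normal(size=n)
Z = Y + rng.normal(size=n)

def mse(estimate):
    """Monte Carlo estimate of E[|Y - estimate|^2]."""
    return np.mean((Y - estimate) ** 2)

cond_exp = 0.5 * Z            # E[Y | Z] for this Gaussian model
mse_cond = mse(cond_exp)      # theoretical value: 0.5
mse_raw = mse(Z)              # using the observation itself: 1.0
mse_other = mse(0.3 * Z)      # another Z-measurable candidate: 0.58

# Orthogonality: the residual Y - E[Y | Z] is uncorrelated with
# any (square-integrable) function of Z, here sin(Z).
residual = Y - cond_exp
orth = np.mean(residual * np.sin(Z))
```

Up to Monte Carlo error, `mse_cond` is the smallest of the three errors and `orth` is close to zero, which is the Hilbert-space orthogonality that characterizes the projection.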

This elementary result is the basis for the general Fujisaki-Kallianpur-Kunita equation of filtering theory.

More advanced result: nonlinear filtering SPDE

The complete knowledge of the filter at a time t would be given by the probability law of the signal Y_t conditional on the sigma-field G_t generated by observations Z up to time t. If this probability law admits a density, informally

p_{t}(y)\ dy={\bf {P}}(Y_{t}\in dy|G_{t}),

then under some regularity assumptions the density $p_{t}(y)$ satisfies a non-linear stochastic partial differential equation (SPDE) driven by $dZ_{t}$, called the Kushner-Stratonovich equation,^[10] while an unnormalized version $q_{t}(y)$ of the density $p_{t}(y)$ satisfies a linear SPDE called the Zakai equation.^[10] These equations can be formulated for the above system, but to simplify the exposition one can assume that the unobserved signal Y and the partially observed noisy signal Z satisfy the equations

\mathrm {d} Y_{t}=b(t,Y_{t})\,\mathrm {d} t+\sigma (t,Y_{t})\,\mathrm {d} B_{t},

\mathrm {d} Z_{t}=c(t,Y_{t})\,\mathrm {d} t+\mathrm {d} W_{t}.

In other terms, the system is simplified by assuming that the observation noise W is not state dependent.

One might keep a deterministic time dependent $\gamma$ in front of $dW$ but we assume this has been taken out by re-scaling.

For this particular system, the Kushner-Stratonovich SPDE for the density $p_{t}$ reads

\mathrm {d} p_{t}={\cal {L}}_{t}^{*}p_{t}\ dt+p_{t}[c(t,\cdot )-E_{p_{t}}(c(t,\cdot ))]^{T}[dZ_{t}-E_{p_{t}}(c(t,\cdot ))dt]

where T denotes transposition, $E_{p}$ denotes the expectation with respect to the density p, $E_{p}[f]=\int f(y)p(y)dy,$ and the forward diffusion operator ${\cal {L}}_{t}^{*}$ is

{\cal {L}}_{t}^{*}f(t,y)=-\sum _{i}{\frac {\partial }{\partial y_{i}}}[b_{i}(t,y)f(t,y)]+{\frac {1}{2}}\sum _{i,j}{\frac {\partial ^{2}}{\partial y_{i}\partial y_{j}}}[a_{ij}(t,y)f(t,y)]

where $a=\sigma \sigma ^{T}$ . If we choose the unnormalized density $q_{t}(y)$ , the Zakai SPDE for the same system reads

\mathrm {d} q_{t}={\cal {L}}_{t}^{*}q_{t}\ dt+q_{t}[c(t,\cdot )]^{T}dZ_{t}.
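
For orientation, the two densities are related by normalization. The display below is a brief sketch of that relation, assuming that $q_{t}$ is integrable with strictly positive total mass; the integration variable x is introduced here only for illustration.

```latex
p_{t}(y)={\frac {q_{t}(y)}{\int q_{t}(x)\,\mathrm {d} x}}
```

In particular, a numerical solution of the linear Zakai equation can be turned into the conditional density by dividing by its integral at each time.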

These SPDEs for p and q are written in Itō calculus form. It is possible to write them in Stratonovich calculus form, which turns out to be helpful when deriving filtering approximations based on differential geometry, as in the projection filters. For example, the Kushner-Stratonovich equation written in Stratonovich calculus reads

\mathrm {d} p_{t}={\cal {L}}_{t}^{*}\,p_{t}\,dt-{\frac {1}{2}}\,p_{t}\,[\vert c(t,\cdot )\vert ^{2}-E_{p_{t}}(\vert c(t,\cdot )\vert ^{2})]\,dt+p_{t}\,[c(t,\cdot )-E_{p_{t}}(c(t,\cdot ))]^{T}\circ dZ_{t}\ .

From either of the densities p and q one can calculate all statistics of the signal Y_t conditional on the sigma-field generated by observations Z up to time t, so that the densities give complete knowledge of the filter. In the linear case, where the system coefficients b and c are linear functions of Y, where $\sigma$ and $\gamma$ do not depend on Y, and where the initial condition for the signal Y is Gaussian or deterministic, the density $p_{t}(y)$ is Gaussian and can be characterized by its mean and variance-covariance matrix, whose evolution is described by the Kalman-Bucy filter, which is finite-dimensional.^[10] More generally, the evolution of the filter density occurs in an infinite-dimensional function space,^[5] and it has to be approximated in finite dimensions, as hinted above.
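
The finite-dimensional linear case can be sketched in a few lines. The example below assumes a scalar model with constant coefficients, dY = a Y dt + σ dB and dZ = c Y dt + dW, for which the Kalman-Bucy filter reduces to an equation for the conditional mean driven by dZ and a deterministic Riccati equation for the conditional variance. The function name `kalman_bucy_scalar`, the parameter values, and the Euler time stepping are assumptions made for illustration, not a reference implementation.

```python
import numpy as np

def kalman_bucy_scalar(a, sigma, c, mean0, var0, T=10.0, n_steps=10_000, seed=0):
    """Euler discretization of the scalar Kalman-Bucy filter for
         dY = a Y dt + sigma dB,   dZ = c Y dt + dW,
       with conditional mean and variance evolving as
         d(mean) = a * mean dt + var * c * (dZ - c * mean dt),
         d(var)/dt = 2 a var + sigma^2 - c^2 var^2.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    y = mean0 + np.sqrt(var0) * rng.normal()   # Gaussian initial state
    mean, var = mean0, var0
    Y = np.empty(n_steps + 1)
    M = np.empty(n_steps + 1)
    V = np.empty(n_steps + 1)
    Y[0], M[0], V[0] = y, mean, var
    for k in range(n_steps):
        dB = rng.normal(0.0, np.sqrt(dt))
        dW = rng.normal(0.0, np.sqrt(dt))
        dZ = c * y * dt + dW                   # observation increment
        y = y + a * y * dt + sigma * dB        # hidden signal
        mean = mean + a * mean * dt + var * c * (dZ - c * mean * dt)
        var = var + (2 * a * var + sigma**2 - (c * var) ** 2) * dt
        Y[k + 1], M[k + 1], V[k + 1] = y, mean, var
    return Y, M, V

# Hypothetical parameters for illustration
a, sigma, c = -1.0, 1.0, 2.0
Y, M, V = kalman_bucy_scalar(a, sigma, c, mean0=0.0, var0=1.0)

# Positive root of the stationary Riccati equation 2 a v + sigma^2 - c^2 v^2 = 0
var_inf = (a + np.sqrt(a**2 + (c * sigma) ** 2)) / c**2
```

Only two numbers, the mean and the variance, are propagated, in contrast with the function-valued unknowns of the Kushner-Stratonovich and Zakai equations. In this sketch the variance `V` does not depend on the observations and settles near the stationary value `var_inf`.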

References

[1] Stratonovich, R. L. (1959). Optimum nonlinear systems which bring about a separation of a signal with constant parameters from noise. Radiofizika, 2:6, pp. 892-901.
[2] Stratonovich, R. L. (1960). Application of the Markov processes theory to optimal filtering. Radio Engineering and Electronic Physics, 5:11, pp. 1-19.
[3] Kushner, Harold (1967). Nonlinear filtering: The exact dynamical equations satisfied by the conditional mode. IEEE Transactions on Automatic Control, Volume 12, Issue 3, June 1967, pp. 262-267.
[4] Zakai, Moshe (1969). On the optimal filtering of diffusion processes. Zeit. Wahrsch. 11, pp. 230–243. MR 242552, Zbl 0164.19201, doi:10.1007/BF00536382
[5] Mireille Chaleyat-Maurel and Dominique Michel (1984). Des resultats de non existence de filtre de dimension finie. Stochastics, 13(1+2), pp. 83-102.
[6] Maybeck, Peter S. (1979). Stochastic models, estimation, and control. Volume 141, Series Mathematics in Science and Engineering, Academic Press.
[7] Damiano Brigo, Bernard Hanzon and François Le Gland (1998). A Differential Geometric approach to nonlinear filtering: the Projection Filter. IEEE Transactions on Automatic Control, Vol. 43, 2, pp. 247-252.
[8] Damiano Brigo, Bernard Hanzon and François Le Gland (1999). Approximate Nonlinear Filtering by Projection on Exponential Manifolds of Densities. Bernoulli, Vol. 5, N. 3, pp. 495-534.
[9] Del Moral, Pierre (1998). "Measure Valued Processes and Interacting Particle Systems. Application to Non Linear Filtering Problems". Annals of Applied Probability. 8 (2) (Publications du Laboratoire de Statistique et Probabilités, 96-15 (1996) ed.): 438–495. doi:10.1214/aoap/1028903535
[10] Bain, A., and Crisan, D. (2009). Fundamentals of Stochastic Filtering. Springer-Verlag, New York, https://doi.org/10.1007/978-0-387-76896-0
