Switching Kalman filter

The switching Kalman filtering (SKF) method is a variant of the Kalman filter. In its generalised form, it is often attributed to Kevin P. Murphy, [1] [2] [3] [4] but related switching state-space models had already been in use earlier.

Applications

Applications of the switching Kalman filter include brain–computer interfaces and neural decoding, such as real-time decoding for continuous neural-prosthetic control [5] and sensorimotor learning in humans. [6] It is also used in econometrics, [7] signal processing, tracking, [8] and computer vision. It is an alternative to the Kalman filter when the system's state has a discrete component: for example, when an industrial plant has "multiple discrete modes of behaviour, each of which having a linear (Gaussian) dynamics". [10] The additional error incurred by using a Kalman filter instead of a switching Kalman filter may be quantified in terms of the switching system's parameters. [9]

Model

Several variants of the SKF are discussed by Murphy. [1]

Special case

In the simpler case, switching state-space models are defined based on a switching variable that evolves independently of the continuous hidden variable. The probabilistic model of this variant of the SKF is as follows: [10]

The hidden variables include not only the continuous state $x_t$, but also a discrete *switch* (or switching) variable $S_t$. The dynamics of the switch variable are defined by the transition term $P(S_t \mid S_{t-1})$. The probability models of $x_t$ and of the observation $y_t$ can depend on $S_t$.

The switch variable can take its values from a discrete set $\{1, \dots, K\}$. This changes the joint distribution $P(x_t, y_t \mid S_t)$, which is a separate multivariate Gaussian distribution for each value of $S_t$.
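
A minimal NumPy sketch of sampling from this generative model may make the structure concrete. The two-regime parameters below (Pi, A, Q, C, R) are hypothetical values chosen only for illustration: the switch follows its own Markov chain, and the value it takes selects which set of linear-Gaussian parameters drives $x_t$ and $y_t$.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-regime model. The switch S_t is a Markov chain that
# evolves independently of the continuous state x_t; each regime k has
# its own linear-Gaussian dynamics (A[k], Q[k]) and observation model
# (C[k], R[k]).
Pi = np.array([[0.95, 0.05],                 # Pi[i, j] = P(S_t = j | S_{t-1} = i)
               [0.10, 0.90]])
A = [np.array([[1.0]]), np.array([[0.5]])]   # state transition per regime
Q = [np.array([[0.1]]), np.array([[1.0]])]   # process noise per regime
C = [np.array([[1.0]]), np.array([[1.0]])]   # observation matrix per regime
R = [np.array([[0.5]]), np.array([[0.5]])]   # observation noise per regime

def sample(T):
    """Draw (S, x, y) for T steps from the switching state-space model."""
    s = np.zeros(T, dtype=int)
    x = np.zeros((T, 1))
    y = np.zeros((T, 1))
    x[0] = rng.normal()
    y[0] = C[0] @ x[0] + rng.multivariate_normal(np.zeros(1), R[0])
    for t in range(1, T):
        s[t] = rng.choice(2, p=Pi[s[t - 1]])       # switch evolves on its own
        k = s[t]
        x[t] = A[k] @ x[t - 1] + rng.multivariate_normal(np.zeros(1), Q[k])
        y[t] = C[k] @ x[t] + rng.multivariate_normal(np.zeros(1), R[k])
    return s, x, y

s, x, y = sample(200)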

General case

In more generalised variants, [1] the switch variable also affects the dynamics of the continuous state $x_t$, e.g. through a regime-dependent transition model $P(x_t \mid x_{t-1}, S_t)$. [8] [7] Filtering and smoothing procedures for the general case are discussed by Murphy. [1]
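
As one concrete illustration of such filtering, the sketch below implements a single step of the first-order Generalized Pseudo-Bayes (GPB(1)) approximation, one of the family of collapsing schemes used for switching models: it runs a Kalman update under each regime, reweights the regimes by transition prior times predictive likelihood, and moment-matches the resulting Gaussian mixture back to a single Gaussian. The parameter lists (A, Q, C, R) and transition matrix Pi follow the conventions of the sketch above; this is an illustrative approximation, not the only variant.

import numpy as np
from scipy.stats import multivariate_normal

def kalman_step(m, P, y, A, Q, C, R):
    """One Kalman predict/update under a single regime; also returns the
    predictive likelihood p(y | regime), needed to reweight the switch."""
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    S = C @ P_pred @ C.T + R                       # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)            # Kalman gain
    m_post = m_pred + K @ (y - C @ m_pred)
    P_post = (np.eye(len(m)) - K @ C) @ P_pred
    lik = multivariate_normal.pdf(y, mean=C @ m_pred, cov=S)
    return m_post, P_post, lik

def gpb1_step(m, P, w, y, Pi, A, Q, C, R):
    """One GPB(1) step: Kalman update per regime, reweight by transition
    prior times likelihood, then collapse the mixture by moment matching."""
    n_reg = len(A)
    means, covs = [], []
    w_new = np.zeros(n_reg)
    for k in range(n_reg):
        mk, Pk, lik = kalman_step(m, P, y, A[k], Q[k], C[k], R[k])
        means.append(mk)
        covs.append(Pk)
        w_new[k] = (w @ Pi[:, k]) * lik            # unnormalised P(S_t = k | y_1..t)
    w_new /= w_new.sum()
    m_out = sum(w_new[k] * means[k] for k in range(n_reg))
    P_out = sum(w_new[k] * (covs[k] + np.outer(means[k] - m_out, means[k] - m_out))
                for k in range(n_reg))
    return m_out, P_out, w_new

A full filter applies gpb1_step sequentially over the observation sequence, carrying (m, P, w) forward; higher-order schemes such as GPB(2) keep one Gaussian per recent regime history before collapsing.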

Related Research Articles

<span class="mw-page-title-main">Naive Bayes classifier</span> Probabilistic classification algorithm

In statistics, naive Bayes classifiers are a family of linear "probabilistic classifiers" based on applying Bayes' theorem with strong (naive) independence assumptions between the features. They are among the simplest Bayesian network models, but coupled with kernel density estimation, they can achieve high accuracy levels.

A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process $X$ with unobservable ("hidden") states. As part of the definition, HMM requires that there be an observable process $Y$ whose outcomes are "influenced" by the outcomes of $X$ in a known way. Since $X$ cannot be observed directly, the goal is to learn about $X$ by observing $Y$. HMM has an additional requirement that the outcome of $Y$ at time $t = t_0$ must be "influenced" exclusively by the outcome of $X$ at $t = t_0$, and that the outcomes of $X$ and $Y$ at $t < t_0$ must be conditionally independent of $Y$ at $t = t_0$ given $X$ at time $t = t_0$.
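
In the SKF, the switch variable itself forms exactly such a hidden Markov chain. For reference, a minimal sketch of the HMM forward (filtering) recursion for discrete observations; the argument names here are illustrative:

import numpy as np

def hmm_forward(pi0, Pi, B, obs):
    """Filtered distributions P(X_t | y_1..t) for a discrete HMM.
    pi0: initial state distribution; Pi[i, j] = P(X_t = j | X_{t-1} = i);
    B[j, y] = P(Y_t = y | X_t = j); obs: sequence of observed symbols."""
    alpha = pi0 * B[:, obs[0]]
    alpha = alpha / alpha.sum()
    out = [alpha]
    for y in obs[1:]:
        alpha = (alpha @ Pi) * B[:, y]   # predict along the chain, then weight by evidence
        alpha = alpha / alpha.sum()      # normalise to a probability distribution
        out.append(alpha)
    return np.array(out)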

<span class="mw-page-title-main">Kalman filter</span> Algorithm that estimates unknowns from a series of measurements over time

In statistics and control theory, Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, including statistical noise and other inaccuracies, to produce estimates of unknown variables that tend to be more accurate than those based on a single measurement alone, by estimating a joint probability distribution over the variables for each timeframe. The filter is named after Rudolf E. Kálmán, who was one of the primary developers of its theory.
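
In standard notation (state estimate $\hat{x}$, covariance $P$, transition matrix $F$, observation matrix $H$, process and measurement noise covariances $Q$ and $R$), the filter alternates a predict step and an update step:

$\hat{x}_{t|t-1} = F \hat{x}_{t-1|t-1}, \qquad P_{t|t-1} = F P_{t-1|t-1} F^\top + Q$

$K_t = P_{t|t-1} H^\top \left( H P_{t|t-1} H^\top + R \right)^{-1}$

$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t \left( y_t - H \hat{x}_{t|t-1} \right), \qquad P_{t|t} = (I - K_t H) P_{t|t-1}$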

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). While it is one of several forms of causal notation, causal networks are special cases of Bayesian networks. Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

<span class="mw-page-title-main">Logistic regression</span> Statistical model for a binary dependent variable

In statistics, the logistic model is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. In regression analysis, logistic regression is the process of estimating the parameters of a logistic model. Formally, in binary logistic regression there is a single binary dependent variable, coded by an indicator variable, where the two values are labeled "0" and "1", while the independent variables can each be a binary variable or a continuous variable. The corresponding probability of the value labeled "1" can vary between 0 and 1, hence the labeling; the function that converts log-odds to probability is the logistic function, hence the name. The unit of measurement for the log-odds scale is called a logit, from logistic unit, hence the alternative names.
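
Concretely, with log-odds modelled as a linear combination $\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$, the probability of the outcome labeled "1" is obtained through the logistic function:

$P(Y = 1 \mid x) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}$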

In control engineering and system identification, a state-space representation is a mathematical model of a physical system specified as a set of input, output, and state variables related by first-order differential equations or difference equations. Such variables, called state variables, evolve over time in a way that depends on the values they have at any given instant and on the externally imposed values of input variables. Output variables’ values depend on the values of the state variables and may also depend on the values of the input variables.
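
For a linear time-invariant system, in standard notation (state $x$, input $u$, output $y$), the continuous-time form is:

$\dot{x}(t) = A x(t) + B u(t), \qquad y(t) = C x(t) + D u(t)$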

Belief propagation, also known as sum–product message passing, is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields. It calculates the marginal distribution for each unobserved node, conditional on any observed nodes. Belief propagation is commonly used in artificial intelligence and information theory, and has demonstrated empirical success in numerous applications, including low-density parity-check codes, turbo codes, free energy approximation, and satisfiability.
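
For instance, on a factor graph the sum–product message from a factor $f$ to a variable $x$ marginalises the factor multiplied by the incoming variable-to-factor messages:

$\mu_{f \to x}(x) = \sum_{\mathbf{x}_f \setminus x} f(\mathbf{x}_f) \prod_{x' \in N(f) \setminus \{x\}} \mu_{x' \to f}(x')$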

Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.

In probability theory, statistics, and machine learning, recursive Bayesian estimation, also known as a Bayes filter, is a general probabilistic approach for estimating an unknown probability density function (PDF) recursively over time using incoming measurements and a mathematical process model. The process relies heavily upon mathematical concepts and models that are theorized within a study of prior and posterior probabilities known as Bayesian statistics.
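
The recursion alternates a prediction step through the process model and an update step through Bayes' rule:

$p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}$

$p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})$

The Kalman filter is the special case in which both densities are Gaussian and the models are linear.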

In mathematics, the cylinder sets form a basis of the product topology on a product of sets; they are also a generating family of the cylinder σ-algebra.

In statistics, multinomial logistic regression is a classification method that generalizes logistic regression to multiclass problems, i.e. with more than two possible discrete outcomes. That is, it is a model that is used to predict the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables.

In statistics, binomial regression is a regression analysis technique in which the response has a binomial distribution: it is the number of successes in a series of independent Bernoulli trials, where each trial has probability of success $p$. In binomial regression, the probability of a success is related to explanatory variables: the corresponding concept in ordinary regression is to relate the mean value of the unobserved response to explanatory variables.

In statistical theory, a pseudolikelihood is an approximation to the joint probability distribution of a collection of random variables. The practical use of this is that it can provide an approximation to the likelihood function of a set of observed data which may either provide a computationally simpler problem for estimation, or may provide a way of obtaining explicit estimates of model parameters.
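
A common form (Besag's pseudolikelihood) replaces the joint distribution with the product of each variable's full conditional given all the others:

$L_{\text{pseudo}}(\theta) = \prod_i p_\theta(x_i \mid x_j,\ j \neq i)$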

Averaged one-dependence estimators (AODE) is a probabilistic classification learning technique. It was developed to address the attribute-independence problem of the popular naive Bayes classifier. It frequently develops substantially more accurate classifiers than naive Bayes at the cost of a modest increase in the amount of computation.

The invariant extended Kalman filter (IEKF) (not to be confused with the iterated extended Kalman filter) was first introduced as a version of the extended Kalman filter (EKF) for nonlinear systems possessing symmetries (or invariances), then generalized and recast as an adaptation to Lie groups of the linear Kalman filtering theory. Instead of using a linear correction term based on a linear output error, the IEKF uses a geometrically adapted correction term based on an invariant output error; in the same way the gain matrix is not updated from a linear state error, but from an invariant state error. The main benefit is that the gain and covariance equations have reduced dependence on the estimated value of the state. In some cases they converge to constant values on a much bigger set of trajectories than is the case for the EKF, which results in a better convergence of the estimation.

Generalized filtering is a generic Bayesian filtering scheme for nonlinear state-space models. It is based on a variational principle of least action, formulated in generalized coordinates of motion. Note that "generalized coordinates of motion" are related to—but distinct from—generalized coordinates as used in (multibody) dynamical systems analysis. Generalized filtering furnishes posterior densities over hidden states generating observed data using a generalized gradient descent on variational free energy, under the Laplace assumption. Unlike classical filtering, generalized filtering eschews Markovian assumptions about random fluctuations. Furthermore, it operates online, assimilating data to approximate the posterior density over unknown quantities, without the need for a backward pass. Special cases include variational filtering, dynamic expectation maximization and generalized predictive coding.

System identification is a method of identifying or measuring the mathematical model of a system from measurements of the system inputs and outputs. The applications of system identification include any system where the inputs and outputs can be measured and include industrial processes, control systems, economic data, biology and the life sciences, medicine, social systems and many more.

<span class="mw-page-title-main">Bayesian programming</span> Statistics concept

Bayesian programming is a formalism and a methodology for specifying probabilistic models and solving problems when less than the necessary information is available.

In statistics, the class of vector generalized linear models (VGLMs) was proposed to enlarge the scope of models catered for by generalized linear models (GLMs). In particular, VGLMs allow for response variables outside the classical exponential family and for more than one parameter. Each parameter can be transformed by a link function. The VGLM framework is also large enough to naturally accommodate multiple responses; these are several independent responses each coming from a particular statistical distribution with possibly different parameter values.

In probability theory and statistics, the Conway–Maxwell–binomial (CMB) distribution is a three-parameter discrete probability distribution that generalises the binomial distribution in an analogous manner to the way that the Conway–Maxwell–Poisson distribution generalises the Poisson distribution. The CMB distribution can be used to model both positive and negative association among the Bernoulli summands.

References

  1. K. P. Murphy, "Switching Kalman Filters", Compaq Cambridge Research Lab Tech. Report 98-10, 1998.
  2. K. Murphy. Switching Kalman filters. Technical report, U. C. Berkeley, 1998.
  3. K. Murphy. Dynamic Bayesian Networks: Representation, Inference and Learning. PhD thesis, University of California, Berkeley, Computer Science Division, 2002.
  4. Kalman Filtering and Neural Networks. Edited by Simon Haykin. ISBN 0-471-22154-6.
  5. Wu, Wei, Michael J. Black, David Bryant Mumford, Yun Gao, Elie Bienenstock, and John P. Donoghue (2004). Modelling and decoding motor cortical activity using a switching Kalman filter. IEEE Transactions on Biomedical Engineering 51(6): 933-942. doi:10.1109/TBME.2004.826666.
  6. Heald JB, Ingram JN, Flanagan JR, Wolpert DM (2018). Multiple motor memories are learned to control different points on a tool. Nature Human Behaviour 2: 300–311.
  7. Kim, C.-J. (1994). Dynamic linear models with Markov-switching. J. Econometrics 60: 1–22.
  8. Bar-Shalom, Y. and Li, X.-R. (1993). Estimation and Tracking. Artech House, Boston, MA.
  9. Karimi, Parisa (2021). "Quantification of mismatch error in randomly switching linear state-space models". IEEE Signal Processing Letters 28: 2008–2012. arXiv:2012.04542. doi:10.1109/LSP.2021.3116504.
  10. Zoubin Ghahramani, Geoffrey E. Hinton (2000). Variational Learning for Switching State-Space Models. Neural Computation 12(4): 963–996.