Iterated filtering

Iterated filtering algorithms are a tool for maximum likelihood inference on partially observed dynamical systems. Stochastic perturbations to the unknown parameters are used to explore the parameter space. Applying sequential Monte Carlo (the particle filter) to this extended model results in the selection of the parameter values that are more consistent with the data. Appropriately constructed procedures, iterating with successively diminished perturbations, converge to the maximum likelihood estimate. [1] [2] [3]

Iterated filtering methods have so far been used most extensively to study infectious disease transmission dynamics. Case studies include cholera, [4] [5] Ebola virus, [6] influenza, [7] [8] [9] [10] malaria, [11] [12] [13] HIV, [14] pertussis, [15] [16] poliovirus [17] and measles. [5] [18] Other areas that have been proposed as suitable for these methods include ecological dynamics [19] [20] and finance. [21] [22]

The perturbations to the parameters play several different roles. Firstly, they smooth out the likelihood surface, enabling the algorithm to overcome small-scale features of the likelihood during early stages of the global search. Secondly, Monte Carlo variation allows the search to escape from local optima. Thirdly, the iterated filtering update uses the perturbed parameter values to construct an approximation to the derivative of the log likelihood even though this quantity is not typically available in closed form. Fourthly, the parameter perturbations help to overcome numerical difficulties that can arise during sequential Monte Carlo.

Overview

The data are a time series $y_1, \dots, y_N$ collected at times $t_1 < t_2 < \dots < t_N$. The dynamic system is modeled by a Markov process $X(t)$ which is generated by a function $f(x, s, t, \theta, W)$ in the sense that

$$X(t_n) = f(X(t_{n-1}), t_{n-1}, t_n, \theta, W)$$

where $\theta$ is a vector of unknown parameters and $W$ is some random quantity that is drawn independently each time $f(\cdot)$ is evaluated. An initial condition $X(t_0)$ at some time $t_0 < t_1$ is specified by an initialization function, $X(t_0) = h(\theta)$. A measurement density $g(y_n \mid X(t_n), t_n, \theta)$ completes the specification of a partially observed Markov process. We present a basic iterated filtering algorithm (IF1) [1] [2] followed by an iterated filtering algorithm implementing an iterated, perturbed Bayes map (IF2). [3] [23]
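As a concrete illustration of these ingredients, the following Python sketch writes down a toy partially observed Markov process. The model itself (a scalar autoregressive state observed with Gaussian noise), the parameter layout `theta = [phi, x0]`, the noise constants and the helper `simulate` are assumptions made for this example and do not come from the cited papers. Only the roles of the process function `f`, the initialization function `h` and the measurement density `g` follow the description above.

```python
import numpy as np

SIGMA_PROC = 0.5  # process-noise s.d. (hypothetical, held fixed)
SIGMA_OBS = 1.0   # measurement-noise s.d. (hypothetical, held fixed)

def h(theta):
    """Initialization function: X(t_0) = h(theta)."""
    return theta[1]

def f(x, s, t, theta, w):
    """Advance the state from time s to time t using a fresh random draw w.
    Unit spacing between observation times is assumed, so s and t are unused."""
    return theta[0] * x + SIGMA_PROC * w

def g(y, x, t, theta):
    """Measurement density g(y | x, t, theta): Gaussian centred on the state."""
    z = (y - x) / SIGMA_OBS
    return np.exp(-0.5 * z * z) / (SIGMA_OBS * np.sqrt(2.0 * np.pi))

def simulate(theta, times, t0, rng):
    """Draw one realisation y_1, ..., y_N of the observed time series."""
    x, s, ys = h(theta), t0, []
    for t in times:
        x = f(x, s, t, theta, rng.standard_normal())      # latent state X(t_n)
        ys.append(x + SIGMA_OBS * rng.standard_normal())  # observation y_n
        s = t
    return np.array(ys)

rng = np.random.default_rng(1)
times = np.arange(1.0, 51.0)        # t_1, ..., t_N with N = 50
theta_true = np.array([0.8, 2.0])   # [phi, x0]
y = simulate(theta_true, times, t0=0.0, rng=rng)
print(len(y), g(y[0], h(theta_true), times[0], theta_true))
```

Keeping `f` as a simulator rather than a transition density reflects the way the algorithms below use it: they only ever draw from the process model and evaluate `g`.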

Procedure: Iterated filtering (IF1)

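Input: A partially observed Markov model specified as above; Monte Carlo sample size $J$; number of iterations $M$; cooling parameters $0 < a < 1$ and $b$; covariance matrix $\Phi$; initial parameter vector $\theta^{(1)}$
for $m = 1$ to $M$
  draw $\Theta_F(t_0, j) \sim \mathrm{Normal}(\theta^{(m)}, b a^{m-1} \Phi)$ for $j = 1, \dots, J$
  set $X_F(t_0, j) = h(\Theta_F(t_0, j))$ for $j = 1, \dots, J$
  set $\bar\theta(t_0) = \theta^{(m)}$
  for $n = 1$ to $N$
    draw $\Theta_P(t_n, j) \sim \mathrm{Normal}(\Theta_F(t_{n-1}, j), a^{m-1} \Phi)$ for $j = 1, \dots, J$
    set $X_P(t_n, j) = f(X_F(t_{n-1}, j), t_{n-1}, t_n, \Theta_P(t_n, j), W)$ for $j = 1, \dots, J$
    set $w(n, j) = g(y_n \mid X_P(t_n, j), t_n, \Theta_P(t_n, j))$ for $j = 1, \dots, J$
    draw $k_1, \dots, k_J$ such that $P(k_j = i) = w(n, i) / \sum_{\ell} w(n, \ell)$
    set $X_F(t_n, j) = X_P(t_n, k_j)$ and $\Theta_F(t_n, j) = \Theta_P(t_n, k_j)$ for $j = 1, \dots, J$
    set $\bar\theta_i(t_n)$ to the sample mean of $\{\Theta_{F,i}(t_n, j), j = 1, \dots, J\}$, where the vector $\Theta_F$ has components $\{\Theta_{F,i}\}$
    set $V_i(t_n)$ to the sample variance of $\{\Theta_{P,i}(t_n, j), j = 1, \dots, J\}$
  set $\theta_i^{(m+1)} = \theta_i^{(m)} + V_i(t_1) \sum_{n=1}^{N} V_i^{-1}(t_n) \left( \bar\theta_i(t_n) - \bar\theta_i(t_{n-1}) \right)$
Output: Maximum likelihood estimate $\hat\theta = \theta^{(M+1)}$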
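The IF1 procedure can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the toy model (a state reverting to an unknown level `mu = theta[0]`, observed with unit Gaussian noise), the tuning values and the small weight floor `1e-300` are all hypothetical choices made for the example. The function `if1` follows the loop structure of the procedure, with particles stored as rows of a `(J, d)` array.

```python
import numpy as np

# Toy model (hypothetical): the state reverts to an unknown level mu = theta[0].
def h(theta):                       # theta has shape (J, d); returns X(t_0)
    return theta[:, 0].copy()

def f(x, s, t, theta, w):           # one process step driven by the noise draw w
    mu = theta[:, 0]
    return mu + 0.5 * (x - mu) + 0.5 * w

def g(y, x, t, theta):              # Gaussian measurement density with s.d. 1
    return np.exp(-0.5 * (y - x) ** 2) / np.sqrt(2.0 * np.pi)

def if1(y, times, t0, theta1, Phi, J, M, a, b, rng):
    theta = np.array(theta1, dtype=float)
    d = theta.size
    for m in range(1, M + 1):
        cov = a ** (m - 1) * Phi                        # cooled perturbation covariance
        theta_F = rng.multivariate_normal(theta, b * cov, size=J)
        x_F = h(theta_F)
        theta_bar_prev = theta.copy()                   # theta_bar(t_0)
        V1, acc, s = None, np.zeros(d), t0
        for n, t in enumerate(times):
            theta_P = theta_F + rng.multivariate_normal(np.zeros(d), cov, size=J)
            x_P = f(x_F, s, t, theta_P, rng.standard_normal(J))
            w = g(y[n], x_P, t, theta_P) + 1e-300       # floor avoids all-zero weights
            k = rng.choice(J, size=J, p=w / w.sum())    # multinomial resampling
            x_F, theta_F = x_P[k], theta_P[k]
            theta_bar = theta_F.mean(axis=0)            # filtering mean
            V = theta_P.var(axis=0, ddof=1)             # prediction variance
            V1 = V if V1 is None else V1                # keep V(t_1)
            acc += (theta_bar - theta_bar_prev) / V
            theta_bar_prev, s = theta_bar, t
        theta = theta + V1 * acc                        # IF1 parameter update
    return theta

# Simulate data from the toy model with mu = 3, then start the search at mu = 1.
rng = np.random.default_rng(2)
times = np.arange(1.0, 51.0)
mu_true, x, y = 3.0, 3.0, []
for _ in times:
    x = mu_true + 0.5 * (x - mu_true) + 0.5 * rng.standard_normal()
    y.append(x + rng.standard_normal())
y = np.array(y)
theta_hat = if1(y, times, 0.0, [1.0], np.array([[0.01]]),
                J=500, M=40, a=0.95, b=1.0, rng=rng)
print(theta_hat)
```

The update is applied componentwise, which matches the per-component means and variances in the procedure. In practice the result is sensitive to the choice of $\Phi$, $a$ and $b$.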

Variations

  1. For IF1, parameters which enter the model only in the specification of the initial condition, $X(t_0)$, warrant some special algorithmic attention since information about them in the data may be concentrated in a small part of the time series. [1]
  2. Theoretically, any distribution with the requisite mean and variance could be used in place of the normal distribution. It is standard to use the normal distribution and to reparameterise to remove constraints on the possible values of the parameters.
  3. Modifications to the IF1 algorithm have been proposed to give superior asymptotic performance. [24] [25]
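The reparameterisation mentioned in the second variation can be illustrated with a short sketch. The parameter names (`rate`, `prob`) and the choice of log and logit transformations are assumptions made for this example. The point is only that normal perturbations applied on the transformed scale can never leave the allowed range once mapped back.

```python
import numpy as np

def to_unconstrained(rate, prob):
    """Map a positive rate and a probability in (0, 1) to the real line."""
    return np.array([np.log(rate), np.log(prob / (1.0 - prob))])

def from_unconstrained(z):
    """Inverse map: exp for the rate, logistic function for the probability."""
    z = np.asarray(z, dtype=float)
    return np.exp(z[..., 0]), 1.0 / (1.0 + np.exp(-z[..., 1]))

rng = np.random.default_rng(0)
z0 = to_unconstrained(rate=2.5, prob=0.9)

# Normal perturbations on the transformed scale, one row per particle.
z = z0 + rng.normal(scale=0.5, size=(1000, 2))
rates, probs = from_unconstrained(z)

print(rates.min() > 0.0, probs.min() > 0.0, probs.max() < 1.0)
```

With this design the filtering code works entirely on the unconstrained vector and calls the inverse map only when evaluating the model.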

Procedure: Iterated filtering (IF2)

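Input: A partially observed Markov model specified as above; Monte Carlo sample size $J$; number of iterations $M$; cooling parameter $0 < a < 1$; covariance matrix $\Phi$; initial parameter vectors $\{\Theta_j, j = 1, \dots, J\}$
for $m = 1$ to $M$
  draw $\Theta_F(t_0, j) \sim \mathrm{Normal}(\Theta_j, a^{m-1} \Phi)$ for $j = 1, \dots, J$
  set $X_F(t_0, j) = h(\Theta_F(t_0, j))$ for $j = 1, \dots, J$
  for $n = 1$ to $N$
    draw $\Theta_P(t_n, j) \sim \mathrm{Normal}(\Theta_F(t_{n-1}, j), a^{m-1} \Phi)$ for $j = 1, \dots, J$
    set $X_P(t_n, j) = f(X_F(t_{n-1}, j), t_{n-1}, t_n, \Theta_P(t_n, j), W)$ for $j = 1, \dots, J$
    set $w(n, j) = g(y_n \mid X_P(t_n, j), t_n, \Theta_P(t_n, j))$ for $j = 1, \dots, J$
    draw $k_1, \dots, k_J$ such that $P(k_j = i) = w(n, i) / \sum_{\ell} w(n, \ell)$
    set $X_F(t_n, j) = X_P(t_n, k_j)$ and $\Theta_F(t_n, j) = \Theta_P(t_n, k_j)$ for $j = 1, \dots, J$
  set $\Theta_j = \Theta_F(t_N, j)$ for $j = 1, \dots, J$
Output: Parameter vectors approximating the maximum likelihood estimate, $\{\Theta_j, j = 1, \dots, J\}$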
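A corresponding NumPy sketch of IF2 is given below, again as an illustration under stated assumptions rather than a definitive implementation. The toy model (a state reverting to an unknown level `mu = theta[0]`), the initial swarm, the tuning values and the weight floor `1e-300` are hypothetical. Unlike IF1, no gradient-like update is formed: the resampled parameter swarm at the final time point is simply carried into the next iteration.

```python
import numpy as np

# Toy model (hypothetical): the state reverts to an unknown level mu = theta[0].
def h(theta):                       # theta has shape (J, d); returns X(t_0)
    return theta[:, 0].copy()

def f(x, s, t, theta, w):           # one process step driven by the noise draw w
    mu = theta[:, 0]
    return mu + 0.5 * (x - mu) + 0.5 * w

def g(y, x, t, theta):              # Gaussian measurement density with s.d. 1
    return np.exp(-0.5 * (y - x) ** 2) / np.sqrt(2.0 * np.pi)

def if2(y, times, t0, Theta0, Phi, M, a, rng):
    Theta = np.array(Theta0, dtype=float)               # swarm of shape (J, d)
    J, d = Theta.shape
    zero = np.zeros(d)
    for m in range(1, M + 1):
        cov = a ** (m - 1) * Phi                        # cooled perturbation covariance
        theta_F = Theta + rng.multivariate_normal(zero, cov, size=J)
        x_F = h(theta_F)
        s = t0
        for n, t in enumerate(times):
            theta_P = theta_F + rng.multivariate_normal(zero, cov, size=J)
            x_P = f(x_F, s, t, theta_P, rng.standard_normal(J))
            w = g(y[n], x_P, t, theta_P) + 1e-300       # floor avoids all-zero weights
            k = rng.choice(J, size=J, p=w / w.sum())    # multinomial resampling
            x_F, theta_F = x_P[k], theta_P[k]
            s = t
        Theta = theta_F                                 # swarm for the next iteration
    return Theta

# Simulate data from the toy model with mu = 3, then start the swarm near mu = 1.
rng = np.random.default_rng(3)
times = np.arange(1.0, 51.0)
mu_true, x, y = 3.0, 3.0, []
for _ in times:
    x = mu_true + 0.5 * (x - mu_true) + 0.5 * rng.standard_normal()
    y.append(x + rng.standard_normal())
y = np.array(y)
Theta0 = rng.uniform(0.0, 2.0, size=(500, 1))
Theta = if2(y, times, 0.0, Theta0, np.array([[0.01]]), M=40, a=0.95, rng=rng)
print(Theta.mean(axis=0), Theta.std(axis=0))
```

A point estimate can be read off as the swarm mean, and the spread of the swarm gives a rough indication of how far the perturbations have cooled.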

Software

"pomp: statistical inference for partially-observed Markov processes": R package.

References

  1. Ionides, E. L.; Breto, C.; King, A. A. (2006). "Inference for nonlinear dynamical systems". Proceedings of the National Academy of Sciences of the USA. 103 (49): 18438–18443. Bibcode:2006PNAS..10318438I. doi:10.1073/pnas.0603181103. PMC 3020138. PMID 17121996.
  2. Ionides, E. L.; Bhadra, A.; Atchade, Y.; King, A. A. (2011). "Iterated filtering". Annals of Statistics. 39 (3): 1776–1802. arXiv:0902.0347. doi:10.1214/11-AOS886. S2CID 6527480.
  3. Ionides, E. L.; Nguyen, D.; Atchadé, Y.; Stoev, S.; King, A. A. (2015). "Inference for dynamic and latent variable models via iterated, perturbed Bayes maps". Proceedings of the National Academy of Sciences of the USA. 112 (3): 719–724. Bibcode:2015PNAS..112..719I. doi:10.1073/pnas.1410597112. PMC 4311819. PMID 25568084.
  4. King, A. A.; Ionides, E. L.; Pascual, M.; Bouma, M. J. (2008). "Inapparent infections and cholera dynamics". Nature. 454 (7206): 877–880. Bibcode:2008Natur.454..877K. doi:10.1038/nature07084. hdl:2027.42/62519. PMID 18704085. S2CID 4408759.
  5. Breto, C.; He, D.; Ionides, E. L.; King, A. A. (2009). "Time series analysis via mechanistic models". Annals of Applied Statistics. 3: 319–348. arXiv:0802.0021. doi:10.1214/08-AOAS201. S2CID 8400632.
  6. King, A. A.; Domenech de Celles, M.; Magpantay, F. M.; Rohani, P. (2015). "Avoidable errors in the modelling of outbreaks of emerging pathogens, with special reference to Ebola". Proceedings of the Royal Society B. 282 (1806): 20150347. doi:10.1098/rspb.2015.0347. PMC 4426634. PMID 25833863.
  7. He, D.; Dushoff, J.; Day, T.; Ma, J.; Earn, D. (2011). "Mechanistic modelling of the three waves of the 1918 influenza pandemic". Theoretical Ecology. 4 (2): 1–6. doi:10.1007/s12080-011-0123-3. S2CID 2010776.
  8. Camacho, A.; Ballesteros, S.; Graham, A. L.; Carrat, R.; Ratmann, O.; Cazelles, B. (2011). "Explaining rapid reinfections in multiple-wave influenza outbreaks: Tristan da Cunha 1971 epidemic as a case study". Proceedings of the Royal Society B. 278 (1725): 3635–3643. doi:10.1098/rspb.2011.0300. PMC 3203494. PMID 21525058.
  9. Earn, D.; He, D.; Loeb, M. B.; Fonseca, K.; Lee, B. E.; Dushoff, J. (2012). "Effects of School Closure on Incidence of Pandemic Influenza in Alberta, Canada". Annals of Internal Medicine. 156 (3): 173–181. doi:10.7326/0003-4819-156-3-201202070-00005. PMID 22312137.
  10. Shrestha, S.; Foxman, B.; Weinberger, D. M.; Steiner, C.; Viboud, C.; Rohani, P. (2013). "Identifying the interaction between influenza and pneumococcal pneumonia using incidence data". Science Translational Medicine. 5 (191): 191ra84. doi:10.1126/scitranslmed.3005982. PMC 4178309. PMID 23803706.
  11. Laneri, K.; Bhadra, A.; Ionides, E. L.; Bouma, M.; Dhiman, R. C.; Yadav, R. S.; Pascual, M. (2010). "Forcing versus feedback: Epidemic malaria and monsoon rains in NW India". PLOS Computational Biology. 6 (9): e1000898. Bibcode:2010PLSCB...6E0898L. doi:10.1371/journal.pcbi.1000898. PMC 2932675. PMID 20824122.
  12. Bhadra, A.; Ionides, E. L.; Laneri, K.; Bouma, M.; Dhiman, R. C.; Pascual, M. (2011). "Malaria in Northwest India: Data analysis via partially observed stochastic differential equation models driven by Lévy noise". Journal of the American Statistical Association. 106 (494): 440–451. doi:10.1198/jasa.2011.ap10323. S2CID 53560432.
  13. Roy, M.; Bouma, M. J.; Ionides, E. L.; Dhiman, R. C.; Pascual, M. (2013). "The potential elimination of Plasmodium vivax malaria by relapse treatment: Insights from a transmission model and surveillance data from NW India". PLOS Neglected Tropical Diseases. 7 (1): e1979. doi:10.1371/journal.pntd.0001979. PMC 3542148. PMID 23326611.
  14. Zhou, J.; Han, L.; Liu, S. (2013). "Nonlinear mixed-effects state space models with applications to HIV dynamics". Statistics and Probability Letters. 83 (5): 1448–1456. doi:10.1016/j.spl.2013.01.032.
  15. Lavine, J.; Rohani, P. (2012). "Resolving pertussis immunity and vaccine effectiveness using incidence time series". Expert Review of Vaccines. 11 (11): 1319–1329. doi:10.1586/ERV.12.109. PMC 3595187. PMID 23249232.
  16. Blackwood, J. C.; Cummings, D. A. T.; Broutin, H.; Iamsirithaworn, S.; Rohani, P. (2013). "Deciphering the impacts of vaccination and immunity on pertussis epidemiology in Thailand". Proceedings of the National Academy of Sciences of the USA. 110 (23): 9595–9600. Bibcode:2013PNAS..110.9595B. doi:10.1073/pnas.1220908110. PMC 3677483. PMID 23690587.
  17. Blake, I. M.; Martin, R.; Goel, A.; Khetsuriani, N.; Everts, J.; Wolff, C.; Wassilak, S.; Aylward, R. B.; Grassly, N. C. (2014). "The role of older children and adults in wild poliovirus transmission". Proceedings of the National Academy of Sciences of the USA. 111 (29): 10604–10609. Bibcode:2014PNAS..11110604B. doi:10.1073/pnas.1323688111. PMC 4115498. PMID 25002465.
  18. He, D.; Ionides, E. L.; King, A. A. (2010). "Plug-and-play inference for disease dynamics: measles in large and small towns as a case study". Journal of the Royal Society Interface. 7 (43): 271–283. doi:10.1098/rsif.2009.0151. PMC 2842609. PMID 19535416.
  19. Ionides, E. L. (2011). "Discussion on 'Feature Matching in Time Series Modeling' by Y. Xia and H. Tong". Statistical Science. 26: 49–52. arXiv:1201.1376. doi:10.1214/11-STS345C. S2CID 88511724.
  20. Blackwood, J. C.; Streicker, D. G.; Altizer, S.; Rohani, P. (2013). "Resolving the roles of immunity, pathogenesis, and immigration for rabies persistence in vampire bat". Proceedings of the National Academy of Sciences of the USA. 110 (51): 20837–20842. Bibcode:2013PNAS..11020837B. doi:10.1073/pnas.1308817110. PMC 3870737. PMID 24297874.
  21. Bhadra, A. (2010). "Discussion of 'Particle Markov chain Monte Carlo methods' by C. Andrieu, A. Doucet and R. Holenstein". Journal of the Royal Statistical Society, Series B. 72 (3): 314–315. doi:10.1111/j.1467-9868.2009.00736.x.
  22. Breto, C. (2014). "On idiosyncratic stochasticity of financial leverage effects". Statistics and Probability Letters. 91: 20–26. arXiv:1312.5496. doi:10.1016/j.spl.2014.04.003. S2CID 122694545.
  23. Lindstrom, E.; Ionides, E. L.; Frydendall, J.; Madsen, H. (2012). "Efficient Iterated Filtering". System Identification. 45 (16): 1785–1790. doi:10.3182/20120711-3-BE-2027.00300.
  24. Lindstrom, E. (2013). "Tuned iterated filtering". Statistics and Probability Letters. 83 (9): 2077–2080. doi:10.1016/j.spl.2013.05.019.
  25. Doucet, A.; Jacob, P. E.; Rubenthaler, S. (2013). "Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models". arXiv:1304.5768 [stat.ME].