Iterated filtering

Iterated filtering algorithms are a tool for maximum likelihood inference on partially observed dynamical systems. Stochastic perturbations to the unknown parameters are used to explore the parameter space. Applying sequential Monte Carlo (the particle filter) to this extended model results in the selection of the parameter values that are more consistent with the data. Appropriately constructed procedures, iterating with successively diminished perturbations, converge to the maximum likelihood estimate. [1] [2] [3] Iterated filtering methods have so far been used most extensively to study infectious disease transmission dynamics. Case studies include cholera, [4] [5] Ebola virus, [6] influenza, [7] [8] [9] [10] malaria, [11] [12] [13] HIV, [14] pertussis, [15] [16] poliovirus [17] and measles. [5] [18] Other areas which have been proposed to be suitable for these methods include ecological dynamics [19] [20] and finance. [21] [22]

The perturbations to the parameter space play several different roles. Firstly, they smooth out the likelihood surface, enabling the algorithm to overcome small-scale features of the likelihood during early stages of the global search. Secondly, Monte Carlo variation allows the search to escape from local optima. Thirdly, the iterated filtering update uses the perturbed parameter values to construct an approximation to the derivative of the log likelihood, even though this quantity is not typically available in closed form. Fourthly, the parameter perturbations help to overcome numerical difficulties that can arise during sequential Monte Carlo.

Overview

The data are a time series $y_1, \dots, y_N$ collected at times $t_1 < t_2 < \dots < t_N$. The dynamic system is modeled by a Markov process $X(t)$ which is generated by a function $f$ in the sense that

$X(t_n) = f\big(X(t_{n-1}), \theta, W\big),$

where $\theta$ is a vector of unknown parameters and $W$ is some random quantity that is drawn independently each time $f$ is evaluated. An initial condition $X(t_0)$ at some time $t_0 < t_1$ is specified by an initialization function, $X(t_0) = h(\theta)$. A measurement density $g\big(y_n \mid X(t_n), \theta\big)$ completes the specification of a partially observed Markov process. We present a basic iterated filtering algorithm (IF1) [1] [2] followed by an iterated filtering algorithm implementing an iterated, perturbed Bayes map (IF2). [3] [23]
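
To make the notation concrete, here is a minimal sketch in Python of one possible specification of $f$, $h$ and $g$, using a hypothetical toy model (a Gaussian random walk with unknown drift, process noise and measurement noise). The model, the parameter names and the simulated data set are assumptions introduced here only so that the algorithm sketches below have something to run on; they are not taken from the article or its references.

```python
# Hypothetical toy partially observed Markov process in the notation above.
# theta = (drift, log_sigma, log_tau); all of this is illustrative only.
import numpy as np

rng = np.random.default_rng(1)

def h(theta):
    """Initialization function: returns the initial state X(t_0)."""
    return 0.0

def f(x_prev, theta):
    """Process model: one draw of X(t_n) = f(X(t_{n-1}), theta, W)."""
    drift, log_sigma, _ = theta
    return x_prev + drift + np.exp(log_sigma) * rng.normal()

def g(y, x, theta):
    """Measurement density g(y_n | X(t_n), theta), evaluated pointwise."""
    tau = np.exp(theta[2])
    return np.exp(-0.5 * ((y - x) / tau) ** 2) / (tau * np.sqrt(2.0 * np.pi))

# Simulate a synthetic data set y_1, ..., y_N from known "true" parameters,
# purely to give the inference sketches below something to work with.
theta_true = np.array([0.5, np.log(1.0), np.log(0.5)])
N = 50
x, ys = h(theta_true), []
for _ in range(N):
    x = f(x, theta_true)
    ys.append(x + np.exp(theta_true[2]) * rng.normal())
ys = np.array(ys)
```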

Procedure: Iterated filtering (IF1)

Input: A partially observed Markov model specified as above; Monte Carlo sample size $J$; number of iterations $M$; cooling parameters $0 < a < 1$ and $b > 0$; covariance matrix $\Phi$; initial parameter vector $\theta^{(1)}$
for $m = 1$ to $M$
    draw $\Theta_F(t_0, j) \sim \mathrm{Normal}\big(\theta^{(m)},\, b\, a^{m-1}\, \Phi\big)$ for $j = 1, \dots, J$
    set $X_F(t_0, j) = h\big(\Theta_F(t_0, j)\big)$ for $j = 1, \dots, J$
    set $\bar\theta(t_0) = \theta^{(m)}$
    for $n = 1$ to $N$
        draw $\Theta_P(t_n, j) \sim \mathrm{Normal}\big(\Theta_F(t_{n-1}, j),\, a^{m-1}\, \Phi\big)$ for $j = 1, \dots, J$
        set $X_P(t_n, j) = f\big(X_F(t_{n-1}, j), \Theta_P(t_n, j), W\big)$ for $j = 1, \dots, J$
        set $w(n, j) = g\big(y_n \mid X_P(t_n, j), \Theta_P(t_n, j)\big)$ for $j = 1, \dots, J$
        draw $k_1, \dots, k_J$ such that $\mathrm{Prob}(k_j = i) = w(n, i) \big/ \sum_{\ell} w(n, \ell)$
        set $X_F(t_n, j) = X_P(t_n, k_j)$ and $\Theta_F(t_n, j) = \Theta_P(t_n, k_j)$ for $j = 1, \dots, J$
        set $\bar\theta_i(t_n)$ to the sample mean of $\{\Theta_{F,i}(t_n, j),\, j = 1, \dots, J\}$, where the vector $\Theta_F(t_n, j)$ has components $\Theta_{F,i}(t_n, j)$
        set $V_i(t_n)$ to the sample variance of $\{\Theta_{P,i}(t_n, j),\, j = 1, \dots, J\}$
    set $\theta^{(m+1)}_i = \theta^{(m)}_i + V_i(t_1) \sum_{n=1}^{N} V_i(t_n)^{-1} \big(\bar\theta_i(t_n) - \bar\theta_i(t_{n-1})\big)$
Output: Maximum likelihood estimate $\hat\theta = \theta^{(M+1)}$
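
A minimal Python sketch of the IF1 pseudocode above, written against the hypothetical toy model from the Overview ($f$, $h$, $g$ and the simulated series `ys` are assumed to be in scope). The arguments `J`, `M`, `a`, `b` and `Phi` play the roles of the Monte Carlo sample size, number of iterations, cooling parameters and perturbation covariance; the default values are arbitrary illustrative choices, not recommendations from the references.

```python
# Sketch of IF1: stochastic perturbation of parameters inside a particle filter,
# followed by the weighted-increment parameter update; illustrative only.
import numpy as np

def if1(ys, theta0, J=500, M=30, a=0.9, b=2.0, Phi=np.diag([0.02, 0.02, 0.02])):
    local_rng = np.random.default_rng(2)
    N, d = len(ys), len(theta0)
    theta = np.array(theta0, dtype=float)
    sd = np.sqrt(np.diag(Phi))                 # componentwise perturbation scales
    for m in range(M):
        cool = a ** m                          # a^(m-1) in the 1-indexed pseudocode
        # draw Theta_F(t_0, j) ~ Normal(theta^(m), b a^(m-1) Phi)
        Theta_F = theta + local_rng.normal(size=(J, d)) * np.sqrt(b * cool) * sd
        X_F = np.array([h(Theta_F[j]) for j in range(J)])
        theta_bar_prev = theta.copy()          # theta_bar(t_0) = theta^(m)
        increments = np.zeros(d)
        V1 = None
        for n in range(N):
            # perturb parameters, propagate states, weight by the measurement density
            Theta_P = Theta_F + local_rng.normal(size=(J, d)) * np.sqrt(cool) * sd
            X_P = np.array([f(X_F[j], Theta_P[j]) for j in range(J)])
            w = np.array([g(ys[n], X_P[j], Theta_P[j]) for j in range(J)]) + 1e-300
            k = local_rng.choice(J, size=J, p=w / w.sum())   # resampling indices
            X_F, Theta_F = X_P[k], Theta_P[k]
            theta_bar = Theta_F.mean(axis=0)   # filtered parameter mean, theta_bar(t_n)
            V = Theta_P.var(axis=0)            # prediction variance, V(t_n)
            if V1 is None:
                V1 = V                         # V(t_1), used to scale the update
            increments += (theta_bar - theta_bar_prev) / V
            theta_bar_prev = theta_bar
        # theta^(m+1) = theta^(m) + V(t_1) * sum_n V(t_n)^{-1} (theta_bar(t_n) - theta_bar(t_{n-1}))
        theta = theta + V1 * increments
    return theta

# Example (toy model from the Overview sketch):
# theta_hat = if1(ys, theta0=np.array([0.0, 0.0, 0.0]))
```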

Variations

  1. For IF1, parameters which enter the model only in the specification of the initial condition, $X(t_0) = h(\theta)$, warrant some special algorithmic attention since information about them in the data may be concentrated in a small part of the time series. [1]
  2. Theoretically, any distribution with the requisite mean and variance could be used in place of the normal distribution. It is standard to use the normal distribution and to reparameterise to remove constraints on the possible values of the parameters (see the sketch after this list).
  3. Modifications to the IF1 algorithm have been proposed to give superior asymptotic performance. [24] [25]
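
The reparameterisation mentioned in item 2 can be sketched as follows; the helper names and the particular constrained parameters (a positive rate and a probability) are hypothetical, introduced only for illustration. Perturbations are applied on the unconstrained scale and mapped back before the parameters enter $f$, $h$ or $g$.

```python
# Illustrative reparameterisation: perturb on an unconstrained scale so that
# normal perturbations cannot violate parameter constraints.
import numpy as np

def to_unconstrained(theta):
    """Map (rate > 0, probability in (0, 1)) to the real line via log and logit."""
    rate, prob = theta
    return np.array([np.log(rate), np.log(prob / (1.0 - prob))])

def to_natural(phi):
    """Inverse map: exp for the rate, inverse-logit for the probability."""
    log_rate, logit_prob = phi
    return np.array([np.exp(log_rate), 1.0 / (1.0 + np.exp(-logit_prob))])

# A perturbation step then looks like:
# phi = to_unconstrained(theta) + noise
# theta_new = to_natural(phi)
```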

Procedure: Iterated filtering (IF2)

Input: A partially observed Markov model specified as above; Monte Carlo sample size $J$; number of iterations $M$; cooling parameter $0 < a < 1$; covariance matrix $\Phi$; initial parameter vectors $\{\Theta_j,\, j = 1, \dots, J\}$
for $m = 1$ to $M$
    set $\Theta_F(t_0, j) = \Theta_j$ for $j = 1, \dots, J$
    set $X_F(t_0, j) = h\big(\Theta_F(t_0, j)\big)$ for $j = 1, \dots, J$
    for $n = 1$ to $N$
        draw $\Theta_P(t_n, j) \sim \mathrm{Normal}\big(\Theta_F(t_{n-1}, j),\, a^{m-1}\, \Phi\big)$ for $j = 1, \dots, J$
        set $X_P(t_n, j) = f\big(X_F(t_{n-1}, j), \Theta_P(t_n, j), W\big)$ for $j = 1, \dots, J$
        set $w(n, j) = g\big(y_n \mid X_P(t_n, j), \Theta_P(t_n, j)\big)$ for $j = 1, \dots, J$
        draw $k_1, \dots, k_J$ such that $\mathrm{Prob}(k_j = i) = w(n, i) \big/ \sum_{\ell} w(n, \ell)$
        set $X_F(t_n, j) = X_P(t_n, k_j)$ and $\Theta_F(t_n, j) = \Theta_P(t_n, k_j)$ for $j = 1, \dots, J$
    set $\Theta_j = \Theta_F(t_N, j)$ for $j = 1, \dots, J$
Output: Parameter vectors approximating the maximum likelihood estimate, $\{\Theta_j,\, j = 1, \dots, J\}$
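
A minimal Python sketch of the IF2 pseudocode above, again reusing the hypothetical toy model ($f$, $h$, $g$, `ys`). Each iteration carries a swarm of perturbed parameter vectors through a particle filter and hands the filtered swarm to the next iteration with a smaller perturbation scale.

```python
# Sketch of IF2: an iterated, perturbed Bayes map acting on a swarm of
# parameter vectors; illustrative only.
import numpy as np

def if2(ys, theta0, J=500, M=30, a=0.9, Phi=np.diag([0.02, 0.02, 0.02])):
    local_rng = np.random.default_rng(3)
    N, d = len(ys), len(theta0)
    sd = np.sqrt(np.diag(Phi))
    Theta = np.tile(np.asarray(theta0, dtype=float), (J, 1))   # initial swarm Theta_j
    for m in range(M):
        cool = a ** m                            # a^(m-1) in the 1-indexed pseudocode
        Theta_F = Theta.copy()                   # Theta_F(t_0, j) = Theta_j
        X_F = np.array([h(Theta_F[j]) for j in range(J)])
        for n in range(N):
            # perturb parameters, propagate states, weight by the measurement density
            Theta_P = Theta_F + local_rng.normal(size=(J, d)) * np.sqrt(cool) * sd
            X_P = np.array([f(X_F[j], Theta_P[j]) for j in range(J)])
            w = np.array([g(ys[n], X_P[j], Theta_P[j]) for j in range(J)]) + 1e-300
            k = local_rng.choice(J, size=J, p=w / w.sum())     # resample (X, Theta) jointly
            X_F, Theta_F = X_P[k], Theta_P[k]
        Theta = Theta_F                          # Theta_j = Theta_F(t_N, j) for the next iteration
    return Theta

# Example (toy model from the Overview sketch):
# Theta_final = if2(ys, theta0=np.array([0.0, 0.0, 0.0]))
# print(Theta_final.mean(axis=0))   # one possible point estimate from the final swarm
```

As in the pseudocode, the output is the whole swarm of parameter vectors; a point estimate can be taken as its mean or another central summary.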

Software

"pomp: statistical inference for partially-observed Markov processes"  : R package.


References

  1. Ionides, E. L.; Breto, C.; King, A. A. (2006). "Inference for nonlinear dynamical systems". Proceedings of the National Academy of Sciences of the USA. 103 (49): 18438–18443. Bibcode:2006PNAS..10318438I. doi: 10.1073/pnas.0603181103 . PMC   3020138 . PMID   17121996.
  2. Ionides, E. L.; Bhadra, A.; Atchade, Y.; King, A. A. (2011). "Iterated filtering". Annals of Statistics. 39 (3): 1776–1802. arXiv: 0902.0347 . doi:10.1214/11-AOS886. S2CID   6527480.
  3. Ionides, E. L.; Nguyen, D.; Atchadé, Y.; Stoev, S.; King, A. A. (2015). "Inference for dynamic and latent variable models via iterated, perturbed Bayes maps". Proceedings of the National Academy of Sciences of the USA. 112 (3): 719–724. Bibcode:2015PNAS..112..719I. doi: 10.1073/pnas.1410597112 . PMC   4311819 . PMID   25568084.
  4. King, A. A.; Ionides, E. L.; Pascual, M.; Bouma, M. J. (2008). "Inapparent infections and cholera dynamics" (PDF). Nature. 454 (7206): 877–880. Bibcode:2008Natur.454..877K. doi:10.1038/nature07084. hdl: 2027.42/62519 . PMID   18704085. S2CID   4408759. Archived from the original on 2021-08-28. Retrieved 2024-05-23.
  5. Breto, C.; He, D.; Ionides, E. L.; King, A. A. (2009). "Time series analysis via mechanistic models". Annals of Applied Statistics. 3: 319–348. arXiv: 0802.0021 . doi:10.1214/08-AOAS201. S2CID   8400632.
  6. King AA, Domenech de Celles M, Magpantay FM, Rohani P (2015). "Avoidable errors in the modelling of outbreaks of emerging pathogens, with special reference to Ebola". Proceedings of the Royal Society B. 282 (1806): 20150347. doi:10.1098/rspb.2015.0347. PMC   4426634 . PMID   25833863.
  7. He, D.; J. Dushoff; T. Day; J. Ma; D. Earn (2011). "Mechanistic modelling of the three waves of the 1918 influenza pandemic". Theoretical Ecology. 4 (2): 1–6. Bibcode:2011ThEco...4..283H. doi:10.1007/s12080-011-0123-3. S2CID   2010776.
  8. Camacho, A.; S. Ballesteros; A. L. Graham; R. Carrat; O. Ratmann; B. Cazelles (2011). "Explaining rapid reinfections in multiple-wave influenza outbreaks: Tristan da Cunha 1971 epidemic as a case study". Proceedings of the Royal Society B. 278 (1725): 3635–3643. doi:10.1098/rspb.2011.0300. PMC   3203494 . PMID   21525058.
  9. Earn, D.; He, D.; Loeb, M. B.; Fonseca, K.; Lee, B. E.; Dushoff, J. (2012). "Effects of School Closure on Incidence of Pandemic Influenza in Alberta, Canada". Annals of Internal Medicine. 156 (3): 173–181. doi: 10.7326/0003-4819-156-3-201202070-00005 . PMID   22312137.
  10. Shrestha, S.; Foxman, B.; Weinberger, D. M.; Steiner, C.; Viboud, C.; Rohani, P. (2013). "Identifying the interaction between influenza and pneumococcal pneumonia using incidence data". Science Translational Medicine. 5 (191): 191ra84. doi:10.1126/scitranslmed.3005982. PMC   4178309 . PMID   23803706.
  11. Laneri, K.; A. Bhadra; E. L. Ionides; M. Bouma; R. C. Dhiman; R. S. Yadav; M. Pascual (2010). "Forcing versus feedback: Epidemic malaria and monsoon rains in NW India". PLOS Computational Biology. 6 (9): e1000898. Bibcode:2010PLSCB...6E0898L. doi: 10.1371/journal.pcbi.1000898 . PMC   2932675 . PMID   20824122.
  12. Bhadra, A.; E. L. Ionides; K. Laneri; M. Bouma; R. C. Dhiman; M. Pascual (2011). "Malaria in Northwest India: Data analysis via partially observed stochastic differential equation models driven by Lévy noise". Journal of the American Statistical Association. 106 (494): 440–451. doi:10.1198/jasa.2011.ap10323. S2CID   53560432.
  13. Roy, M.; Bouma, M. J.; Ionides, E. L.; Dhiman, R. C.; Pascual, M. (2013). "The potential elimination of Plasmodium vivax malaria by relapse treatment: Insights from a transmission model and surveillance data from NW India". PLOS Neglected Tropical Diseases. 7 (1): e1979. doi: 10.1371/journal.pntd.0001979 . PMC   3542148 . PMID   23326611.
  14. Zhou, J.; Han, L.; Liu, S. (2013). "Nonlinear mixed-effects state space models with applications to HIV dynamics". Statistics and Probability Letters. 83 (5): 1448–1456. doi:10.1016/j.spl.2013.01.032.
  15. Lavine, J.; Rohani, P. (2012). "Resolving pertussis immunity and vaccine effectiveness using incidence time series". Expert Review of Vaccines. 11 (11): 1319–1329. doi:10.1586/ERV.12.109. PMC   3595187 . PMID   23249232.
  16. Blackwood, J. C.; Cummings, D. A. T.; Broutin, H.; Iamsirithaworn, S.; Rohani, P. (2013). "Deciphering the impacts of vaccination and immunity on pertussis epidemiology in Thailand". Proceedings of the National Academy of Sciences of the USA. 110 (23): 9595–9600. Bibcode:2013PNAS..110.9595B. doi: 10.1073/pnas.1220908110 . PMC   3677483 . PMID   23690587.
  17. Blake, I. M.; Martin, R.; Goel, A.; Khetsuriani, N.; Everts, J.; Wolff, C.; Wassilak, S.; Aylward, R. B.; Grassly, N. C. (2014). "The role of older children and adults in wild poliovirus transmission". Proceedings of the National Academy of Sciences of the USA. 111 (29): 10604–10609. Bibcode:2014PNAS..11110604B. doi: 10.1073/pnas.1323688111 . PMC   4115498 . PMID   25002465.
  18. He, D.; Ionides, E. L.; King, A. A. (2010). "Plug-and-play inference for disease dynamics: measles in large and small towns as a case study". Journal of the Royal Society Interface. 7 (43): 271–283. doi:10.1098/rsif.2009.0151. PMC   2842609 . PMID   19535416.
  19. Ionides, E. L. (2011). "Discussion on "Feature Matching in Time Series Modeling" by Y. Xia and H. Tong". Statistical Science. 26: 49–52. arXiv: 1201.1376 . doi:10.1214/11-STS345C. S2CID   88511724.
  20. Blackwood, J. C.; Streicker, D. G.; Altizer, S.; Rohani, P. (2013). "Resolving the roles of immunity, pathogenesis, and immigration for rabies persistence in vampire bats". Proceedings of the National Academy of Sciences of the USA. 110 (51): 20837–20842. Bibcode:2013PNAS..11020837B. doi: 10.1073/pnas.1308817110 . PMC   3870737 . PMID   24297874.
  21. Bhadra, A. (2010). "Discussion of "Particle Markov chain Monte Carlo methods" by C. Andrieu, A. Doucet and R. Holenstein". Journal of the Royal Statistical Society, Series B. 72 (3): 314–315. doi: 10.1111/j.1467-9868.2009.00736.x .
  22. Breto, C. (2014). "On idiosyncratic stochasticity of financial leverage effects". Statistics and Probability Letters. 91: 20–26. arXiv: 1312.5496 . doi:10.1016/j.spl.2014.04.003. S2CID   122694545.
  23. Lindstrom, E.; Ionides, E. L.; Frydendall, J.; Madsen, H. (2012). "Efficient Iterated Filtering". System Identification. 45 (16): 1785–1790. doi: 10.3182/20120711-3-BE-2027.00300 .
  24. Lindstrom, E. (2013). "Tuned iterated filtering". Statistics and Probability Letters. 83 (9): 2077–2080. doi:10.1016/j.spl.2013.05.019.
  25. Doucet, A.; Jacob, P. E.; Rubenthaler, S. (2013). "Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models". arXiv: 1304.5768 [stat.ME].