Whittle likelihood

In statistics, Whittle likelihood is an approximation to the likelihood function of a stationary Gaussian time series. It is named after the mathematician and statistician Peter Whittle, who introduced it in his PhD thesis in 1951.[1] It is commonly used in time series analysis and signal processing for parameter estimation and signal detection.

Context

In a stationary Gaussian time series model, the likelihood function is (as usual in Gaussian models) a function of the associated mean and covariance parameters. With a large number ($N$) of observations, the ($N \times N$) covariance matrix may become very large, making computations very costly in practice. However, due to stationarity, the covariance matrix has a rather simple structure, and by using an approximation, computations may be simplified considerably (from $O(N^2)$ to $O(N \log N)$).[2] The idea effectively boils down to assuming a heteroscedastic zero-mean Gaussian model in Fourier domain; the model formulation is based on the time series' discrete Fourier transform and its power spectral density.[3][4][5]
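To make the computational contrast concrete, here is a minimal Python sketch (not from the article; the AR(1) model, unit sampling interval, and all variable names are illustrative assumptions): the exact Gaussian likelihood requires factorising the full $N \times N$ Toeplitz covariance matrix, whereas the Whittle approximation, anticipating the formula in the Definition section below, needs only a single FFT of the data.

```python
import numpy as np
from scipy.linalg import toeplitz, cho_factor, cho_solve

# Stationary AR(1) example: autocovariance gamma(k) = phi^k / (1 - phi^2).
phi, N = 0.7, 1024
rng = np.random.default_rng(0)
x = rng.normal(size=N)  # placeholder data; any length-N series will do

# Exact Gaussian log-likelihood: N x N Toeplitz covariance. The dense
# Cholesky used here costs O(N^3); Toeplitz solvers reduce this to O(N^2).
gamma = phi ** np.arange(N) / (1.0 - phi**2)
Sigma = toeplitz(gamma)
c, low = cho_factor(Sigma)
loglik_exact = (-0.5 * x @ cho_solve((c, low), x)
                - np.sum(np.log(np.diag(c)))
                - 0.5 * N * np.log(2.0 * np.pi))

# Whittle approximation: a single FFT of the data, O(N log N).
# (Up to additive constants, per the formula in the Definition section below.)
X = np.fft.rfft(x)[1:-1]                       # drop DC and Nyquist terms
f = np.fft.rfftfreq(N, d=1.0)[1:-1]
S1 = 2.0 / np.abs(1.0 - phi * np.exp(-2j * np.pi * f))**2  # one-sided AR(1) PSD
loglik_whittle = -np.sum(np.log(S1) + np.abs(X)**2 / (N / 2.0 * S1))
```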

Definition

Let $X_1, \ldots, X_N$ be a stationary Gaussian time series with (one-sided) power spectral density $S_1(f)$, where $N$ is even and samples are taken at constant sampling intervals $\Delta_t$. Let $\tilde{X}_1, \ldots, \tilde{X}_{N/2+1}$ be the (complex-valued) discrete Fourier transform (DFT) of the time series. Then for the Whittle likelihood one effectively assumes independent zero-mean Gaussian distributions for all $\tilde{X}_j$ with variances for the real and imaginary parts given by

$$\operatorname{Var}\left(\operatorname{Re}(\tilde{X}_j)\right) = \operatorname{Var}\left(\operatorname{Im}(\tilde{X}_j)\right) = \frac{N}{4\,\Delta_t}\, S_1(f_j)$$

where $f_j = \frac{j}{N\,\Delta_t}$ is the $j$th Fourier frequency. This approximate model immediately leads to the (logarithmic) likelihood function

$$\log P(x_1,\ldots,x_N \mid \theta) \;\propto\; -\sum_j \left( \log S_1(f_j) + \frac{\left|\tilde{x}_j\right|^2}{\frac{N}{2\,\Delta_t}\, S_1(f_j)} \right)$$

where $\left|\cdot\right|$ denotes the absolute value, with $\left|\tilde{x}_j\right|^2 = \left(\operatorname{Re}(\tilde{x}_j)\right)^2 + \left(\operatorname{Im}(\tilde{x}_j)\right)^2$.[3][4][6]
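A direct transcription of the likelihood above into code may help; the following is a minimal Python sketch (the function name, the callable-PSD interface, and the handling of the DC and Nyquist terms are assumptions for illustration, with $\Delta_t$ as `delta_t`):

```python
import numpy as np

def whittle_log_likelihood(x, S1, delta_t=1.0):
    """Whittle log-likelihood (up to an additive constant) of the real-valued,
    evenly sampled series `x` under a model with one-sided PSD function `S1(f)`."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    X = np.fft.rfft(x)                   # DFT coefficients for j = 0, ..., N/2
    f = np.fft.rfftfreq(N, d=delta_t)    # Fourier frequencies f_j = j / (N * delta_t)
    # Drop the zero-frequency and Nyquist terms, whose DFT coefficients are
    # purely real and would require separate treatment:
    X, f = X[1:-1], f[1:-1]
    S = S1(f)                            # model PSD at the retained frequencies
    return -np.sum(np.log(S) + np.abs(X)**2 / (N / (2.0 * delta_t) * S))
```

For instance, white noise with variance $\sigma^2$ has constant one-sided PSD $S_1(f) = 2\sigma^2\Delta_t$, so `whittle_log_likelihood(x, lambda f: 2 * sigma**2 * delta_t * np.ones_like(f))` would evaluate that special case.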

Special case of a known noise spectrum

In case the noise spectrum is assumed known a priori, and noise properties are not to be inferred from the data, the likelihood function may be simplified further by ignoring constant terms, leading to the sum-of-squares expression

$$\log P(x_1,\ldots,x_N \mid \theta) \;\propto\; -\sum_j \frac{\left|\tilde{x}_j\right|^2}{\frac{N}{2\,\Delta_t}\, S_1(f_j)}.$$

This expression is also the basis for the common matched filter.
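In code, the known-spectrum case reduces to a noise-weighted residual sum of squares; the sketch below (names and the DC/Nyquist handling are again illustrative assumptions) evaluates it for a residual $x - s(\theta)$ given PSD values at the retained Fourier frequencies:

```python
import numpy as np

def whitened_sum_of_squares(x, s, S1_vals, delta_t=1.0):
    """-sum_j |x~_j - s~_j|^2 / ((N / (2 delta_t)) S1(f_j)): the Whittle
    log-likelihood for a known noise PSD, with constant terms dropped.
    `S1_vals` holds the one-sided PSD at frequencies j = 1, ..., N/2 - 1."""
    N = len(x)
    R = np.fft.rfft(np.asarray(x, float) - np.asarray(s, float))[1:-1]
    return -np.sum(np.abs(R)**2 / (N / (2.0 * delta_t) * np.asarray(S1_vals)))
```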

Accuracy of approximation

The Whittle likelihood is in general only an approximation; it is exact only if the spectrum is constant, i.e., in the trivial case of white noise. The efficiency of the Whittle approximation always depends on the particular circumstances.[7][8]

Note that due to linearity of the Fourier transform, Gaussianity in Fourier domain implies Gaussianity in time domain and vice versa. What makes the Whittle likelihood only approximately accurate is related to the sampling theorem: the effect of Fourier-transforming only a finite number of data points, which also manifests itself as spectral leakage in related problems (and which may be ameliorated using the same methods, namely, windowing). In the present case, the implicit periodicity assumption implies correlation between the first and last samples ($x_1$ and $x_N$), which are effectively treated as "neighbouring" samples (like $x_1$ and $x_2$).
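The implicit periodicity can be made explicit with a small numerical sketch, using the common interpretation of the Whittle likelihood as a circulant approximation to the Toeplitz covariance (an AR(1) example; the wrap-around is truncated to its leading term, and all names are assumptions):

```python
import numpy as np
from scipy.linalg import toeplitz, circulant

# AR(1) autocovariance gamma(k) = phi^k / (1 - phi^2) for a short series.
phi, N = 0.7, 8
k = np.arange(N)
gamma = phi**k / (1.0 - phi**2)
Sigma = toeplitz(gamma)          # exact stationary (Toeplitz) covariance

# The Whittle model effectively uses a periodised, circulant covariance whose
# autocovariance wraps around; keep only the leading wrap-around term:
c = gamma.copy()
c[1:] += phi ** (N - k[1:]) / (1.0 - phi**2)
C = circulant(c)

# Under periodicity, x_1 and x_N become direct neighbours:
print(Sigma[0, N - 1])   # small: gamma(N-1)
print(C[0, N - 1])       # comparable to C[0, 1]: gamma(1) + gamma(N-1)
```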

Applications

Parameter estimation

Whittle's likelihood is commonly used to estimate signal parameters for signals that are buried in non-white noise. The noise spectrum may then be assumed known,[9] or it may be inferred along with the signal parameters.[4][6]
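As an illustration of the inference step, a minimal Python sketch (the AR(1) model with unit innovation variance and $\Delta_t = 1$, and all names, are assumptions for the example) recovers an autoregression coefficient by maximising the Whittle likelihood:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Simulate an AR(1) series x_t = phi * x_{t-1} + e_t with unit innovation variance.
rng = np.random.default_rng(1)
N, phi_true = 4096, 0.6
e = rng.normal(size=N)
x = np.empty(N)
x[0] = e[0] / np.sqrt(1.0 - phi_true**2)   # draw from the stationary distribution
for t in range(1, N):
    x[t] = phi_true * x[t - 1] + e[t]

X = np.fft.rfft(x)[1:-1]                   # drop DC and Nyquist terms
f = np.fft.rfftfreq(N, d=1.0)[1:-1]
P = np.abs(X)**2                           # (unnormalised) periodogram ordinates

def neg_whittle(phi):
    # One-sided AR(1) PSD for delta_t = 1 and unit innovation variance:
    S1 = 2.0 / np.abs(1.0 - phi * np.exp(-2j * np.pi * f))**2
    return np.sum(np.log(S1) + P / (N / 2.0 * S1))

res = minimize_scalar(neg_whittle, bounds=(-0.99, 0.99), method="bounded")
print(res.x)   # Whittle estimate; should land close to phi_true = 0.6
```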

Signal detection

Signal detection is commonly performed with the matched filter, which is based on the Whittle likelihood for the case of a known noise power spectral density.[10][11] The matched filter effectively performs a maximum-likelihood fit of the signal to the noisy data and uses the resulting likelihood ratio as the detection statistic.[12]
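A minimal sketch of that frequency-domain fit (a hedged illustration assuming a known template shape, an unknown real amplitude, no time-shift search, and a known one-sided PSD sampled at the retained Fourier frequencies; all names are illustrative):

```python
import numpy as np

def matched_filter(x, template, S1_vals, delta_t=1.0):
    """Maximum-likelihood amplitude and matched-filter statistic for
    x = A * template + noise under the known-PSD Whittle likelihood."""
    N = len(x)
    v = N / (2.0 * delta_t) * np.asarray(S1_vals)   # E|n~_j|^2 per frequency
    Xf = np.fft.rfft(np.asarray(x, float))[1:-1]
    Tf = np.fft.rfft(np.asarray(template, float))[1:-1]
    inner = lambda a, b: np.sum((a * np.conj(b)).real / v)
    A_hat = inner(Xf, Tf) / inner(Tf, Tf)           # ML amplitude estimate
    snr = inner(Xf, Tf) / np.sqrt(inner(Tf, Tf))    # detection statistic
    return A_hat, snr
```

Up to normalisation conventions, the maximised likelihood ratio is a monotone function of `snr`, which is why the latter serves as the detection statistic.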

The matched filter may be generalized to an analogous procedure based on a Student-t distribution by also considering uncertainty (e.g. estimation uncertainty) in the noise spectrum. On the technical side, this entails repeated or iterative matched-filtering.[12]

Spectrum estimation

The Whittle likelihood is also applicable for estimation of the noise spectrum, either alone or in conjunction with signal parameters.[13][14]
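For the spectrum-estimation use, a hedged sketch with a simple two-parameter power-law PSD model $S_1(f) = a\,f^{-b}$ (the model and all names are illustrative assumptions, not the methods of the cited papers):

```python
import numpy as np
from scipy.optimize import minimize

def fit_power_law_psd(x, delta_t=1.0):
    """Fit S1(f) = a * f**(-b) by maximising the Whittle likelihood over (a, b)."""
    N = len(x)
    P = np.abs(np.fft.rfft(np.asarray(x, float))[1:-1])**2
    f = np.fft.rfftfreq(N, d=delta_t)[1:-1]

    def neg_whittle(params):
        log_a, b = params
        S1 = np.exp(log_a) * f**(-b)
        return np.sum(np.log(S1) + P / (N / (2.0 * delta_t) * S1))

    res = minimize(neg_whittle, x0=[0.0, 0.0], method="Nelder-Mead")
    return np.exp(res.x[0]), res.x[1]     # (a_hat, b_hat)
```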

References

  1. Whittle, P. (1951). Hypothesis testing in time series analysis. Uppsala: Almqvist & Wiksells Boktryckeri AB.
  2. Hurvich, C. (2002). "Whittle's approximation to the likelihood function" (PDF). NYU Stern.
  3. Calder, M.; Davis, R. A. (1997), "An introduction to Whittle (1953) "The analysis of multiple stationary time series"", in Kotz, S.; Johnson, N. L. (eds.), Breakthroughs in Statistics, Springer Series in Statistics, New York: Springer-Verlag, pp. 141–169, doi:10.1007/978-1-4612-0667-5_7, ISBN 978-0-387-94989-5.
     See also: Calder, M.; Davis, R. A. (1996), "An introduction to Whittle (1953) "The analysis of multiple stationary time series"", Technical report 1996/41, Department of Statistics, Colorado State University.
  4. Hannan, E. J. (1994), "The Whittle likelihood and frequency estimation", in Kelly, F. P. (ed.), Probability, Statistics and Optimization; a Tribute to Peter Whittle, Chichester: Wiley.
  5. Pawitan, Y. (1998), "Whittle likelihood", in Kotz, S.; Read, C. B.; Banks, D. L. (eds.), Encyclopedia of Statistical Sciences, vol. Update Volume 2, New York: Wiley & Sons, pp. 708–710, doi:10.1002/0471667196.ess0753, ISBN 978-0471667193.
  6. Röver, C.; Meyer, R.; Christensen, N. (2011). "Modelling coloured residual noise in gravitational-wave signal processing". Classical and Quantum Gravity. 28 (1): 015010. arXiv:0804.3853. Bibcode:2011CQGra..28a5010R. doi:10.1088/0264-9381/28/1/015010. S2CID 46673503.
  7. Choudhuri, N.; Ghosal, S.; Roy, A. (2004). "Contiguity of the Whittle measure for a Gaussian time series". Biometrika. 91 (1): 211–218. doi:10.1093/biomet/91.1.211.
  8. Contreras-Cristán, A.; Gutiérrez-Peña, E.; Walker, S. G. (2006). "A Note on Whittle's Likelihood". Communications in Statistics – Simulation and Computation. 35 (4): 857–875. doi:10.1080/03610910600880203. S2CID 119395974.
  9. Finn, L. S. (1992). "Detection, measurement and gravitational radiation". Physical Review D. 46 (12): 5236–5249. arXiv:gr-qc/9209010. Bibcode:1992PhRvD..46.5236F. doi:10.1103/PhysRevD.46.5236. PMID 10014913. S2CID 19004097.
  10. Turin, G. L. (1960). "An introduction to matched filters". IRE Transactions on Information Theory. 6 (3): 311–329. doi:10.1109/TIT.1960.1057571. S2CID 5128742.
  11. Wainstein, L. A.; Zubakov, V. D. (1962). Extraction of Signals from Noise. Englewood Cliffs, NJ: Prentice-Hall.
  12. Röver, C. (2011). "Student-t-based filter for robust signal detection". Physical Review D. 84 (12): 122004. arXiv:1109.0442. Bibcode:2011PhRvD..84l2004R. doi:10.1103/PhysRevD.84.122004.
  13. Choudhuri, N.; Ghosal, S.; Roy, A. (2004). "Bayesian estimation of the spectral density of a time series" (PDF). Journal of the American Statistical Association. 99 (468): 1050–1059. CiteSeerX 10.1.1.212.2814. doi:10.1198/016214504000000557. S2CID 17906077.
  14. Edwards, M. C.; Meyer, R.; Christensen, N. (2015). "Bayesian semiparametric power spectral density estimation in gravitational wave data analysis". Physical Review D. 92 (6): 064011. arXiv:1506.00185. Bibcode:2015PhRvD..92f4011E. doi:10.1103/PhysRevD.92.064011. S2CID 11508218.