Approximate entropy

In statistics, approximate entropy (ApEn) is a technique used to quantify the amount of regularity and the unpredictability of fluctuations in time-series data. [1] For example, consider two series of data:

Series A: (0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...), which alternates 0 and 1.
Series B: (0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, ...), which has either a value of 0 or 1, chosen randomly, each with probability 1/2.

Moment statistics, such as mean and variance, will not distinguish between these two series. Nor will rank order statistics distinguish between these series. Yet series A is perfectly regular: knowing a term has the value of 1 enables one to predict with certainty that the next term will have the value of 0. In contrast, series B is randomly valued: knowing a term has the value of 1 gives no insight into what value the next term will have.
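This is straightforward to verify for the moment statistics. The minimal Python sketch below (the sample length and random seed are arbitrary illustrative choices) shows that the sample mean and variance of a long stretch of either series are essentially identical, so only an order-sensitive statistic such as ApEn can separate them.

import random
from statistics import mean, pvariance

random.seed(0)
n = 10_000
series_a = [i % 2 for i in range(n)]                  # 0, 1, 0, 1, ... (perfectly regular)
series_b = [random.randint(0, 1) for _ in range(n)]   # independent fair coin flips

# Both series have (nearly) identical first and second moments ...
print(mean(series_a), pvariance(series_a))   # exactly 0.5 and 0.25
print(mean(series_b), pvariance(series_b))   # very close to 0.5 and 0.25
# ... so a statistic sensitive to the ordering of the values, such as ApEn,
# is needed to tell them apart.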

Regularity was originally measured by exact regularity statistics, which have mainly centered on various entropy measures. [1] However, accurate entropy calculation requires vast amounts of data, and the results are greatly influenced by system noise; [2] it is therefore not practical to apply these methods to experimental data. ApEn was first proposed (under a different name) by A. Cohen and I. Procaccia [3] as an approximate algorithm to compute an exact regularity statistic, Kolmogorov–Sinai entropy, and was later popularized by Steve M. Pincus. ApEn was initially used to analyze chaotic dynamics and medical data, such as heart rate, [1] and its applications later spread to finance, [4] physiology, [5] human factors engineering, [6] and climate sciences. [7]

Algorithm

A comprehensive step-by-step tutorial with an explanation of the theoretical foundations of Approximate Entropy is available. [8] The algorithm is:

Step 1
Assume a time series of data $u(1), u(2), \ldots, u(N)$. These are $N$ raw data values from measurements equally spaced in time.
Step 2
Let $m$ be a positive integer, with $m \le N$, which represents the length of a run of data (essentially a window).
Let $r$ be a positive real number, which specifies a filtering level.
Let $n = N - m + 1$.
Step 3
Define $\mathbf{x}(i) = \big[\,u(i), u(i+1), \ldots, u(i+m-1)\,\big]$ for each $i$ where $1 \le i \le n$. In other words, $\mathbf{x}(i)$ is an $m$-dimensional vector that contains the run of data starting with $u(i)$.
Define the distance between two vectors $\mathbf{x}(i)$ and $\mathbf{x}(j)$ as the maximum of the distances between their respective components, given by

$$d[\mathbf{x}(i), \mathbf{x}(j)] = \max_{k} \big|\,u(i+k-1) - u(j+k-1)\,\big|$$

for $k = 1, 2, \ldots, m$.
Step 4
Define a count $C_i^m$ as

$$C_i^m(r) = \frac{\text{number of } j \text{ such that } d[\mathbf{x}(i), \mathbf{x}(j)] \le r}{n}$$

for each $i$ where $1 \le i \le n$. Note that since $j$ takes on all values between 1 and $n$, the match will be counted when $j = i$ (i.e. when the test subsequence, $\mathbf{x}(j)$, is matched against itself, $\mathbf{x}(i)$).
Step 5
Define

$$\phi^m(r) = \frac{1}{n} \sum_{i=1}^{n} \ln\big(C_i^m(r)\big),$$

where $\ln$ is the natural logarithm, for fixed $m$, $r$, and $n$ as set in Step 2.
Step 6
Define approximate entropy ($\mathrm{ApEn}$) as

$$\mathrm{ApEn}(m, r, N)(u) = \phi^m(r) - \phi^{m+1}(r).$$

Parameter selection
Typically, choose $m = 2$ or $m = 3$, whereas $r$ depends greatly on the application.

An implementation on Physionet, [9] which is based on Pincus, [2] uses $d[\mathbf{x}(i), \mathbf{x}(j)] < r$ instead of $d[\mathbf{x}(i), \mathbf{x}(j)] \le r$ in Step 4. While this matters for artificially constructed examples, it is usually not a concern in practice.
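The choice of $r$ is application dependent. One heuristic often cited in the literature (an assumption here, not something fixed by the definition above) ties $r$ to the spread of the data, typically 0.1 to 0.25 times its standard deviation. A minimal sketch of that convention, with 0.2 as an illustrative default and a hypothetical helper name:

from statistics import pstdev

def choose_parameters(time_series, run_length=2, r_fraction=0.2):
    """Illustrative helper (hypothetical, not part of the algorithm above):
    returns m and r with r = r_fraction * standard deviation of the series.
    The fraction 0.2 is a common heuristic, not a universal rule."""
    filter_level = r_fraction * pstdev(time_series)
    return run_length, filter_level

# Example usage (the values differ from the worked example below, which fixes r = 3):
m, r = choose_parameters([85, 80, 89] * 17)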

Example

[Figure: Illustration of the heart rate sequence]

Consider a sequence of $N = 51$ samples of heart rate equally spaced in time:

$$S_N = \{85,\ 80,\ 89,\ 85,\ 80,\ 89,\ \ldots,\ 85,\ 80,\ 89\}$$

Note the sequence is periodic with a period of 3. Let us choose $m = 2$ and $r = 3$ (the values of $m$ and $r$ can be varied without affecting the result).

Form a sequence of vectors:

Distance is calculated repeatedly as follows. In the first calculation,

$$d[\mathbf{x}(1), \mathbf{x}(1)] = \max_{k} \big|\,u(k) - u(k)\,\big| = 0,$$

which is less than $r = 3$.

In the second calculation, note that $|u(2) - u(3)| > |u(1) - u(2)|$, so

$$d[\mathbf{x}(1), \mathbf{x}(2)] = |u(2) - u(3)| = |80 - 89| = 9,$$

which is greater than $r = 3$.

Similarly,

$$d[\mathbf{x}(1), \mathbf{x}(3)] = |u(2) - u(4)| = |80 - 85| = 5 > r,$$
$$d[\mathbf{x}(1), \mathbf{x}(4)] = 0 < r, \quad \ldots$$

The result is a total of 17 terms $\mathbf{x}(j)$ such that $d[\mathbf{x}(1), \mathbf{x}(j)] \le r$. These include $\mathbf{x}(1), \mathbf{x}(4), \mathbf{x}(7), \ldots, \mathbf{x}(49)$. In these cases, $C_i^m(r)$ is

$$C_1^2(3) = C_4^2(3) = \cdots = C_{49}^2(3) = \frac{17}{50}.$$

Note in Step 4, $1 \le i \le n$ with $n = N - m + 1 = 50$ for $\mathbf{x}(i)$. So the terms $\mathbf{x}(j)$ such that $d[\mathbf{x}(3), \mathbf{x}(j)] \le r$ include $\mathbf{x}(3), \mathbf{x}(6), \ldots, \mathbf{x}(48)$, and the total number is 16:

$$C_3^2(3) = C_6^2(3) = \cdots = C_{48}^2(3) = \frac{16}{50}.$$

At the end of these calculations, we have

$$\phi^2(3) = \frac{1}{50} \sum_{i=1}^{50} \ln\big(C_i^2(3)\big) \approx -1.0982.$$

Then we repeat the above steps for $m = 3$. First form a sequence of vectors:

$$\mathbf{x}(1) = [u(1)\ \ u(2)\ \ u(3)] = [85\ \ 80\ \ 89]$$
$$\mathbf{x}(2) = [u(2)\ \ u(3)\ \ u(4)] = [80\ \ 89\ \ 85]$$
$$\mathbf{x}(3) = [u(3)\ \ u(4)\ \ u(5)] = [89\ \ 85\ \ 80]$$
$$\vdots$$

By calculating the distances between the vectors $\mathbf{x}(i)$ and $\mathbf{x}(j)$, $1 \le i, j \le 49$, we find the vectors satisfying the filtering level have the following characteristic:

$$d[\mathbf{x}(i), \mathbf{x}(i+3)] = 0 < r.$$

Therefore,

$$C_1^3(3) = C_4^3(3) = \cdots = C_{49}^3(3) = \frac{17}{49},$$
$$C_2^3(3) = C_5^3(3) = \cdots = C_{47}^3(3) = \frac{16}{49},$$
$$C_3^3(3) = C_6^3(3) = \cdots = C_{48}^3(3) = \frac{16}{49}.$$

At the end of these calculations, we have

$$\phi^3(3) = \frac{1}{49} \sum_{i=1}^{49} \ln\big(C_i^3(3)\big) \approx -1.0982.$$

Finally,

$$\mathrm{ApEn} = \phi^2(3) - \phi^3(3) \approx 0.000011.$$

The value is very small, implying the sequence is regular and predictable, which is consistent with the observation.
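The hand calculation can be checked numerically, independently of the fuller implementation in the next section. A minimal sketch follows; the rounded values in the comments follow from the counts derived above.

import math

u = [85, 80, 89] * 17          # 51 samples, periodic with period 3
r = 3                          # filtering level

def phi(m):
    n = len(u) - m + 1
    x = [u[i:i + m] for i in range(n)]   # the vectors x(i)
    counts = [
        sum(max(abs(a - b) for a, b in zip(xi, xj)) <= r for xj in x) / n
        for xi in x
    ]
    return sum(math.log(c) for c in counts) / n

phi2, phi3 = phi(2), phi(3)
print(phi2, phi3)              # both approximately -1.0982
print(abs(phi2 - phi3))        # approximately 1.1e-5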

Python implementation

import math


def approx_entropy(time_series, run_length, filter_level) -> float:
    """
    Approximate entropy

    >>> import random
    >>> regularly = [85, 80, 89] * 17
    >>> print(f"{approx_entropy(regularly, 2, 3):e}")
    1.099654e-05
    >>> randomly = [random.choice([85, 80, 89]) for _ in range(17*3)]
    >>> 0.8 < approx_entropy(randomly, 2, 3) < 1
    True
    """

    def _maxdist(x_i, x_j):
        return max(abs(ua - va) for ua, va in zip(x_i, x_j))

    def _phi(m):
        n = time_series_length - m + 1
        x = [
            [time_series[j] for j in range(i, i + m - 1 + 1)]
            for i in range(time_series_length - m + 1)
        ]
        counts = [
            sum(1 for x_j in x if _maxdist(x_i, x_j) <= filter_level) / n
            for x_i in x
        ]
        return sum(math.log(c) for c in counts) / n

    time_series_length = len(time_series)
    return abs(_phi(run_length + 1) - _phi(run_length))


if __name__ == "__main__":
    import doctest

    doctest.testmod()
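The pure-Python version above is quadratic in the number of windows, which can be slow for long series. Below is a vectorized sketch of the same computation (the name approx_entropy_np is introduced here for illustration); it assumes NumPy 1.20 or newer, for sliding_window_view, and is not part of the original implementation. It trades memory, building the full n × n distance matrix, for speed.

import numpy as np


def approx_entropy_np(time_series, run_length, filter_level):
    """Vectorized sketch of approximate entropy; mirrors approx_entropy above."""
    u = np.asarray(time_series, dtype=float)

    def phi(m):
        n = len(u) - m + 1
        # Rows are the m-dimensional vectors x(1), ..., x(n).
        x = np.lib.stride_tricks.sliding_window_view(u, m)
        # Chebyshev (maximum-coordinate) distance between every pair of vectors.
        d = np.abs(x[:, None, :] - x[None, :, :]).max(axis=-1)
        counts = np.count_nonzero(d <= filter_level, axis=1) / n
        return np.log(counts).mean()

    return abs(phi(run_length + 1) - phi(run_length))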

Interpretation

The presence of repetitive patterns of fluctuation in a time series renders it more predictable than a time series in which such patterns are absent. ApEn reflects the likelihood that similar patterns of observations will not be followed by additional similar observations. [10] A time series containing many repetitive patterns has a relatively small ApEn; a less predictable process has a higher ApEn.
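As an illustration of this interpretation, the short sketch below compares a smoothly repetitive signal with uniform noise, reusing the approx_entropy function from the Python implementation section above; the signal choices, length, seed, and the r = 0.2 × standard deviation setting are illustrative assumptions.

import math
import random
from statistics import pstdev

# Assumes approx_entropy from the Python implementation section is in scope.
random.seed(1)
n = 300
repetitive = [math.sin(0.5 * i) for i in range(n)]         # smooth, repeating pattern
irregular = [random.uniform(-1.0, 1.0) for _ in range(n)]  # no repeating structure

for name, signal in (("repetitive", repetitive), ("irregular", irregular)):
    r = 0.2 * pstdev(signal)
    print(name, round(approx_entropy(signal, 2, r), 3))
# The repetitive signal yields a much smaller ApEn than the irregular one.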

Advantages

The advantages of ApEn include: [2]

  1. Lower computational demand. ApEn can be designed to work for small data samples (fewer than 50 points) and can be applied in real time.
  2. Less effect from noise. If data are noisy, the ApEn measure can be compared to the noise level in the data to determine what quality of true information may be present in the data.

Limitations

The ApEn algorithm counts each sequence as matching itself to avoid the occurrence of $\ln(0)$ in the calculations. This step might introduce bias in ApEn, which causes ApEn to have two poor properties in practice: [11]

  1. ApEn is heavily dependent on the record length and is uniformly lower than expected for short records (see the sketch following this list).
  2. It lacks relative consistency. That is, if ApEn of one data set is higher than that of another, it should, but does not, remain higher for all conditions tested.
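A minimal sketch of the first property, reusing the approx_entropy function from the Python implementation section; the record lengths, seed, and binary alphabet are arbitrary illustrative choices. ApEn of the same random process typically comes out noticeably lower for the shorter records.

import random

# Assumes approx_entropy from the Python implementation section is in scope.
random.seed(0)
for n in (50, 200, 1000):
    data = [random.randint(0, 1) for _ in range(n)]
    # For a fair binary source the value should approach ln 2 (about 0.693) as the
    # record grows; counting self-matches biases short records toward lower values.
    print(n, round(approx_entropy(data, 2, 0.5), 3))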

Applications

ApEn has been applied to classify electroencephalography (EEG) in psychiatric diseases, such as schizophrenia, [12] epilepsy, [13] and addiction. [14]


References

  1. Pincus, S. M.; Gladstone, I. M.; Ehrenkranz, R. A. (1991). "A regularity statistic for medical data analysis". Journal of Clinical Monitoring and Computing. 7 (4): 335–345. doi:10.1007/BF01619355. PMID 1744678. S2CID 23455856.
  2. Pincus, S. M. (1991). "Approximate entropy as a measure of system complexity". Proceedings of the National Academy of Sciences. 88 (6): 2297–2301. Bibcode:1991PNAS...88.2297P. doi:10.1073/pnas.88.6.2297. PMC 51218. PMID 11607165.
  3. Cohen, A.; Procaccia, I. (1985). "Computing the Kolmogorov entropy from time signals of dissipative and conservative dynamical systems". Physical Review A. 31 (3): 1872(R). Bibcode:1985PhRvA..31.1872C. doi:10.1103/PhysRevA.31.1872.
  4. Pincus, S.M.; Kalman, E.K. (2004). "Irregularity, volatility, risk, and financial market time series". Proceedings of the National Academy of Sciences . 101 (38): 13709–13714. Bibcode:2004PNAS..10113709P. doi: 10.1073/pnas.0405168101 . PMC   518821 . PMID   15358860.
  5. Pincus, S.M.; Goldberger, A.L. (1994). "Physiological time-series analysis: what does regularity quantify?". The American Journal of Physiology . 266 (4): 1643–1656. doi:10.1152/ajpheart.1994.266.4.H1643. PMID   8184944. S2CID   362684.
  6. McKinley, R.A.; McIntire, L.K.; Schmidt, R; Repperger, D.W.; Caldwell, J.A. (2011). "Evaluation of Eye Metrics as a Detector of Fatigue". Human Factors . 53 (4): 403–414. doi:10.1177/0018720811411297. PMID   21901937. S2CID   109251681.
  7. Delgado-Bonal, Alfonso; Marshak, Alexander; Yang, Yuekui; Holdaway, Daniel (2020-01-22). "Analyzing changes in the complexity of climate in the last four decades using MERRA-2 radiation data". Scientific Reports. 10 (1): 922. Bibcode:2020NatSR..10..922D. doi: 10.1038/s41598-020-57917-8 . ISSN   2045-2322. PMC   6976651 . PMID   31969616.
  8. Delgado-Bonal, Alfonso; Marshak, Alexander (June 2019). "Approximate Entropy and Sample Entropy: A Comprehensive Tutorial". Entropy. 21 (6): 541. Bibcode:2019Entrp..21..541D. doi: 10.3390/e21060541 . PMC   7515030 . PMID   33267255.
  9. "PhysioNet". Archived from the original on 2012-06-18. Retrieved 2012-07-04.
  10. Ho, K. K.; Moody, G. B.; Peng, C.K.; Mietus, J. E.; Larson, M. G.; Levy, D.; Goldberger, A. L. (1997). "Predicting survival in heart failure case and control subjects by use of fully automated methods for deriving nonlinear and conventional indices of heart rate dynamics". Circulation. 96 (3): 842–848. doi:10.1161/01.cir.96.3.842. PMID 9264491.
  11. Richman, J.S.; Moorman, J.R. (2000). "Physiological time-series analysis using approximate entropy and sample entropy". American Journal of Physiology. Heart and Circulatory Physiology . 278 (6): 2039–2049. doi:10.1152/ajpheart.2000.278.6.H2039. PMID   10843903. S2CID   2389971.
  12. Sabeti, Malihe (2009). "Entropy and complexity measures for EEG signal classification of schizophrenic and control participants". Artificial Intelligence in Medicine . 47 (3): 263–274. doi:10.1016/j.artmed.2009.03.003. PMID   19403281.
  13. Yuan, Qi (2011). "Epileptic EEG classification based on extreme learning machine and nonlinear features". Epilepsy Research . 96 (1–2): 29–38. doi:10.1016/j.eplepsyres.2011.04.013. PMID   21616643. S2CID   41730913.
  14. Yun, Kyongsik (2012). "Decreased cortical complexity in methamphetamine abusers". Psychiatry Research: Neuroimaging . 201 (3): 226–32. doi:10.1016/j.pscychresns.2011.07.009. PMID   22445216. S2CID   30670300.