Approximate entropy

In statistics, approximate entropy (ApEn) is a technique used to quantify the amount of regularity and the unpredictability of fluctuations over time-series data. [1] For example, consider two series of data:

Series A: (0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...), which alternates 0 and 1.
Series B: (0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, ...), which has either a value of 0 or 1, chosen randomly, each with probability 1/2.

Moment statistics, such as mean and variance, will not distinguish between these two series. Nor will rank order statistics distinguish between these series. Yet series A is perfectly regular: knowing a term has the value of 1 enables one to predict with certainty that the next term will have the value of 0. In contrast, series B is randomly valued: knowing a term has the value of 1 gives no insight into what value the next term will have.
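This point can be checked numerically. The short sketch below (assuming NumPy; series A and B here are short stand-ins, and B is a fresh random draw rather than the exact values listed above) shows that the alternating series and a random 0/1 series have essentially the same mean and variance:

import numpy as np

# Series A: strictly alternating 0s and 1s.
A = np.array([0, 1] * 8)

# Series B: independent fair coin flips (a typical draw).
rng = np.random.default_rng(42)
B = rng.integers(0, 2, size=16)

print(A.mean(), A.var())  # exactly 0.5 and 0.25
print(B.mean(), B.var())  # close to 0.5 and 0.25 for most draws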

Regularity was originally measured by exact regularity statistics, which have mainly centered on various entropy measures. [1] However, accurate entropy calculation requires vast amounts of data, and the results are greatly influenced by system noise, [2] so it is not practical to apply these methods to experimental data. ApEn was developed by Steve M. Pincus to handle these limitations by modifying an exact regularity statistic, the Kolmogorov–Sinai entropy. ApEn was initially developed to analyze medical data, such as heart rate, [1] and its applications later spread to finance, [3] physiology, [4] human factors engineering, [5] and climate sciences. [6]

Algorithm

A comprehensive step-by-step tutorial with an explanation of the theoretical foundations of Approximate Entropy is available. [7] The algorithm is:

Step 1
Assume a time series of data $u(1), u(2), \ldots, u(N)$. These are $N$ raw data values from measurements equally spaced in time.
Step 2
Let $m$ be a positive integer, with $m \le N$, which represents the length of a run of data (essentially a window).
Let $r$ be a positive real number, which specifies a filtering level.
Let $n = N - m + 1$.
Step 3
Define $\mathbf{x}(i) = [u(i), u(i+1), \ldots, u(i+m-1)]$ for each $i$ where $1 \le i \le n$. In other words, $\mathbf{x}(i)$ is an $m$-dimensional vector that contains the run of data starting with $u(i)$.
Define the distance between two vectors $\mathbf{x}(i)$ and $\mathbf{x}(j)$ as the maximum of the distances between their respective components, given by
$$d[\mathbf{x}(i), \mathbf{x}(j)] = \max_k \big|u(i+k-1) - u(j+k-1)\big|$$
for $1 \le k \le m$.
Step 4
Define a count $C_i^m(r)$ as
$$C_i^m(r) = \frac{\text{number of } \mathbf{x}(j) \text{ such that } d[\mathbf{x}(i), \mathbf{x}(j)] \le r}{n}$$
for each $i$ where $1 \le i, j \le n$. Note that since $j$ takes on all values between 1 and $n$, the match will be counted when $j = i$ (i.e. when the test subsequence, $\mathbf{x}(j)$, is matched against itself, $\mathbf{x}(i)$).
Step 5
Define
$$\Phi^m(r) = \frac{1}{n} \sum_{i=1}^{n} \ln\big(C_i^m(r)\big),$$
where $\ln$ is the natural logarithm, for fixed $m$, $r$, and $n$ as set in Step 2.
Step 6
Define approximate entropy ($\mathrm{ApEn}$) as
$$\mathrm{ApEn}(m, r, N) = \Phi^m(r) - \Phi^{m+1}(r).$$
Parameter selection
Typically, choose $m = 2$ or $m = 3$, whereas $r$ depends greatly on the application.
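The steps above do not prescribe a rule for $r$. A convention often used in the physiological literature (noted here as an orientation heuristic, not as part of the algorithm) ties $r$ to the variability of the series, typically around 0.1 to 0.25 times its standard deviation; a minimal sketch, with a hypothetical helper name:

import numpy as np

def default_r(U, fraction=0.2):
    # Illustrative helper: scale the tolerance r to the spread of the data.
    # Fractions of roughly 0.1-0.25 of the standard deviation are commonly
    # reported, but the appropriate value remains application dependent.
    return fraction * float(np.std(U))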

An implementation on Physionet, [8] which is based on Pincus, [2] uses $d[\mathbf{x}(i), \mathbf{x}(j)] < r$ instead of $d[\mathbf{x}(i), \mathbf{x}(j)] \le r$ in Step 4. While this is a concern for artificially constructed examples, it is usually not a concern in practice.
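To make the steps concrete, the following minimal sketch (assuming NumPy; the function name phi is illustrative, and a complete implementation appears in the Python implementation section below) spells out Steps 3 to 5 for a single value of $m$:

import numpy as np

def phi(U, m, r):
    # Phi^m(r) of Steps 3-5 for the series U.
    N = len(U)
    n = N - m + 1
    # Step 3: the m-dimensional runs x(1), ..., x(n).
    x = np.array([U[i:i + m] for i in range(n)])
    # Step 3: maximum-component (Chebyshev) distance between every pair of runs.
    d = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
    # Step 4: C_i^m(r), with the self-match d[x(i), x(i)] = 0 counted as well.
    C = np.sum(d <= r, axis=1) / n
    # Step 5: average of the natural logarithms.
    return np.mean(np.log(C))

Step 6 then amounts to computing phi(U, m, r) - phi(U, m + 1, r).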

Example

[Figure: Illustration of the heart rate sequence]

Consider a sequence of $N = 51$ samples of heart rate equally spaced in time:

$$S_N = \{85,\ 80,\ 89,\ 85,\ 80,\ 89,\ 85,\ 80,\ 89,\ \ldots\}$$

Note the sequence is periodic with a period of 3. Let us choose $m = 2$ and $r = 3$ (the values of $m$ and $r$ can be varied without affecting the result).

Form a sequence of vectors:

$$\begin{aligned}
\mathbf{x}(1) &= [u(1)\ u(2)] = [85\ 80] \\
\mathbf{x}(2) &= [u(2)\ u(3)] = [80\ 89] \\
\mathbf{x}(3) &= [u(3)\ u(4)] = [89\ 85] \\
\mathbf{x}(4) &= [u(4)\ u(5)] = [85\ 80] \\
&\ \ \vdots
\end{aligned}$$

Distance is calculated repeatedly as follows. In the first calculation,

$$d[\mathbf{x}(1), \mathbf{x}(1)] = \max\big(|u(1) - u(1)|,\ |u(2) - u(2)|\big) = 0,$$

which is less than $r = 3$.

In the second calculation, note that $|u(2) - u(3)| > |u(1) - u(2)|$, so

$$d[\mathbf{x}(1), \mathbf{x}(2)] = |u(2) - u(3)| = |80 - 89| = 9,$$

which is greater than $r = 3$.

Similarly,

$$d[\mathbf{x}(1), \mathbf{x}(3)] = 5 > r, \qquad d[\mathbf{x}(1), \mathbf{x}(4)] = 0 \le r, \qquad \ldots$$

The result is a total of 17 terms $\mathbf{x}(j)$ such that $d[\mathbf{x}(1), \mathbf{x}(j)] \le r$. These include $\mathbf{x}(1), \mathbf{x}(4), \mathbf{x}(7), \ldots, \mathbf{x}(49)$. In these cases, $C_i^m(r)$ is

$$C_1^2(3) = C_4^2(3) = \cdots = C_{49}^2(3) = \frac{17}{50}.$$

Note in Step 4, $1 \le i \le N - m + 1$ for $\mathbf{x}(i)$, so the last vector is $\mathbf{x}(50)$. The terms $\mathbf{x}(j)$ such that $d[\mathbf{x}(3), \mathbf{x}(j)] \le r$ include $\mathbf{x}(3), \mathbf{x}(6), \ldots, \mathbf{x}(48)$, and the total number is 16, giving $C_3^2(3) = 16/50$.

At the end of these calculations, we have

$$\Phi^2(3) = \frac{1}{50} \sum_{i=1}^{50} \ln\big(C_i^2(3)\big) \approx -1.0982.$$

Then we repeat the above steps for $m = 3$. First form a sequence of vectors:

$$\begin{aligned}
\mathbf{x}(1) &= [u(1)\ u(2)\ u(3)] = [85\ 80\ 89] \\
\mathbf{x}(2) &= [u(2)\ u(3)\ u(4)] = [80\ 89\ 85] \\
\mathbf{x}(3) &= [u(3)\ u(4)\ u(5)] = [89\ 85\ 80] \\
&\ \ \vdots
\end{aligned}$$

By calculating the distances between the vectors $\mathbf{x}(i)$, $1 \le i \le 49$, we find the vectors satisfying the filtering level have the following characteristic:

$$d[\mathbf{x}(i), \mathbf{x}(j)] \le r \iff i \equiv j \pmod 3.$$

Therefore,

$$C_i^3(3) = \begin{cases} 17/49 & \text{if } i \equiv 1 \pmod 3, \\ 16/49 & \text{otherwise.} \end{cases}$$

At the end of these calculations, we have

$$\Phi^3(3) = \frac{1}{49} \sum_{i=1}^{49} \ln\big(C_i^3(3)\big) \approx -1.0982.$$

Finally,

$$\mathrm{ApEn}(2, 3, 51) = \Phi^2(3) - \Phi^3(3) \approx -1.1 \times 10^{-5}.$$

The value is very small in magnitude (the Python implementation below returns the absolute value of this difference, about $1.1 \times 10^{-5}$), so it implies the sequence is regular and predictable, which is consistent with the observation.
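The hand calculation can be reproduced directly from the match counts found above (17 matches out of 50 vectors for two of the three vector types, 16 out of 50 for the third, and similarly for $m = 3$); a short check using only the Python standard library:

from math import log

# m = 2: 34 of the 50 vectors have 17 matches each, the other 16 have 16 matches.
phi2 = (34 * log(17 / 50) + 16 * log(16 / 50)) / 50

# m = 3: 17 of the 49 vectors have 17 matches each, the other 32 have 16 matches.
phi3 = (17 * log(17 / 49) + 32 * log(16 / 49)) / 49

print(phi2, phi3)   # both approximately -1.0982
print(phi2 - phi3)  # approximately -1.1e-05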

Python implementation

import numpy as np


def ApEn(U, m, r) -> float:
    """Approximate entropy."""

    def _maxdist(x_i, x_j):
        # Chebyshev distance: maximum absolute difference between components.
        return max([abs(ua - va) for ua, va in zip(x_i, x_j)])

    def _phi(m):
        # All runs of length m (Step 3).
        x = [[U[j] for j in range(i, i + m - 1 + 1)] for i in range(N - m + 1)]
        # Fraction of runs within tolerance r of each run, self-match included (Step 4).
        C = [
            len([1 for x_j in x if _maxdist(x_i, x_j) <= r]) / (N - m + 1.0)
            for x_i in x
        ]
        # Average of the natural logarithms (Step 5).
        return (N - m + 1.0) ** (-1) * sum(np.log(C))

    N = len(U)
    return abs(_phi(m + 1) - _phi(m))

Usage example:

>>> U = np.array([85, 80, 89] * 17)
>>> print(ApEn(U, 2, 3))
1.0996541105257052e-05
>>> randU = np.random.choice([85, 80, 89], size=17*3)
>>> print(ApEn(randU, 2, 3))
0.8626664154888908

Interpretation

The presence of repetitive patterns of fluctuation in a time series renders it more predictable than a time series in which such patterns are absent. ApEn reflects the likelihood that similar patterns of observations will not be followed by additional similar observations. [9] A time series containing many repetitive patterns has a relatively small ApEn; a less predictable process has a higher ApEn.

Advantages

The advantages of ApEn include: [2]

  1. Lower computational demand. ApEn can be designed to work for small data samples (N < 50 points) and can be applied in real time.
  2. Less effect from noise. If data are noisy, the ApEn measure can be compared to the noise level in the data to determine what quality of true information may be present in the data.

Limitations

The ApEn algorithm counts each sequence as matching itself to avoid the occurrence of $\ln(0)$ in the calculations. This step might introduce bias in ApEn, which causes ApEn to have two poor properties in practice (the first is illustrated by the sketch after the list): [10]

  1. ApEn is heavily dependent on the record length and is uniformly lower than expected for short records.
  2. It lacks relative consistency. That is, if ApEn of one data set is higher than that of another, it should, but does not, remain higher for all conditions tested.
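To illustrate the first point, the sketch below (an added illustration, not taken from the cited source) reuses the ApEn function from the Python implementation section on a short and a long record of the same random process over the three heart-rate values used in the example; exact values vary from run to run, but the short-record estimate is typically noticeably lower:

import numpy as np

rng = np.random.default_rng(0)
short_record = rng.choice([85, 80, 89], size=30)
long_record = rng.choice([85, 80, 89], size=300)

# Same process and parameters; only the record length differs.
print(ApEn(short_record, 2, 3))  # typically well below ...
print(ApEn(long_record, 2, 3))   # ... the estimate from the longer record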

Applications

ApEn has been applied to classify electroencephalography (EEG) signals in psychiatric diseases, such as schizophrenia, [11] epilepsy, [12] and addiction. [13]

See also

Sample entropy

References

  1. Pincus, S. M.; Gladstone, I. M.; Ehrenkranz, R. A. (1991). "A regularity statistic for medical data analysis". Journal of Clinical Monitoring and Computing. 7 (4): 335–345. doi:10.1007/BF01619355. PMID 1744678. S2CID 23455856.
  2. Pincus, S. M. (1991). "Approximate entropy as a measure of system complexity". Proceedings of the National Academy of Sciences. 88 (6): 2297–2301. Bibcode:1991PNAS...88.2297P. doi:10.1073/pnas.88.6.2297. PMC 51218. PMID 11607165.
  3. Pincus, S.M.; Kalman, E.K. (2004). "Irregularity, volatility, risk, and financial market time series". Proceedings of the National Academy of Sciences . 101 (38): 13709–13714. Bibcode:2004PNAS..10113709P. doi: 10.1073/pnas.0405168101 . PMC   518821 . PMID   15358860.
  4. Pincus, S.M.; Goldberger, A.L. (1994). "Physiological time-series analysis: what does regularity quantify?". The American Journal of Physiology . 266 (4): 1643–1656. doi:10.1152/ajpheart.1994.266.4.H1643. PMID   8184944. S2CID   362684.
  5. McKinley, R.A.; McIntire, L.K.; Schmidt, R; Repperger, D.W.; Caldwell, J.A. (2011). "Evaluation of Eye Metrics as a Detector of Fatigue". Human Factors . 53 (4): 403–414. doi:10.1177/0018720811411297. PMID   21901937. S2CID   109251681.
  6. Delgado-Bonal, Alfonso; Marshak, Alexander; Yang, Yuekui; Holdaway, Daniel (2020-01-22). "Analyzing changes in the complexity of climate in the last four decades using MERRA-2 radiation data". Scientific Reports. 10 (1): 922. Bibcode:2020NatSR..10..922D. doi: 10.1038/s41598-020-57917-8 . ISSN   2045-2322. PMC   6976651 . PMID   31969616.
  7. Delgado-Bonal, Alfonso; Marshak, Alexander (June 2019). "Approximate Entropy and Sample Entropy: A Comprehensive Tutorial". Entropy. 21 (6): 541. Bibcode:2019Entrp..21..541D. doi: 10.3390/e21060541 . PMC   7515030 . PMID   33267255.
  8. PhysioNet [ dead link ]
  9. Ho, K. K.; Moody, G. B.; Peng, C.K.; Mietus, J. E.; Larson, M. G.; Levy, D.; Goldberger, A. L. (1997). "Predicting survival in heart failure case and control subjects by use of fully automated methods for deriving nonlinear and conventional indices of heart rate dynamics". Circulation. 96 (3): 842–848. doi:10.1161/01.cir.96.3.842. PMID 9264491.
  10. Richman, J.S.; Moorman, J.R. (2000). "Physiological time-series analysis using approximate entropy and sample entropy". American Journal of Physiology. Heart and Circulatory Physiology . 278 (6): 2039–2049. doi:10.1152/ajpheart.2000.278.6.H2039. PMID   10843903. S2CID   2389971.
  11. Sabeti, Malihe (2009). "Entropy and complexity measures for EEG signal classification of schizophrenic and control participants". Artificial Intelligence in Medicine . 47 (3): 263–274. doi:10.1016/j.artmed.2009.03.003. PMID   19403281.
  12. Yuan, Qi (2011). "Epileptic EEG classification based on extreme learning machine and nonlinear features". Epilepsy Research . 96 (1–2): 29–38. doi:10.1016/j.eplepsyres.2011.04.013. PMID   21616643. S2CID   41730913.
  13. Yun, Kyongsik (2012). "Decreased cortical complexity in methamphetamine abusers". Psychiatry Research: Neuroimaging . 201 (3): 226–32. doi:10.1016/j.pscychresns.2011.07.009. PMID   22445216. S2CID   30670300.