Sample entropy


Sample entropy (SampEn) is a modification of approximate entropy (ApEn), used for assessing the complexity of physiological time-series signals and for diagnosing diseased states. [1] SampEn has two advantages over ApEn: data length independence and a relatively trouble-free implementation. There is also a small computational difference: in ApEn, the comparison between the template vector (see below) and the rest of the vectors also includes a comparison with itself. This guarantees that probabilities are never zero, so it is always possible to take a logarithm of the probabilities. Because these self-matches lower ApEn values, signals are interpreted as more regular than they actually are. Self-matches are not included in SampEn. However, since SampEn makes direct use of the correlation integrals, it is not a true measure of information but an approximation. The foundations of SampEn, its differences from ApEn, and a step-by-step tutorial for its application are available in [2].


There is a multiscale version of SampEn as well, suggested by Costa and others. [3] SampEn can be used in biomedical and biomechanical research, for example to evaluate postural control. [4] [5]

Definition

Like approximate entropy (ApEn), sample entropy (SampEn) is a measure of complexity. [1] But it does not include self-similar patterns as ApEn does. For a given embedding dimension $m$, tolerance $r$ and number of data points $N$, SampEn is the negative natural logarithm of the probability that if two sets of simultaneous data points of length $m$ have distance $< r$, then two sets of simultaneous data points of length $m+1$ also have distance $< r$. It is denoted by $SampEn(m, r, N)$ (or by $SampEn(m, r, \tau, N)$ when the sampling time $\tau$ is included).

Now assume we have a time-series data set of length $N$, $\{x_1, x_2, \ldots, x_N\}$, sampled with a constant time interval $\tau$. We define a template vector of length $m$ as $X_m(i) = \{x_i, x_{i+1}, \ldots, x_{i+m-1}\}$, and take the distance function $d[X_m(i), X_m(j)]$ ($i \neq j$) to be the Chebyshev distance (but it could be any distance function, including Euclidean distance). We define the sample entropy to be

$SampEn = -\ln \frac{A}{B}$

where

$A$ = number of template vector pairs having $d[X_{m+1}(i), X_{m+1}(j)] < r$

$B$ = number of template vector pairs having $d[X_m(i), X_m(j)] < r$

It is clear from the definition that $A$ will always have a value smaller than or equal to $B$. Therefore, $SampEn(m, r, \tau)$ is always either zero or positive. A smaller value of SampEn also indicates more self-similarity in the data set, or less noise.

Generally, the value of $m$ is taken to be 2 and the value of $r$ to be $0.2 \times \mathrm{std}$, where std stands for the standard deviation, which should be taken over a very large dataset. For instance, an $r$ value of 6 ms is appropriate for sample entropy calculations of heart rate intervals, since this corresponds to $0.2 \times \mathrm{std}$ for a very large population.
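As a minimal illustration of this rule of thumb (the signal and the parameter names here are hypothetical, not from the source), the tolerance can be derived directly from the data:

import numpy

# Hypothetical example signal; in practice this could be a series of heart rate intervals (ms).
rng = numpy.random.default_rng(0)
signal = rng.normal(loc=800, scale=30, size=1000)

m = 2                          # embedding dimension
r = 0.2 * numpy.std(signal)    # tolerance: 20% of the standard deviation

Here the standard deviation is estimated from the signal itself; as noted above, it should ideally come from a very large reference dataset.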

Multiscale SampEn

The definition mentioned above is a special case of multiscale SampEn with $\delta = 1$, where $\delta$ is called the skipping parameter. In multiscale SampEn, template vectors are defined with a certain interval between their elements, specified by the value of $\delta$. The modified template vector is defined as $X_{m,\delta}(i) = \{x_i, x_{i+\delta}, x_{i+2\delta}, \ldots, x_{i+(m-1)\delta}\}$, and SampEn can be written as

$SampEn(m, r, \delta) = -\ln \frac{A_\delta}{B_\delta}$

where $A_\delta$ and $B_\delta$ are calculated as before.
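A minimal sketch of how the skipping parameter could be incorporated into the template construction used in the implementation below (the function name construct_templates_delta and its delta argument are illustrative assumptions, not from the source):

def construct_templates_delta(timeseries_data, m, delta=1):
    # Templates whose elements are delta samples apart:
    # X_{m,delta}(i) = {x_i, x_{i+delta}, ..., x_{i+(m-1)*delta}}
    num_windows = len(timeseries_data) - (m - 1) * delta
    return [timeseries_data[i : i + (m - 1) * delta + 1 : delta] for i in range(num_windows)]

For $\delta = 1$ this reduces to the construct_templates function shown below; $A_\delta$ and $B_\delta$ are then counted from these templates exactly as in the single-scale case.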

Implementation

Sample entropy can be implemented easily in many different programming languages. Below is an example written in Python.

from itertools import combinations
from math import log


def construct_templates(timeseries_data: list, m: int = 2):
    # All overlapping windows (template vectors) of length m.
    num_windows = len(timeseries_data) - m + 1
    return [timeseries_data[x : x + m] for x in range(0, num_windows)]


def get_matches(templates: list, r: float):
    # Count template pairs within tolerance r; self-matches are excluded
    # because combinations() only yields pairs of distinct templates.
    return len(list(filter(lambda x: is_match(x[0], x[1], r), combinations(templates, 2))))


def is_match(template_1: list, template_2: list, r: float):
    # Chebyshev distance criterion: every component-wise difference must be below r.
    return all([abs(x - y) < r for (x, y) in zip(template_1, template_2)])


def sample_entropy(timeseries_data: list, window_size: int, r: float):
    B = get_matches(construct_templates(timeseries_data, window_size), r)
    A = get_matches(construct_templates(timeseries_data, window_size + 1), r)
    return -log(A / B)
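As a brief usage illustration of the functions above (the toy data and parameter values are assumptions, not from the source):

# A short, highly regular toy series.
data = [2, 4, 6, 4, 2, 4, 6, 4, 2, 4, 6, 4]
print(sample_entropy(data, window_size=2, r=0.5))  # prints roughly 0.22

Note that for very short or very irregular series the count A (or even B) can be zero, in which case the ratio A/B is undefined and the logarithm cannot be taken.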

An equivalent example written with NumPy (numerical Python).

import numpy


def construct_templates(timeseries_data, m):
    # All overlapping windows (template vectors) of length m, as a 2-D array.
    num_windows = len(timeseries_data) - m + 1
    return numpy.array([timeseries_data[x : x + m] for x in range(0, num_windows)])


def get_matches(templates, r):
    return len(list(filter(lambda x: is_match(x[0], x[1], r), combinations(templates))))


def combinations(x):
    # All pairs of distinct templates, gathered from the upper-triangular index pairs (i < j).
    idx = numpy.stack(numpy.triu_indices(len(x), k=1), axis=-1)
    return x[idx]


def is_match(template_1, template_2, r):
    # Chebyshev distance criterion, as in the pure-Python version.
    return numpy.all([abs(x - y) < r for (x, y) in zip(template_1, template_2)])


def sample_entropy(timeseries_data, window_size, r):
    B = get_matches(construct_templates(timeseries_data, window_size), r)
    A = get_matches(construct_templates(timeseries_data, window_size + 1), r)
    return -numpy.log(A / B)
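A brief usage sketch for the NumPy version (the random test signal and parameter choice are assumptions, not from the source):

# Gaussian test signal; r follows the 0.2 * std rule of thumb from the Definition section.
rng = numpy.random.default_rng(42)
signal = rng.normal(size=300)
print(sample_entropy(signal, window_size=2, r=0.2 * numpy.std(signal)))

A comparatively large value is expected here, since white noise has little self-similarity.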



References

  1. Richman, JS; Moorman, JR (2000). "Physiological time-series analysis using approximate entropy and sample entropy". American Journal of Physiology. Heart and Circulatory Physiology. 278 (6): H2039–49. doi:10.1152/ajpheart.2000.278.6.H2039. PMID 10843903.
  2. Delgado-Bonal, Alfonso; Marshak, Alexander (June 2019). "Approximate Entropy and Sample Entropy: A Comprehensive Tutorial". Entropy. 21 (6): 541. Bibcode:2019Entrp..21..541D. doi:10.3390/e21060541. PMC 7515030. PMID 33267255.
  3. Costa, Madalena; Goldberger, Ary; Peng, C.-K. (2005). "Multiscale entropy analysis of biological signals". Physical Review E. 71 (2): 021906. Bibcode:2005PhRvE..71b1906C. doi:10.1103/PhysRevE.71.021906. PMID 15783351.
  4. Błażkiewicz, Michalina; Kędziorek, Justyna; Hadamus, Anna (March 2021). "The Impact of Visual Input and Support Area Manipulation on Postural Control in Subjects after Osteoporotic Vertebral Fracture". Entropy. 23 (3): 375. Bibcode:2021Entrp..23..375B. doi:10.3390/e23030375. PMC 8004071. PMID 33804770.
  5. Hadamus, Anna; Białoszewski, Dariusz; Błażkiewicz, Michalina; Kowalska, Aleksandra J.; Urbaniak, Edyta; Wydra, Kamil T.; Wiaderna, Karolina; Boratyński, Rafał; Kobza, Agnieszka; Marczyński, Wojciech (February 2021). "Assessment of the Effectiveness of Rehabilitation after Total Knee Replacement Surgery Using Sample Entropy and Classical Measures of Body Balance". Entropy. 23 (2): 164. Bibcode:2021Entrp..23..164H. doi:10.3390/e23020164. PMC 7911395. PMID 33573057.