Sample entropy

Sample entropy (SampEn) is a modification of approximate entropy (ApEn), used for assessing the complexity of physiological time-series signals and diagnosing diseased states. [1] SampEn has two advantages over ApEn: data length independence and a relatively trouble-free implementation. There is also a small computational difference: in ApEn, the comparison between the template vector (see below) and the rest of the vectors includes a comparison with itself. This guarantees that probabilities are never zero, so it is always possible to take a logarithm of probabilities. Because these self-matches lower ApEn values, the signals are interpreted as more regular than they actually are. Self-matches are not included in SampEn. However, since SampEn makes direct use of the correlation integrals, it is not a real measure of information but an approximation. The foundations of SampEn and its differences with ApEn, as well as a step-by-step tutorial for its application, are available in [2].

There is a multiscale version of SampEn as well, suggested by Costa and others. [3] SampEn can be used in biomedical and biomechanical research, for example to evaluate postural control. [4] [5]

Definition

Like approximate entropy (ApEn), sample entropy (SampEn) is a measure of complexity. [1] But it does not include self-similar patterns as ApEn does. For a given embedding dimension $m$, tolerance $r$ and number of data points $N$, SampEn is the negative natural logarithm of the probability that if two sets of simultaneous data points of length $m$ have distance $< r$, then two sets of simultaneous data points of length $m + 1$ also have distance $< r$. We represent it by $\mathrm{SampEn}(m, r, N)$ (or by $\mathrm{SampEn}(m, r, \tau)$ when the sampling time $\tau$ is included).

Now assume we have a time-series data set $\{x_1, x_2, \dots, x_N\}$ of length $N$ with a constant time interval $\tau$. We define a template vector of length $m$ as $X_m(i) = \{x_i, x_{i+1}, \dots, x_{i+m-1}\}$, and the distance function $d[X_m(i), X_m(j)]$ ($i \neq j$) to be the Chebyshev distance (but it could be any distance function, including Euclidean distance). We define the sample entropy to be

$\mathrm{SampEn} = -\ln \frac{A}{B}$

Where

$A$ = number of template vector pairs of length $m + 1$ having $d[X_{m+1}(i), X_{m+1}(j)] < r$

$B$ = number of template vector pairs of length $m$ having $d[X_m(i), X_m(j)] < r$

It is clear from the definition that $A$ will always have a value smaller than or equal to $B$. Therefore, $\mathrm{SampEn}$ will always be either zero or a positive value. A smaller value of $\mathrm{SampEn}$ also indicates more self-similarity in the data set, or less noise.
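
For illustration (with hypothetical counts, not drawn from the cited studies): if a record yields $B = 40$ template pairs of length $m$ within the tolerance $r$, and $A = 10$ of those pairs still match at length $m + 1$, then $\mathrm{SampEn} = -\ln(10/40) = \ln 4 \approx 1.39$. A more regular record in which, say, $A = 35$ of the pairs persist would give $-\ln(35/40) \approx 0.13$.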

Generally we take the value of $m$ to be $2$ and the value of $r$ to be $0.2 \times \mathrm{std}$, where std stands for the standard deviation, which should be taken over a very large dataset. For instance, an $r$ value of 6 ms is appropriate for sample entropy calculations of heart rate intervals, since this corresponds to $0.2 \times \mathrm{std}$ for a very large population.

Multiscale SampEn

The definition mentioned above is a special case of multiscale SampEn with $\delta = 1$, where $\delta$ is called the skipping parameter. In multiscale SampEn, template vectors are defined with a certain interval between their elements, specified by the value of $\delta$. The modified template vector is defined as $X_{m,\delta}(i) = \{x_i, x_{i+\delta}, x_{i+2\delta}, \dots, x_{i+(m-1)\delta}\}$, and SampEn can be written as $\mathrm{SampEn}(m, r, \delta)$. We then calculate $A_\delta$ and $B_\delta$ as before.
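
As a minimal sketch of this idea (the function name construct_templates_delta and its delta parameter are illustrative assumptions, not part of any standard library), the template construction with a skipping parameter could look as follows in Python:

def construct_templates_delta(timeseries_data: list, m: int = 2, delta: int = 1):
    # Build template vectors {x_i, x_{i+delta}, ..., x_{i+(m-1)*delta}};
    # delta = 1 reproduces the ordinary SampEn templates used below.
    num_windows = len(timeseries_data) - (m - 1) * delta
    return [timeseries_data[i:i + (m - 1) * delta + 1:delta] for i in range(num_windows)]

The pair counting and the final $-\ln(A_\delta / B_\delta)$ step are unchanged; only the way templates are drawn from the series differs.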

Implementation

Sample entropy can be implemented easily in many different programming languages. Below is an example written in Python.

from itertools import combinations
from math import log


def construct_templates(timeseries_data: list, m: int = 2):
    # Build all template vectors of length m from the time series.
    num_windows = len(timeseries_data) - m + 1
    return [timeseries_data[x:x + m] for x in range(0, num_windows)]


def get_matches(templates: list, r: float):
    # Count the template pairs whose Chebyshev distance is below the tolerance r.
    return len(list(filter(lambda x: is_match(x[0], x[1], r), combinations(templates, 2))))


def is_match(template_1: list, template_2: list, r: float):
    # Two templates match if every pair of corresponding elements differs by less than r.
    return all([abs(x - y) < r for (x, y) in zip(template_1, template_2)])


def sample_entropy(timeseries_data: list, window_size: int, r: float):
    # B counts matching pairs of length m, A counts matching pairs of length m + 1;
    # SampEn = -ln(A / B).
    B = get_matches(construct_templates(timeseries_data, window_size), r)
    A = get_matches(construct_templates(timeseries_data, window_size + 1), r)
    return -log(A / B)
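
As a usage sketch (the short integer series and the parameter choices below are illustrative assumptions, not data from the cited studies), the function above could be called like this:

from statistics import stdev

# A perfectly periodic toy signal; real inputs would be physiological time series.
data = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

m = 2                   # embedding dimension, commonly taken to be 2
r = 0.2 * stdev(data)   # tolerance, commonly 0.2 times the standard deviation
print(sample_entropy(data, m, r))  # prints roughly 0.22 for this highly regular series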


References

  1. Richman, JS; Moorman, JR (2000). "Physiological time-series analysis using approximate entropy and sample entropy". American Journal of Physiology. Heart and Circulatory Physiology. 278 (6): H2039–49. doi:10.1152/ajpheart.2000.278.6.H2039. PMID 10843903.
  2. Delgado-Bonal, Alfonso; Marshak, Alexander (June 2019). "Approximate Entropy and Sample Entropy: A Comprehensive Tutorial". Entropy. 21 (6): 541. Bibcode:2019Entrp..21..541D. doi:10.3390/e21060541. PMC 7515030. PMID 33267255.
  3. Costa, Madalena; Goldberger, Ary; Peng, C.-K. (2005). "Multiscale entropy analysis of biological signals". Physical Review E. 71 (2): 021906. Bibcode:2005PhRvE..71b1906C. doi:10.1103/PhysRevE.71.021906. PMID 15783351.
  4. Błażkiewicz, Michalina; Kędziorek, Justyna; Hadamus, Anna (March 2021). "The Impact of Visual Input and Support Area Manipulation on Postural Control in Subjects after Osteoporotic Vertebral Fracture". Entropy. 23 (3): 375. Bibcode:2021Entrp..23..375B. doi:10.3390/e23030375. PMC 8004071. PMID 33804770.
  5. Hadamus, Anna; Białoszewski, Dariusz; Błażkiewicz, Michalina; Kowalska, Aleksandra J.; Urbaniak, Edyta; Wydra, Kamil T.; Wiaderna, Karolina; Boratyński, Rafał; Kobza, Agnieszka; Marczyński, Wojciech (February 2021). "Assessment of the Effectiveness of Rehabilitation after Total Knee Replacement Surgery Using Sample Entropy and Classical Measures of Body Balance". Entropy. 23 (2): 164. Bibcode:2021Entrp..23..164H. doi:10.3390/e23020164. PMC 7911395. PMID 33573057.