Dynamic time warping

Dynamic time warping between two piecewise linear functions. The dotted line illustrates the time-warp relation. Notice that several points in the lower function are mapped to one point in the upper function, and vice versa.

Two repetitions of a walking sequence recorded using a motion-capture system. While there are differences in walking speed between repetitions, the spatial paths of limbs remain highly similar.

DTW between a sinusoid and a noisy and shifted version of it.

In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences, which may vary in speed. For instance, similarities in walking could be detected using DTW, even if one person was walking faster than the other, or if there were accelerations and decelerations during the course of an observation. DTW has been applied to temporal sequences of video, audio, and graphics data — indeed, any data that can be turned into a one-dimensional sequence can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. Other applications include speaker recognition and online signature recognition. It can also be used in partial shape matching applications.


In general, DTW is a method that calculates an optimal match between two given sequences (e.g. time series) with certain restrictions and rules:

- Every index from the first sequence must be matched with one or more indices from the other sequence, and vice versa.
- The first index from the first sequence must be matched with the first index from the other sequence (but it does not have to be its only match).
- The last index from the first sequence must be matched with the last index from the other sequence (but it does not have to be its only match).
- The mapping of indices from the first sequence to indices from the other sequence must be monotonically increasing, and vice versa.

We can plot each match between the sequences s_1, ..., s_n and t_1, ..., t_m as a path in an n × m matrix from (1, 1) to (n, m), such that each step is one of (0, 1), (1, 0) or (1, 1). In this formulation, we see that the number of possible matches is the Delannoy number.
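
As a quick sketch (in Python; the function name is ours), the Delannoy number can be computed with the same three-step recurrence that generates the paths:

def delannoy(n, m):
    # Number of monotone paths from (0, 0) to (n, m) built from the steps
    # (0, 1), (1, 0) and (1, 1): the Delannoy number D(n, m). For sequences
    # of lengths n and m, the paths run from (1, 1) to (n, m), so the count
    # of possible matches is D(n - 1, m - 1).
    d = [[1] * (m + 1) for _ in range(n + 1)]  # one path along each border
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = d[i - 1][j] + d[i][j - 1] + d[i - 1][j - 1]
    return d[n][m]

print(delannoy(3, 3))  # 63 possible matches between two sequences of length 4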

The optimal match is the one that satisfies all of these restrictions and rules and has minimal cost, where the cost is the sum, over all matched pairs of indices, of the absolute differences between their values.

The sequences are "warped" non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear temporal variations. This sequence alignment method is often used in time series classification. Although DTW measures a distance-like quantity between two given sequences, it does not guarantee that the triangle inequality holds.

In addition to a similarity measure between the two sequences, DTW produces a so-called "warping path"; by warping according to this path, the two signals may be aligned in time. The signal with an original set of points X(original), Y(original) is transformed to X(warped), Y(warped). This finds applications in genetic sequence and audio synchronisation. Relatedly, sequences of varying speed may be averaged using this technique; see the average sequence section.
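
As a small illustration (a Python sketch; the function name is ours), applying a warping path, given as a list of matched index pairs, just reads both signals out along the path:

def apply_warping_path(x, y, path):
    # path is a list of matched index pairs (i, j) from a DTW backtrace;
    # reading both signals along it yields time-aligned versions.
    x_warped = [x[i] for i, j in path]
    y_warped = [y[j] for i, j in path]
    return x_warped, y_warped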

This is conceptually very similar to the Needleman–Wunsch algorithm.

Implementation

This example illustrates the implementation of the dynamic time warping algorithm when the two sequences s and t are strings of discrete symbols. For two symbols x and y, d(x, y) is a distance between the symbols, e.g. d(x, y) = |x − y|.

int DTWDistance(s: array [1..n], t: array [1..m]) {
    DTW := array [0..n, 0..m]

    for i := 0 to n
        for j := 0 to m
            DTW[i, j] := infinity
    DTW[0, 0] := 0

    for i := 1 to n
        for j := 1 to m
            cost := d(s[i], t[j])
            DTW[i, j] := cost + minimum(DTW[i-1, j  ],    // insertion
                                        DTW[i  , j-1],    // deletion
                                        DTW[i-1, j-1])    // match

    return DTW[n, m]
}

where DTW[i, j] is the distance between s[1:i] and t[1:j] with the best alignment.
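
For concreteness, here is a direct Python transcription of the pseudocode above, assuming numeric sequences and the local distance d(x, y) = |x − y|:

def dtw_distance(s, t):
    # dtw[i][j] holds the distance between s[:i] and t[:j] under the
    # best alignment; O(n*m) time and space.
    n, m = len(s), len(t)
    INF = float("inf")
    dtw = [[INF] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])       # d(s[i], t[j])
            dtw[i][j] = cost + min(dtw[i - 1][j],      # insertion
                                   dtw[i][j - 1],      # deletion
                                   dtw[i - 1][j - 1])  # match
    return dtw[n][m]

print(dtw_distance([0, 1, 2, 3], [0, 0, 1, 2, 2, 3]))  # 0.0: a pure time warp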

We sometimes want to add a locality constraint. That is, we require that if s[i] is matched with t[j], then |i − j| is not greater than w, a window parameter.

We can easily modify the above algorithm to add a locality constraint (differences marked). However, the modification works only if |n − m| is not greater than w, i.e. the end point is within the window length from the diagonal. In order to make the algorithm work, the window parameter w must be adapted so that |n − m| ≤ w (see the line marked with (*) in the code).

int DTWDistance(s: array [1..n], t: array [1..m], w: int) {
    DTW := array [0..n, 0..m]

    w := max(w, abs(n-m)) // adapt window size (*)

    for i := 0 to n
        for j := 0 to m
            DTW[i, j] := infinity
    DTW[0, 0] := 0
    for i := 1 to n
        for j := max(1, i-w) to min(m, i+w)
            DTW[i, j] := 0

    for i := 1 to n
        for j := max(1, i-w) to min(m, i+w)
            cost := d(s[i], t[j])
            DTW[i, j] := cost + minimum(DTW[i-1, j  ],    // insertion
                                        DTW[i  , j-1],    // deletion
                                        DTW[i-1, j-1])    // match

    return DTW[n, m]
}
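
The same constraint in Python (a sketch along the lines of the transcription above; cells outside the window simply keep their infinite initial value):

def dtw_distance_window(s, t, w):
    n, m = len(s), len(t)
    w = max(w, abs(n - m))  # adapt window size (*)
    INF = float("inf")
    dtw = [[INF] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(s[i - 1] - t[j - 1])
            dtw[i][j] = cost + min(dtw[i - 1][j],      # insertion
                                   dtw[i][j - 1],      # deletion
                                   dtw[i - 1][j - 1])  # match
    return dtw[n][m]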

Warping properties

The DTW algorithm produces a discrete matching between existing elements of one series and another. In other words, it does not allow time-scaling of segments within the sequence. Other methods allow continuous warping. For example, Correlation Optimized Warping (COW) divides the sequence into uniform segments that are scaled in time using linear interpolation, to produce the best matching warping. The segment scaling can create new elements, by time-scaling segments either down or up, and thus produces a more sensitive warping than DTW's discrete matching of raw elements.

Complexity

The time complexity of the DTW algorithm is O(nm), where n and m are the lengths of the two input sequences. The 50-year-old quadratic time bound was broken in 2016: an algorithm due to Gold and Sharir enables computing DTW in O(n² log log log n / log log n) time and space for two input sequences of length n. [2] This algorithm can also be adapted to sequences of different lengths. Despite this improvement, it was shown that a strongly subquadratic running time of the form O(n^(2−ε)) for some ε > 0 cannot exist unless the Strong Exponential Time Hypothesis fails. [3] [4]

While the dynamic programming algorithm for DTW requires O(nm) space in a naive implementation, the space consumption can be reduced to O(n + m) using Hirschberg's algorithm.
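
If only the distance (not the warping path) is needed, two rows of the matrix already suffice; a minimal sketch:

def dtw_distance_two_rows(s, t):
    # O(m) space instead of O(n*m): keep only the previous and current row.
    # Recovering the warping path in linear space additionally requires
    # Hirschberg-style divide and conquer.
    n, m = len(s), len(t)
    INF = float("inf")
    prev = [0.0] + [INF] * m  # row 0: DTW[0][0] = 0, rest infinity
    for i in range(1, n + 1):
        curr = [INF] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]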

Fast computation

Fast techniques for computing DTW include Early Abandoned and Pruned DTW, [5] PrunedDTW, [6] SparseDTW, [7] FastDTW, [8] and MultiscaleDTW. [9] [10]

A common task, retrieval of similar time series, can be accelerated by using lower bounds such as LB_Keogh, [11] LB_Improved, [12] LB_Enhanced, [13] LB_Webb [14] or LB_Petitjean. [14] However, the Early Abandoned and Pruned DTW algorithm reduces the degree of acceleration that lower bounding provides and sometimes renders it ineffective. [5]

In a survey, Wang et al. reported slightly better results with the LB_Improved lower bound than the LB_Keogh bound, and found that other techniques were inefficient. [15] Subsequent to this survey, the LB_Enhanced bound was developed that is always tighter than LB_Keogh while also being more efficient to compute. [13] LB_Petitjean is the tightest known lower bound that can be computed in linear time. [14]
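
To illustrate how such bounds work, here is a sketch of LB_Keogh for two equal-length series under a warping window r, assuming the squared local cost (so it lower-bounds DTW computed with d(x, y) = (x − y)²):

def lb_keogh(q, c, r):
    # Sum the squared distance from each point of c to the envelope of q;
    # points inside the envelope contribute nothing, so the result can
    # never exceed the true (squared-cost) DTW distance.
    total = 0.0
    for j, cj in enumerate(c):
        window = q[max(0, j - r):j + r + 1]
        lo, hi = min(window), max(window)  # lower/upper envelope of q at j
        if cj > hi:
            total += (cj - hi) ** 2
        elif cj < lo:
            total += (lo - cj) ** 2
    return total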

Average sequence

Averaging for dynamic time warping is the problem of finding an average sequence for a set of sequences. NLAAF [16] is an exact method to average two sequences using DTW. For more than two sequences, the problem is related to that of multiple sequence alignment and requires heuristics. DBA [17] is currently a reference method to average a set of sequences consistently with DTW. COMASA [18] efficiently randomizes the search for the average sequence, using DBA as a local optimization process.
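
A sketch of the idea behind one DBA refinement step (in Python; dtw_path and dba_update are our names, not from the method's reference implementation): align every sequence to the current average with DTW, then move each point of the average to the mean of the values aligned to it.

def dtw_path(s, t):
    # Full-matrix DTW followed by backtracking; returns the warping path
    # as a list of matched index pairs (i, j).
    n, m = len(s), len(t)
    INF = float("inf")
    dtw = [[INF] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            dtw[i][j] = cost + min(dtw[i - 1][j], dtw[i][j - 1], dtw[i - 1][j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        # step to the cheapest of the three predecessor cells
        i, j = min([(i - 1, j), (i, j - 1), (i - 1, j - 1)],
                   key=lambda p: dtw[p[0]][p[1]])
    return path[::-1]

def dba_update(average, sequences):
    # One DBA iteration: collect, for every point of the average, all
    # values aligned to it, then replace the point by their mean.
    buckets = [[] for _ in average]
    for seq in sequences:
        for i, j in dtw_path(average, seq):
            buckets[i].append(seq[j])
    return [sum(b) / len(b) for b in buckets]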

Supervised learning

A nearest-neighbour classifier can achieve state-of-the-art performance when using dynamic time warping as a distance measure. [19]
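
A sketch of such a classifier, reusing the dtw_distance function from the implementation section above (labelled_series is a list of (series, label) pairs; the names are ours):

def nn1_classify(query, labelled_series):
    # Return the label of the training series nearest to the query
    # under the DTW distance.
    best_series, best_label = min(labelled_series,
                                  key=lambda sl: dtw_distance(query, sl[0]))
    return best_label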

Amerced Dynamic Time Warping

Amerced Dynamic Time Warping (ADTW) is a variant of DTW designed to better control DTW's permissiveness in the alignments that it allows. [20] The windows that classical DTW uses to constrain alignments introduce a step function. Any warping of the path is allowed within the window and none beyond it. In contrast, ADTW employs an additive penalty that is incurred each time that the path is warped. Any amount of warping is allowed, but each warping action incurs a direct penalty. ADTW significantly outperforms DTW with windowing when applied as a nearest neighbor classifier on a set of benchmark time series classification tasks. [20]
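
A sketch of this scheme, based on our reading of the description above (see [20] for the exact formulation): the recurrence is the same as DTW's, except that the two off-diagonal steps each add a fixed penalty omega.

def adtw_distance(s, t, omega):
    # Like DTW, but every warping step (insertion or deletion) pays an
    # additive penalty omega; no hard window is imposed.
    n, m = len(s), len(t)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            d[i][j] = min(d[i - 1][j - 1] + cost,       # match: no penalty
                          d[i - 1][j] + cost + omega,   # warped step
                          d[i][j - 1] + cost + omega)   # warped step
    return d[n][m]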

Graphical Time Warping

Graphical Time Warping (GTW) is a generalized version of DTW that can align multiple pairs of time series or sequences jointly. [21] Compared with aligning multiple pairs independently through DTW, GTW considers both the alignment accuracy of each sequence pair (as DTW) and the similarity among pairs (according to the data structure or assigned by user). This can result in better alignment performance when the similarity among pairs exists.

Distance functions

DTW is sensitive to the distance function used to score matches between pairs of values across the two sequences. The original definition of DTW [22] used d(x, y) = |x − y|. In time series classification, d(x, y) = (x − y)² has become popular. [23]

Recent work [24] has shown that tuning this distance measure can be useful for tuning DTW performance. Specifically, tuning γ in a family of distance functions of the form d(x, y) = |x − y|^γ makes DTW focus more on low-amplitude effects when γ is small and on large-amplitude effects when γ is large.
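
Plugged into the Python implementation above, this is a one-line change of the local cost (a sketch; γ is written gamma):

def d_gamma(x, y, gamma):
    # Tunable local cost d(x, y) = |x - y|**gamma: small gamma stresses
    # low-amplitude differences, large gamma stresses high-amplitude ones.
    return abs(x - y) ** gamma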

Handling missing values

DTW cannot handle missing values in time series. Simple preprocessing methods such as dropping or interpolating missing values do not provide a good estimate of the DTW distance. [25]

DTW-AROW (DTW with Additional Restrictions on Warping) is a generalization of DTW that handles missing values. [25] DTW-AROW produces both a distance and a warping path; hence, it can simply replace DTW to handle missing values in many applications. [25] DTW-AROW has the same time and memory complexity as DTW. [25] An open-source implementation of DTW-AROW is available in Python.

Alternative approaches

In functional data analysis, time series are regarded as discretizations of smooth (differentiable) functions of time. By viewing the observed samples as smooth functions, one can utilize continuous mathematics for analyzing data. [26] Smoothness and monotonicity of time warp functions may be obtained for instance by integrating a time-varying radial basis function, thus being a one-dimensional diffeomorphism. [27] Optimal nonlinear time warping functions are computed by minimizing a measure of distance of the set of functions to their warped average. Roughness penalty terms for the warping functions may be added, e.g., by constraining the size of their curvature. The resultant warping functions are smooth, which facilitates further processing. This approach has been successfully applied to analyze patterns and variability of speech movements. [28] [29]

Another related approach is the hidden Markov model (HMM); it has been shown that the Viterbi algorithm used to search for the most likely path through the HMM is equivalent to stochastic DTW. [30] [31] [32]

DTW and related warping methods are typically used as pre- or post-processing steps in data analyses. If the observed sequences contain random variation in their values, in their shape, and in their temporal alignment, the warping may overfit to noise, leading to biased results. A simultaneous model formulation with random variation in both values (vertical) and time-parametrization (horizontal) is an example of a nonlinear mixed-effects model. [33] In human movement analysis, simultaneous nonlinear mixed-effects modeling has been shown to produce superior results compared to DTW. [34]

Open-source software

Applications

Spoken-word recognition

Due to different speaking rates, a non-linear fluctuation occurs in a speech pattern along the time axis, which needs to be eliminated. [22] DP matching is a pattern-matching algorithm based on dynamic programming (DP), which uses a time-normalization effect: the fluctuations in the time axis are modeled using a non-linear time-warping function. Considering any two speech patterns, we can get rid of their timing differences by warping the time axis of one so that the maximal coincidence is attained with the other. Moreover, if the warping function is allowed to take any possible value, very little distinction can be made between words belonging to different categories. So, to enhance the distinction between words belonging to different categories, restrictions were imposed on the warping function slope.

Correlation power analysis

Unstable clocks are used to defeat naive power analysis. Several techniques are used to counter this defense, one of which is dynamic time warping.

Finance and econometrics

Dynamic time warping is used in finance and econometrics to assess the quality of the prediction versus real-world data. [36] [37] [38]

See also

- Levenshtein distance
- Edit distance
- Longest common subsequence
- Approximate string matching
- Needleman–Wunsch algorithm
- Smith–Waterman algorithm
- Hirschberg's algorithm
- Hidden Markov model
- Baum–Welch algorithm
- Graphical time warping
- Functional data analysis
- Speech processing
- Vector quantization
- Pattern recognition
- Nonlinear dimensionality reduction
- Distance matrix
- Recurrent neural network
- k-means clustering
- Lowest common ancestor
- Shape context

References

  1. Olsen, NL; Markussen, B; Raket, LL (2018), "Simultaneous inference for misaligned multivariate functional data", Journal of the Royal Statistical Society, Series C, 67 (5): 1147–76, arXiv: 1606.03295 , doi:10.1111/rssc.12276, S2CID   88515233
  2. Gold, Omer; Sharir, Micha (2018). "Dynamic Time Warping and Geometric Edit Distance: Breaking the Quadratic Barrier". ACM Transactions on Algorithms. 14 (4). doi:10.1145/3230734. S2CID   52070903.
  3. Bringmann, Karl; Künnemann, Marvin (2015). "Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping". 2015 IEEE 56th Annual Symposium on Foundations of Computer Science. pp. 79–97. arXiv: 1502.01063 . doi:10.1109/FOCS.2015.15. ISBN   978-1-4673-8191-8. S2CID   1308171.
  4. Abboud, Amir; Backurs, Arturs; Williams, Virginia Vassilevska (2015). "Tight Hardness Results for LCS and Other Sequence Similarity Measures". 2015 IEEE 56th Annual Symposium on Foundations of Computer Science. pp. 59–78. doi:10.1109/FOCS.2015.14. ISBN   978-1-4673-8191-8. S2CID   16094517.
  5. Herrmann, Matthieu; Webb, Geoffrey I. (2021). "Early abandoning and pruning for elastic distances including dynamic time warping". Data Mining and Knowledge Discovery. 35 (6): 2577–2601. arXiv: 2102.05221 . doi:10.1007/s10618-021-00782-4. S2CID   235313990.
  6. Silva, D. F., Batista, G. E. A. P. A. (2015). Speeding Up All-Pairwise Dynamic Time Warping Matrix Calculation.
  7. Al-Naymat, G., Chawla, S., Taheri, J. (2012). SparseDTW: A Novel Approach to Speed up Dynamic Time Warping.
  8. Stan Salvador, Philip Chan, FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space. KDD Workshop on Mining Temporal and Sequential Data, pp. 70–80, 2004.
  9. Meinard Müller, Henning Mattes, and Frank Kurth (2006). An Efficient Multiscale Approach to Audio Synchronization. Proceedings of the International Conference on Music Information Retrieval (ISMIR), pp. 192—197.
  10. Thomas Prätzlich, Jonathan Driedger, and Meinard Müller (2016). Memory-Restricted Multiscale Dynamic Time Warping. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 569—573.
  11. Keogh, E.; Ratanamahatana, C. A. (2005). "Exact indexing of dynamic time warping". Knowledge and Information Systems. 7 (3): 358–386. doi:10.1007/s10115-004-0154-9. S2CID   207056701.
  12. Lemire, D. (2009). "Faster Retrieval with a Two-Pass Dynamic-Time-Warping Lower Bound". Pattern Recognition. 42 (9): 2169–2180. arXiv: 0811.3301 . Bibcode:2009PatRe..42.2169L. doi:10.1016/j.patcog.2008.11.030. S2CID   8658213.
  13. Tan, Chang Wei; Petitjean, Francois; Webb, Geoffrey I. (2019). "Elastic bands across the path: A new framework and methods to lower bound DTW". Proceedings of the 2019 SIAM International Conference on Data Mining: 522–530. arXiv: 1808.09617 . doi:10.1137/1.9781611975673.59. ISBN   978-1-61197-567-3. S2CID   52120426.
  14. Webb, Geoffrey I.; Petitjean, Francois (2021). "Tight lower bounds for Dynamic Time Warping". Pattern Recognition. 115: 107895. arXiv: 2102.07076 . Bibcode:2021PatRe.11507895W. doi:10.1016/j.patcog.2021.107895. S2CID   231925247.
  15. Wang, Xiaoyue; et al. (2010). "Experimental comparison of representation methods and distance measures for time series data". Data Mining and Knowledge Discovery. 2010: 1–35. arXiv: 1012.2789 .
  16. Gupta, L.; Molfese, D. L.; Tammana, R.; Simos, P. G. (1996). "Nonlinear alignment and averaging for estimating the evoked potential". IEEE Transactions on Biomedical Engineering. 43 (4): 348–356. doi:10.1109/10.486255. PMID   8626184. S2CID   28688330.
  17. Petitjean, F. O.; Ketterlin, A.; Gançarski, P. (2011). "A global averaging method for dynamic time warping, with applications to clustering". Pattern Recognition. 44 (3): 678. Bibcode:2011PatRe..44..678P. doi:10.1016/j.patcog.2010.09.013. S2CID   14850691.
  18. Petitjean, F. O.; Gançarski, P. (2012). "Summarizing a set of time series by averaging: From Steiner sequence to compact multiple alignment". Theoretical Computer Science. 414: 76–91. doi: 10.1016/j.tcs.2011.09.029 .
  19. Ding, Hui; Trajcevski, Goce; Scheuermann, Peter; Wang, Xiaoyue; Keogh, Eamonn (2008). "Querying and mining of time series data: experimental comparison of representations and distance measures". Proc. VLDB Endow. 1 (2): 1542–1552. doi: 10.14778/1454159.1454226 .
  20. Herrmann, Matthieu; Webb, Geoffrey I. (2023). "Amercing: An intuitive and effective constraint for dynamic time warping". Pattern Recognition. 137: 109333. Bibcode:2023PatRe.13709333H. doi: 10.1016/j.patcog.2023.109333 . S2CID   256182457.
  21. Wang, Yizhi; Miller, David J; Poskanzer, Kira; Wang, Yue; Tian, Lin; Yu, Guoqiang (2016). "Graphical Time Warping for Joint Alignment of Multiple Curves". Advances in Neural Information Processing Systems. Curran Associates, Inc. 29.
  22. Sakoe, Hiroaki; Chiba, Seibi (1978). "Dynamic programming algorithm optimization for spoken word recognition". IEEE Transactions on Acoustics, Speech, and Signal Processing. 26 (1): 43–49. doi:10.1109/tassp.1978.1163055. S2CID   17900407.
  23. Bagnall, Anthony; Lines, Jason; Bostrom, Aaron; Large, James; Keogh, Eamonn (2016-11-23). "The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances". Data Mining and Knowledge Discovery. Springer Science and Business Media LLC. 31 (3): 606–660. doi:10.1007/s10618-016-0483-9. ISSN   1384-5810. PMC   6404674 . PMID   30930678.
  24. Herrmann, Matthieu; Tan, Chang Wei; Webb, Geoffrey I. (2023-04-16). "Parameterizing the cost function of dynamic time warping with application to time series classification". Data Mining and Knowledge Discovery. Springer: 1–22. arXiv: 2301.10350 . doi:10.1007/s10618-023-00926-8. ISSN   1573-756X . Retrieved 2023-04-17.
  25. Yurtman, Aras; Soenen, Jonas; Meert, Wannes; Blockeel, Hendrik (2023). Koutra, Danai; Plant, Claudia; Gomez Rodriguez, Manuel; Baralis, Elena; Bonchi, Francesco (eds.). "Estimating Dynamic Time Warping Distance Between Time Series with Missing Data". Machine Learning and Knowledge Discovery in Databases: Research Track. Lecture Notes in Computer Science. Cham: Springer Nature Switzerland: 221–237. doi:10.1007/978-3-031-43424-2_14. ISBN   978-3-031-43424-2.
  26. Lucero, J. C.; Munhall, K. G.; Gracco, V. G.; Ramsay, J. O. (1997). "On the Registration of Time and the Patterning of Speech Movements". Journal of Speech, Language, and Hearing Research. 40 (5): 1111–1117. doi:10.1044/jslhr.4005.1111. PMID   9328881.
  27. Durrleman, S; Pennec, X.; Trouvé, A.; Braga, J.; Gerig, G. & Ayache, N. (2013). "Toward a Comprehensive Framework for the Spatiotemporal Statistical Analysis of Longitudinal Shape Data". International Journal of Computer Vision. 103 (1): 22–59. doi:10.1007/s11263-012-0592-x. PMC   3744347 . PMID   23956495.
  28. Howell, P.; Anderson, A.; Lucero, J. C. (2010). "Speech motor timing and fluency". In Maassen, B.; van Lieshout, P. (eds.). Speech Motor Control: New Developments in Basic and Applied Research. Oxford University Press. pp. 215–225. ISBN   978-0199235797.
  29. Koenig, Laura L.; Lucero, Jorge C.; Perlman, Elizabeth (2008). "Speech production variability in fricatives of children and adults: Results of functional data analysis". The Journal of the Acoustical Society of America. 124 (5): 3158–3170. Bibcode:2008ASAJ..124.3158K. doi:10.1121/1.2981639. ISSN   0001-4966. PMC   2677351 . PMID   19045800.
  30. Nakagawa, Seiichi; Nakanishi, Hirobumi (1988-01-01). "Speaker-Independent English Consonant and Japanese Word Recognition by a Stochastic Dynamic Time Warping Method". IETE Journal of Research. 34 (1): 87–95. doi:10.1080/03772063.1988.11436710. ISSN   0377-2063.
  31. Fang, Chunsheng. "From Dynamic Time Warping (DTW) to Hidden Markov Model (HMM)" (PDF).
  32. Juang, B. H. (September 1984). "On the hidden Markov model and dynamic time warping for speech recognition — A unified view". AT&T Bell Laboratories Technical Journal. 63 (7): 1213–1243. doi:10.1002/j.1538-7305.1984.tb00034.x. ISSN   0748-612X. S2CID   8461145.
  33. Raket LL, Sommer S, Markussen B (2014). "A nonlinear mixed-effects model for simultaneous smoothing and registration of functional data". Pattern Recognition Letters. 38: 1–7. Bibcode:2014PaReL..38....1R. doi:10.1016/j.patrec.2013.10.018.
  34. Raket LL, Grimme B, Schöner G, Igel C, Markussen B (2016). "Separating timing, movement conditions and individual differences in the analysis of human movement". PLOS Computational Biology. 12 (9): e1005092. arXiv: 1601.02775 . Bibcode:2016PLSCB..12E5092R. doi: 10.1371/journal.pcbi.1005092 . PMC   5033575 . PMID   27657545.
  35. Tan, Chang Wei; Herrmann, Matthieu; Webb, Geoffrey I. (2021). "Ultra fast warping window optimization for Dynamic Time Warping" (PDF). 2021 IEEE International Conference on Data Mining (ICDM). pp. 589–598. doi:10.1109/ICDM51629.2021.00070. ISBN   978-1-6654-2398-4. S2CID   246291550.
  36. Orlando, Giuseppe; Bufalo, Michele; Stoop, Ruedi (2022-02-01). "Financial markets' deterministic aspects modeled by a low-dimensional equation". Scientific Reports. 12 (1): 1693. Bibcode:2022NatSR..12.1693O. doi: 10.1038/s41598-022-05765-z . ISSN   2045-2322. PMC   8807815 . PMID   35105929.
  37. Mastroeni, Loretta; Mazzoccoli, Alessandro; Quaresima, Greta; Vellucci, Pierluigi (2021-02-01). "Decoupling and recoupling in the crude oil price benchmarks: An investigation of similarity patterns". Energy Economics. 94: 105036. doi:10.1016/j.eneco.2020.105036. ISSN   0140-9883. S2CID   230536868.
  38. Orlando, Giuseppe; Bufalo, Michele (2021-12-10). "Modelling bursts and chaos regularization in credit risk with a deterministic nonlinear model". Finance Research Letters. 47: 102599. doi:10.1016/j.frl.2021.102599. ISSN   1544-6123.

Further reading