Long-range dependence

Long-range dependence (LRD), also called long memory or long-range persistence, is a phenomenon that may arise in the analysis of spatial or time series data. It relates to the rate of decay of statistical dependence between two points as the time interval or spatial distance between them increases. A phenomenon is usually considered to have long-range dependence if the dependence decays more slowly than exponentially, typically following a power-law decay. LRD is often related to self-similar processes or fields. It has been applied in fields as diverse as internet traffic modelling, econometrics, hydrology, linguistics and the earth sciences. Different mathematical definitions of LRD are used in different contexts and for different purposes. [1] [2] [3] [4] [5] [6]

Short-range dependence versus long-range dependence

One way of characterising long-range and short-range dependent stationary processes is in terms of their autocovariance functions. For a short-range dependent process, the coupling between values at different times decreases rapidly as the time difference increases: either the autocovariance drops to zero after a certain time lag, or it eventually decays exponentially. In the case of LRD, there is much stronger coupling: the autocovariance function decays like a power law, and thus more slowly than exponentially.
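
This contrast can be sketched numerically. The following is a minimal illustration (the AR(1) coefficient phi and the Hurst value H are arbitrary choices) comparing the exponentially decaying autocovariance of an AR(1) process with the power-law tail of fractional Gaussian noise:

```python
import numpy as np

# Theoretical autocovariances (unit-variance scale) for a short-range
# AR(1) process and long-range fractional Gaussian noise (fGn).
phi, H = 0.6, 0.8
lags = np.arange(1, 101)

acov_ar1 = phi ** lags                       # exponential decay

def fgn_autocovariance(k, H):
    """gamma(k) = 0.5 * (|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H})."""
    return 0.5 * (np.abs(k + 1) ** (2 * H)
                  - 2 * np.abs(k) ** (2 * H)
                  + np.abs(k - 1) ** (2 * H))

acov_fgn = fgn_autocovariance(lags, H)       # ~ H(2H-1) k^{2H-2} for large k

# At large lags the power-law tail dominates the exponential one.
print(acov_ar1[-1], acov_fgn[-1])
```

At lag 100 the AR(1) autocovariance is astronomically small while the fGn autocovariance is still of appreciable size, which is exactly the short-range versus long-range distinction described above.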

A second way of characterizing long- and short-range dependence is in terms of the variance of the partial sums of consecutive values. For short-range dependence, the variance typically grows in proportion to the number of terms. For LRD, the variance of the partial sums increases more rapidly, often as a power function with exponent greater than 1. One way of examining this behavior uses the rescaled range. This aspect of long-range dependence is important in the design of dams on rivers for water resources, where the summations correspond to the total inflow to the dam over an extended period. [7]
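
The growth of the variance of partial sums can be checked directly from the autocovariances. A small sketch follows, assuming the standard fractional-Gaussian-noise covariance, for which the variance of the sum of n terms is exactly n^(2H) (the function names are illustrative):

```python
import numpy as np

def fgn_autocovariance(k, H):
    """Autocovariance of fractional Gaussian noise at integer lag k."""
    k = np.abs(k)
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H)
                  + np.abs(k - 1) ** (2 * H))

def partial_sum_variance(n, H):
    # Var(X_1 + ... + X_n) = sum of all pairwise covariances.
    i, j = np.meshgrid(np.arange(n), np.arange(n))
    return fgn_autocovariance(i - j, H).sum()

# For H = 0.75 the variance of n terms is n^{1.5}, i.e. it grows as a
# power function with exponent greater than 1; H = 0.5 recovers the
# linear growth of short-range (here: independent) data.
print(partial_sum_variance(100, 0.75))   # 100^{1.5} = 1000
print(partial_sum_variance(50, 0.5))     # 50
```

The exponent 2H > 1 in the LRD case is what makes long dry or wet spells in cumulative river inflow so much more likely than independence would suggest.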

The above two ways are mathematically related to each other, but they are not the only ways to define LRD. In the case where the autocovariance of the process does not exist (heavy tails), one has to find other ways to define what LRD means, and this is often done with the help of self-similar processes.

The Hurst parameter H is a measure of the extent of long-range dependence in a time series (it has a different meaning in the context of self-similar processes). H takes values from 0 to 1. A value of 0.5 indicates the absence of long-range dependence. [8] The closer H is to 1, the greater the degree of persistence or long-range dependence. Values of H below 0.5 correspond to anti-persistence, the opposite of LRD: successive values are strongly negatively correlated, so the process fluctuates violently.

Estimation of the Hurst parameter

Slowly decaying variances, LRD, and a spectral density obeying a power law are different manifestations of the same property of the underlying covariance of a stationary process. It is therefore possible to approach the problem of estimating the Hurst parameter from three different angles: the growth of the variance of partial sums, the decay of the autocovariance, and the behaviour of the spectral density at low frequencies.
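
As a simplified illustration of the variance-based angle, the aggregated-variance method block-averages the series at several scales m and reads H off the slope of log variance versus log m; for an LRD series the variance of block means scales as m^(2H-2). The scales, sample size and seed below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)   # white noise: true H = 0.5

# Aggregated-variance method: for block size m, the variance of the
# block means behaves like m^{2H - 2}, so a log-log fit recovers H.
scales = np.array([10, 20, 40, 80, 160])
variances = [x[: (len(x) // m) * m].reshape(-1, m).mean(axis=1).var()
             for m in scales]

slope, _ = np.polyfit(np.log(scales), np.log(variances), 1)
H_est = 1 + slope / 2
print(round(H_est, 2))   # typically close to 0.5 for independent data
```

For a genuinely long-range dependent input the fitted slope would be shallower than -1, yielding an estimate of H above 0.5.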

Relation to self-similar processes

Given a stationary LRD sequence, the partial-sum process, viewed as a process indexed by the number of terms and properly scaled, is asymptotically a self-similar process with stationary increments; the most typical limit is fractional Brownian motion. Conversely, given a self-similar process with stationary increments and Hurst index H > 0.5, its increments (the consecutive differences of the process) form a stationary LRD sequence.

This also holds if the sequence is short-range dependent, but in that case the self-similar process obtained from the partial sums can only be Brownian motion (H = 0.5).

Models

Among stochastic models used for long-range dependence, popular choices include autoregressive fractionally integrated moving average (ARFIMA) models, which are defined for discrete-time processes, while continuous-time models often start from fractional Brownian motion.
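
A minimal sketch of the fractional-differencing filter (1 - B)^d at the heart of such discrete-time models, using the standard coefficient recursion from the binomial expansion (the function name is illustrative):

```python
import numpy as np

def frac_diff_weights(d, n):
    """First n coefficients of the fractional difference (1 - B)^d,
    via the recursion w_0 = 1, w_k = w_{k-1} * (k - 1 - d) / k."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w

print(frac_diff_weights(1.0, 4))   # ordinary differencing: 1, -1, 0, 0
print(frac_diff_weights(0.4, 4))   # fractional d: slowly decaying weights
```

For integer d the weights terminate, reproducing ordinary ARIMA differencing; for fractional d in (0, 0.5) they decay only hyperbolically, which is the mechanism by which these models generate long memory.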

Notes

  1. Beran, Jan (1994). Statistics for Long-Memory Processes. CRC Press.
  2. Doukhan; et al. (2003). Theory and Applications of Long-Range Dependence. Birkhäuser.
  3. Malamud, Bruce D.; Turcotte, Donald L. (1999). Self-Affine Time Series: I. Generation and Analyses. Vol. 40. pp. 1–90. Bibcode:1999AdGeo..40....1M. doi:10.1016/S0065-2687(08)60293-9. ISBN 9780120188406.
  4. Samorodnitsky, Gennady (2007). Long Range Dependence. Foundations and Trends in Stochastic Systems.
  5. Beran; et al. (2013). Long Memory Processes: Probabilistic Properties and Statistical Methods. Springer.
  6. Witt, Annette; Malamud, Bruce D. (September 2013). "Quantification of Long-Range Persistence in Geophysical Time Series: Conventional and Benchmark-Based Improvement Techniques". Surveys in Geophysics. 34 (5): 541–651. Bibcode:2013SGeo...34..541W. doi:10.1007/s10712-012-9217-8.
     • Hurst, H.E.; Black, R.P.; Simaika, Y.M. (1965). Long-Term Storage: An Experimental Study. Constable, London.
  7. Beran (1994), page 34.

Related Research Articles

Autocorrelation

Autocorrelation, sometimes known as serial correlation in the discrete time case, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations of a random variable as a function of the time lag between them. The analysis of autocorrelation is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.
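
A minimal sketch of a sample autocorrelation computation (the helper name and test signal are illustrative), showing how a periodic pattern reappears in the autocorrelation at its own period:

```python
import numpy as np

def autocorrelation(x, lag):
    """Sample autocorrelation of x at a given positive lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

t = np.arange(200)
signal = np.sin(2 * np.pi * t / 20)   # sine wave with period 20

print(autocorrelation(signal, 20))    # strongly positive at the full period
print(autocorrelation(signal, 10))    # strongly negative at the half-period
```

Scanning the lag and looking for peaks in this statistic is the basic recipe for detecting a periodic signal buried in noise.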

Pink noise

Pink noise, 1/f noise, fractional noise or fractal noise is a signal or process with a frequency spectrum such that the power spectral density is inversely proportional to the frequency of the signal. In pink noise, each octave interval carries an equal amount of noise energy.
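
A simplified sketch of one common way to generate approximate pink noise: scale the Fourier coefficients of white noise by 1/sqrt(f), so that the power spectral density falls off as 1/f (the seed and sample length are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2 ** 14
white = rng.standard_normal(n)

# Shape the spectrum: dividing amplitudes by sqrt(f) divides power by f.
spectrum = np.fft.rfft(white)
freqs = np.fft.rfftfreq(n)
freqs[0] = freqs[1]                       # avoid dividing by zero at DC
pink = np.fft.irfft(spectrum / np.sqrt(freqs), n)

# Fit the slope of log-PSD versus log-frequency; for pink noise it
# should be close to -1.
psd = np.abs(np.fft.rfft(pink)) ** 2
f = np.fft.rfftfreq(n)[1:]
slope, _ = np.polyfit(np.log(f), np.log(psd[1:]), 1)
print(round(slope, 1))
```

The fitted spectral slope near -1 is the defining 1/f signature; white noise would give a slope near 0 under the same fit.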

In probability theory and statistics, a Gaussian process is a stochastic process such that every finite collection of its random variables has a multivariate normal distribution. The distribution of a Gaussian process is the joint distribution of all those random variables, and as such it is a distribution over functions with a continuous domain, e.g. time or space.

In mathematics and statistics, a stationary process is a stochastic process whose unconditional joint probability distribution does not change when shifted in time. Consequently, parameters such as mean and variance also do not change over time.

A Lévy flight is a random walk in which the step-lengths have a stable distribution, a probability distribution that is heavy-tailed. When defined as a walk in a space of dimension greater than one, the steps made are in isotropic random directions. Later researchers have extended the use of the term "Lévy flight" to also include cases where the random walk takes place on a discrete grid rather than on a continuous space.

In statistics, econometrics, and signal processing, an autoregressive (AR) model is a representation of a type of random process; as such, it can be used to describe certain time-varying processes in nature, economics, behavior, etc. The autoregressive model specifies that the output variable depends linearly on its own previous values and on a stochastic term; thus the model is in the form of a stochastic difference equation which should not be confused with a differential equation. Together with the moving-average (MA) model, it is a special case and key component of the more general autoregressive–moving-average (ARMA) and autoregressive integrated moving average (ARIMA) models of time series, which have a more complicated stochastic structure; it is also a special case of the vector autoregressive model (VAR), which consists of a system of more than one interlocking stochastic difference equation in more than one evolving random variable.
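
A minimal AR(1) simulation, illustrating that the lag-1 sample autocorrelation of such a process is close to the autoregressive coefficient phi (the seed and parameters are arbitrary choices):

```python
import numpy as np

# Simulate x_t = phi * x_{t-1} + eps_t with Gaussian innovations.
rng = np.random.default_rng(1)
phi, n = 0.7, 100_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

# Lag-1 sample autocorrelation; for AR(1) the theoretical value is phi.
xc = x - x.mean()
acf1 = np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc)
print(round(acf1, 2))   # close to phi = 0.7
```

More generally the AR(1) autocorrelation at lag k is phi^k, the exponential decay that places AR models firmly on the short-range-dependent side of the dichotomy discussed above.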

In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. Both of these models are fitted to time series data either to better understand the data or to forecast future points in the series. ARIMA models are applied in some cases where the data show evidence of non-stationarity in the sense of the expected value, where an initial differencing step can be applied one or more times to eliminate the non-stationarity of the mean function. When a time series exhibits seasonality, seasonal differencing can be applied to eliminate the seasonal component. Since the ARMA model, according to Wold's decomposition theorem, is theoretically sufficient to describe a regular wide-sense stationary time series, we are motivated to make a non-stationary time series stationary, e.g., by differencing, before an ARMA model can be used. If the time series contains a predictable sub-process, the predictable component is treated as a non-zero-mean but periodic component in the ARIMA framework, so that it is eliminated by the seasonal differencing.

A long-tailed or heavy-tailed distribution is one that assigns relatively high probabilities to regions far from the mean or median. A more formal mathematical definition is given below. In the context of teletraffic engineering a number of quantities of interest have been shown to have a long-tailed distribution. For example, if we consider the sizes of files transferred from a web server, then, to a good degree of accuracy, the distribution is heavy-tailed, that is, there are a large number of small files transferred but, crucially, the number of very large files transferred remains a major component of the volume downloaded.

In probability theory, fractional Brownian motion (fBm), also called a fractal Brownian motion, is a generalization of Brownian motion. Unlike classical Brownian motion, the increments of fBm need not be independent. fBm is a continuous-time Gaussian process B_H(t) on [0, T] that starts at zero, has expectation zero for all t in [0, T], and has the covariance function E[B_H(t) B_H(s)] = (1/2)(|t|^(2H) + |s|^(2H) - |t - s|^(2H)), where H is the Hurst parameter.
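
A small sketch of this covariance function (the function name is illustrative) shows two of its basic consequences: the variance grows as t^(2H), and H = 0.5 recovers the Brownian-motion covariance min(t, s):

```python
def fbm_cov(t, s, H):
    """Covariance of fractional Brownian motion:
    E[B_H(t) B_H(s)] = 0.5 * (t^{2H} + s^{2H} - |t - s|^{2H})."""
    return 0.5 * (t ** (2 * H) + s ** (2 * H) - abs(t - s) ** (2 * H))

H = 0.7
print(fbm_cov(2.0, 2.0, H))      # variance at t = 2 is 2^{2H} = 2^{1.4}
print(fbm_cov(3.0, 1.0, 0.5))    # H = 0.5 gives min(3, 1) = 1
```

For H > 0.5 the increments over disjoint intervals are positively correlated, which is the source of the long-range dependence of fractional Gaussian noise.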

In statistics, stochastic volatility models are those in which the variance of a stochastic process is itself randomly distributed. They are used in the field of mathematical finance to evaluate derivative securities, such as options. The name derives from the models' treatment of the underlying security's volatility as a random process, governed by state variables such as the price level of the underlying security, the tendency of volatility to revert to some long-run mean value, and the variance of the volatility process itself, among others.

Anomalous diffusion

Anomalous diffusion is a diffusion process with a non-linear relationship between the mean squared displacement (MSD) and time, MSD ∝ t^α with α ≠ 1. This behavior is in stark contrast to Brownian motion, the typical diffusion process described by Einstein and Smoluchowski, where the MSD is linear in time.

The Hurst exponent is used as a measure of long-term memory of time series. It relates to the autocorrelations of the time series, and the rate at which these decrease as the lag between pairs of values increases. Studies involving the Hurst exponent were originally developed in hydrology for the practical matter of determining optimum dam sizing for the Nile river's volatile rain and drought conditions that had been observed over a long period of time. The name "Hurst exponent", or "Hurst coefficient", derives from Harold Edwin Hurst (1880–1978), who was the lead researcher in these studies; the use of the standard notation H for the coefficient also relates to his name.

In probability theory and statistics, the covariance function describes how much two random variables change together (their covariance) with varying spatial or temporal separation. For a random field or stochastic process Z(x) on a domain D, a covariance function C(x, y) gives the covariance of the values of the random field at the two locations x and y: C(x, y) = Cov(Z(x), Z(y)).

In stochastic processes, chaos theory and time series analysis, detrended fluctuation analysis (DFA) is a method for determining the statistical self-affinity of a signal. It is useful for analysing time series that appear to be long-memory processes or 1/f noise.
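
A compact, simplified DFA sketch follows, assuming linear detrending in non-overlapping windows (the window sizes, seed and function name are arbitrary, illustrative choices):

```python
import numpy as np

def dfa_exponent(x, window_sizes):
    """Detrended fluctuation analysis: returns the scaling exponent
    alpha from F(n) ~ n^alpha (alpha is near H for stationary data)."""
    y = np.cumsum(x - np.mean(x))            # integrated profile
    fluctuations = []
    for n in window_sizes:
        n_windows = len(y) // n
        f2 = []
        for w in range(n_windows):
            seg = y[w * n:(w + 1) * n]
            t = np.arange(n)
            trend = np.polyval(np.polyfit(t, seg, 1), t)
            f2.append(np.mean((seg - trend) ** 2))   # detrended variance
        fluctuations.append(np.sqrt(np.mean(f2)))    # RMS fluctuation F(n)
    alpha, _ = np.polyfit(np.log(window_sizes), np.log(fluctuations), 1)
    return alpha

rng = np.random.default_rng(2)
alpha = dfa_exponent(rng.standard_normal(20_000), [16, 32, 64, 128, 256])
print(round(alpha, 2))   # typically near 0.5 for white noise
```

Exponents clearly above 0.5 on real data are the DFA signature of long-memory or 1/f-type behaviour; the detrending step is what lets the method tolerate slow trends that would bias a raw variance analysis.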

The rescaled range is a statistical measure of the variability of a time series introduced by the British hydrologist Harold Edwin Hurst (1880–1978). Its purpose is to provide an assessment of how the apparent variability of a series changes with the length of the time-period being considered.
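
A minimal sketch of the R/S statistic for a single series length (the helper name is illustrative):

```python
import numpy as np

def rescaled_range(x):
    """R/S statistic: range of the mean-adjusted cumulative sum,
    divided by the standard deviation of the series."""
    x = np.asarray(x, dtype=float)
    z = np.cumsum(x - x.mean())          # mean-adjusted cumulative sum
    return (z.max() - z.min()) / x.std()

rng = np.random.default_rng(3)
x = rng.standard_normal(4096)
# For independent data, E[R/S] grows roughly like n^0.5 (H = 0.5);
# long-range dependent data grow faster, like n^H with H > 0.5.
print(rescaled_range(x[:1024]), rescaled_range(x))
```

In practice R/S is computed over many sub-series lengths and the Hurst exponent is read off a log-log fit of R/S against length, as in Hurst's original reservoir studies.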

Self-similar processes are stochastic processes satisfying a mathematically precise version of the self-similarity property. Several related properties have this name, and some are defined here.

In statistics, autoregressive fractionally integrated moving average models are time series models that generalize ARIMA (autoregressive integrated moving average) models by allowing non-integer values of the differencing parameter. These models are useful in modeling time series with long memory—that is, in which deviations from the long-run mean decay more slowly than an exponential decay. The acronyms "ARFIMA" or "FARIMA" are often used, although it is also conventional to simply extend the "ARIMA(p, d, q)" notation for models, by simply allowing the order of differencing, d, to take fractional values. Fractional differencing and the ARFIMA model were introduced in the early 1980s by Clive Granger, Roselyne Joyeux, and Jonathan Hosking.

In computer networks, self-similarity is a feature of network data transfer dynamics. When modeling network data dynamics, traditional time series models such as the autoregressive moving average model are not appropriate, because these models have only a finite number of parameters and can therefore capture dependence only over a finite time window, whereas network data usually have a long-range dependent temporal structure. A self-similar process is one way of modeling network data dynamics with such long-range correlation. This article defines and describes network data transfer dynamics in the context of a self-similar process. Properties of the process are shown and methods are given for graphing and estimating the parameters modeling the self-similarity of network data.

Brownian surface

A Brownian surface is a fractal surface generated via a fractal elevation function.