Self-similar process

Last updated August 03, 2023

Self-similar processes are stochastic processes that exhibit self-similarity: they behave the same when viewed at different degrees of magnification, or at different scales on a dimension (space or time). Self-similar processes can sometimes be described using heavy-tailed distributions, also known as long-tailed distributions. Examples include traffic processes, such as packet inter-arrival times and burst lengths. Self-similar processes can also exhibit long-range dependence.

Overview

Designing robust and reliable networks and network services has become increasingly challenging in today's Internet. Doing so requires an understanding of the characteristics of Internet traffic. Empirical studies of measured traffic traces have led to the wide recognition of self-similarity in network traffic.^[1]

Self-similar Ethernet traffic exhibits dependencies over a long range of time scales. This contrasts with telephone traffic, which is Poisson in its arrival and departure processes.^[2]

In traditional Poisson traffic, the short-term fluctuations would average out, and a graph covering a large amount of time would approach a constant value.

Heavy-tailed distributions have been observed in many natural phenomena, both physical and sociological. Mandelbrot established the use of heavy-tailed distributions to model real-world fractal phenomena, e.g. stock markets, earthquakes, climate, and the weather.^{[ citation needed ]} Ethernet, WWW, SS7, TCP, FTP, TELNET and VBR video (digitised video of the type that is transmitted over ATM networks) traffic are all self-similar.

Self-similarity in packetised data networks can be caused by the distribution of file sizes, human interactions and/or Ethernet dynamics. Self-similar and long-range dependent characteristics in computer networks present a fundamentally different set of problems to people analysing and/or designing networks, and many of the assumptions upon which earlier systems were built are no longer valid in the presence of self-similarity.^[3]

The Poisson distribution

Before the heavy-tailed distribution is introduced mathematically, the Poisson process with a memoryless waiting-time distribution, used to model (among many things) traditional telephony networks, is briefly reviewed below.

Assuming pure-chance arrivals and pure-chance terminations leads to the following:

The number of call arrivals in a given time has a Poisson distribution, i.e.:

P(a)=\left({\frac {\mu ^{a}}{a!}}\right)e^{-\mu },

where a is the number of call arrivals in time T, and $\mu$ is the mean number of call arrivals in time T. For this reason, pure-chance traffic is also known as Poisson traffic.
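As a quick numerical check of the formula above, the following minimal Python sketch evaluates P(a) and confirms that the probabilities sum to one and have mean μ. The function name `poisson_pmf` and the value μ = 3 are illustrative assumptions, not part of the source.

```python
import math


def poisson_pmf(a: int, mu: float) -> float:
    """Probability of exactly `a` arrivals when the mean number of arrivals is `mu`."""
    return mu ** a / math.factorial(a) * math.exp(-mu)


mu = 3.0  # assumed mean number of call arrivals in time T
# Truncate the infinite sum at a = 59; the neglected tail is negligible for mu = 3.
probs = [poisson_pmf(a, mu) for a in range(60)]
total = sum(probs)
mean = sum(a * p for a, p in enumerate(probs))
print(f"sum of probabilities = {total:.6f}, mean = {mean:.6f}")
```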

The number of call departures in a given time also has a Poisson distribution, i.e.:

P(d)=\left({\frac {\lambda ^{d}}{d!}}\right)e^{-\lambda },

where d is the number of call departures in time T and $\lambda$ is the mean number of call departures in time T.

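The intervals, T, between successive call arrivals, and likewise between successive departures, are intervals between independent, identically distributed random events (T here denotes the random interval itself, not the fixed observation time used above). It can be shown that these intervals have a negative exponential distribution, i.e.:

P[T\geq t]=e^{-t/h},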

where h is the mean holding time (MHT).^{[ citation needed ]}
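The memoryless property mentioned earlier follows directly from this exponential form. The sketch below, which assumes an illustrative mean of h = 180 (in arbitrary time units) and hypothetical helper names, checks that P[T ≥ s + t | T ≥ s] equals P[T ≥ t] and compares the formula with a seeded simulation.

```python
import math
import random


def survival(t: float, h: float) -> float:
    """P[T >= t] for a negative exponential interval with mean h."""
    return math.exp(-t / h)


H = 180.0  # assumed mean interval, arbitrary units

# Memorylessness: having already waited 60 units does not change the
# distribution of the remaining wait.
conditional = survival(60.0 + 120.0, H) / survival(60.0, H)

# Empirical check; random.expovariate takes the rate, i.e. 1 / mean.
rng = random.Random(42)
samples = [rng.expovariate(1.0 / H) for _ in range(100_000)]
empirical = sum(1 for x in samples if x >= 120.0) / len(samples)

print(conditional, survival(120.0, H), empirical)
```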

The heavy-tail distribution

A distribution is said to have a heavy tail if

\lim _{x\to \infty }e^{\lambda x}\Pr[X>x]=\infty \quad {\mbox{for all }}\lambda >0.\,

One simple example of a heavy-tailed distribution is the Pareto distribution.
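To illustrate the definition, the sketch below evaluates the logarithm of e^{λx} Pr[X > x] for a Pareto tail and, for contrast, for an exponential tail. It assumes the usual Pareto parameterisation Pr[X > x] = (x_min/x)^α for x ≥ x_min; the names `alpha` and `x_min` and the chosen values are illustrative assumptions, not taken from the source. Working with logarithms avoids floating-point overflow.

```python
import math


def log_scaled_pareto_tail(x: float, lam: float, alpha: float = 1.5, x_min: float = 1.0) -> float:
    """log(e^{lam*x} * Pr[X > x]) for a Pareto tail (x_min / x)**alpha, x >= x_min."""
    return lam * x - alpha * math.log(x / x_min)


def log_scaled_exponential_tail(x: float, lam: float, mean: float = 1.0) -> float:
    """log(e^{lam*x} * Pr[X > x]) for an exponential tail exp(-x / mean)."""
    return lam * x - x / mean


xs = [1e2, 1e3, 1e4, 1e5]
lam = 0.01  # any lam > 0 gives the same qualitative picture for the Pareto tail
pareto_logs = [log_scaled_pareto_tail(x, lam) for x in xs]
expo_logs = [log_scaled_exponential_tail(x, lam) for x in xs]

for x, p, e in zip(xs, pareto_logs, expo_logs):
    print(f"x={x:>8.0f}  Pareto: {p:10.2f}  exponential: {e:12.2f}")
```

For the Pareto tail the scaled quantity eventually grows without bound, as the definition requires, whereas for the exponential tail with λ smaller than the decay rate it tends to zero.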

Modelling self-similar traffic

Since (unlike traditional telephony traffic) packetised traffic exhibits self-similar or fractal characteristics, conventional traffic models do not apply to networks which carry self-similar traffic.^{[ citation needed ]}

With the convergence of voice and data, the future multi-service network will be based on packetised traffic, and models which accurately reflect the nature of self-similar traffic will be required to develop, design and dimension future multi-service networks.^{[ citation needed ]}

Previous analytic work done in Internet studies adopted assumptions such as exponentially-distributed packet inter-arrivals, and conclusions reached under such assumptions may be misleading or incorrect in the presence of heavy-tailed distributions.^[2]

Deriving mathematical models which accurately represent long-range dependent traffic is a fertile area of research.

Self-similar stochastic processes modeled by Tweedie distributions

Leland et al. have provided a mathematical formalism to describe self-similar stochastic processes.^[4] Consider the sequence of numbers

Y=(Y_{i}:i=0,1,2,\ldots ,N)

with mean ${\hat {\mu }}={\text{E}}(Y_{i})$, deviations $y_{i}=Y_{i}-{\hat {\mu }}$, variance ${\hat {\sigma }}^{2}={\text{E}}(y_{i}^{2})$, and autocorrelation function

r(k)={\text{E}}(y_{i}y_{i+k})/{\text{E}}(y_{i}^{2})

with lag k. If the autocorrelation of this sequence has the long-range behavior

r(k)\sim k^{-d}L(k)

as $k\to \infty$, where L(k) is a slowly varying function at large values of k, then the sequence is called a self-similar process.
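The autocorrelation function defined above can be estimated from a finite sequence by replacing expectations with sample averages. The following is a minimal sketch under that assumption; the function name `autocorrelation` and the test sequence are illustrative, and averaging the lagged products over the n − k available pairs is one of several common conventions.

```python
def autocorrelation(y, k):
    """Sample estimate of r(k) = E(y_i * y_{i+k}) / E(y_i**2) for deviations from the mean."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    variance = sum(d * d for d in dev) / n
    # Average the lagged products over the n - k pairs that exist.
    covariance = sum(dev[i] * dev[i + k] for i in range(n - k)) / (n - k)
    return covariance / variance


# A strictly alternating sequence is perfectly anti-correlated at odd lags
# and perfectly correlated at even lags.
alternating = [1.0, -1.0] * 50
print([autocorrelation(alternating, k) for k in range(4)])
```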

The method of expanding bins can be used to analyze self-similar processes. Consider a set of equal-sized, non-overlapping bins that divides the original sequence of N elements into N/m groups of m consecutive elements each (with N/m an integer), so that new reproductive sequences, based on the mean values, can be defined:

Y_{i}^{(m)}=(Y_{im-m+1}+\ldots +Y_{im})/m.

The variance determined from this sequence will scale as the bin size changes such that

{\text{var}}[Y^{(m)}]={\hat {\sigma }}^{2}m^{-d}

if and only if the autocorrelation has the limiting form^[5]

\lim _{k\to \infty }r(k)/k^{-d}=(2-d)(1-d)/2.
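The expanding-bins variance scaling can be turned into a simple estimator of d: compute the variance of the bin means for several bin sizes m and fit the slope of log var[Y^(m)] against log m. The sketch below is an illustration under stated assumptions, not the procedure of the cited authors; the helper names are hypothetical, and it is exercised on independent Gaussian noise, for which the variance of a mean of m values falls as 1/m, so the estimate should be close to d = 1. A long-range dependent series would instead be expected to give 0 < d < 1.

```python
import math
import random


def block_means(y, m):
    """Means over non-overlapping bins of m consecutive elements (any remainder is dropped)."""
    n_bins = len(y) // m
    return [sum(y[i * m:(i + 1) * m]) / m for i in range(n_bins)]


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def estimate_d(y, bin_sizes):
    """Negated least-squares slope of log var[Y^(m)] versus log m."""
    xs = [math.log(m) for m in bin_sizes]
    ys = [math.log(variance(block_means(y, m))) for m in bin_sizes]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (v - y_bar) for x, v in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return -num / den


rng = random.Random(0)
iid = [rng.gauss(0.0, 1.0) for _ in range(2 ** 16)]  # uncorrelated test data
d_hat = estimate_d(iid, [1, 2, 4, 8, 16, 32, 64])
print(f"estimated d for independent noise: {d_hat:.3f}")
```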

One can also construct a set of corresponding additive sequences, $Z_{i}^{(m)}=mY_{i}^{(m)}$, based on the expanding bins:

Z_{i}^{(m)}=(Y_{im-m+1}+\ldots +Y_{im}).

Provided the autocorrelation function exhibits the same behavior, the additive sequences will obey the relationship

{\text{var}}[Z_{i}^{(m)}]=m^{2}{\text{var}}[Y^{(m)}]=({\hat {\sigma }}^{2}/{\hat {\mu }}^{2-d}){\text{E}}[Z_{i}^{(m)}]^{2-d}.

Since ${\hat {\mu }}$ and ${\hat {\sigma }}^{2}$ are constants, this relationship constitutes a variance-to-mean power law (Taylor's law), with p=2-d.^[6]
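The intermediate step behind this power law can be written out explicitly. The derivation below is a sketch that uses only the variance scaling of the bin means and the definition of the additive sequences given above, and it assumes the bin means share the mean ${\hat {\mu }}$ of the original sequence.

```latex
\begin{aligned}
\mathrm{E}[Z_i^{(m)}] &= m\,\mathrm{E}[Y_i^{(m)}] = m\hat{\mu}
  \quad\Longrightarrow\quad m = \mathrm{E}[Z_i^{(m)}]/\hat{\mu},\\
\mathrm{var}[Z_i^{(m)}] &= m^{2}\,\mathrm{var}[Y^{(m)}]
  = m^{2}\,\hat{\sigma}^{2} m^{-d}
  = \hat{\sigma}^{2} m^{2-d}
  = \frac{\hat{\sigma}^{2}}{\hat{\mu}^{2-d}}\,\mathrm{E}[Z_i^{(m)}]^{2-d}.
\end{aligned}
```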

Tweedie distributions are a special case of exponential dispersion models, a class of models used to describe error distributions for the generalized linear model.^[7]

These Tweedie distributions are characterized by an inherent scale invariance and thus for any random variable Y that obeys a Tweedie distribution, the variance var(Y) relates to the mean E(Y) by the power law,

{\text{var}}\,(Y)=a[{\text{E}}\,(Y)]^{p},

where a and p are positive constants. The exponent p for the variance to mean power law associated with certain self-similar stochastic processes ranges between 1 and 2 and thus may be modeled in part by a Tweedie compound Poisson–gamma distribution.^[6]

The additive form of the Tweedie compound Poisson–gamma model has the cumulant generating function (CGF)

K_{p}^{*}(s;\theta ,\lambda )=\lambda \kappa _{p}(\theta )[(1+s/\theta )^{\alpha }-1],

where

\kappa _{p}(\theta )={\dfrac {\alpha -1}{\alpha }}\left({\dfrac {\theta }{\alpha -1}}\right)^{\alpha }

is the cumulant function, α is the Tweedie exponent

\alpha ={\dfrac {p-2}{p-1}},

s is the generating function variable, θ is the canonical parameter and λ is the index parameter.

The first and second derivatives of the CGF, evaluated at s=0, yield the mean and variance, respectively. One can thus confirm that for the additive models the variance relates to the mean by the power law

\mathrm {var} (Z)\propto \mathrm {E} (Z)^{p}.
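This confirmation can also be carried out numerically. The sketch below implements the CGF and cumulant function exactly as written above, approximates the first and second derivatives at s = 0 by central finite differences, and checks that var(Z)/E(Z)^p stays constant as θ varies with λ held fixed. The values p = 1.5 and λ = 2, the finite-difference step, and all function names are illustrative assumptions; θ is taken to be negative so that, with 1 < p < 2, every power is applied to a positive base.

```python
def tweedie_alpha(p):
    """Tweedie exponent alpha = (p - 2) / (p - 1)."""
    return (p - 2.0) / (p - 1.0)


def cumulant_function(theta, p):
    a = tweedie_alpha(p)
    return (a - 1.0) / a * (theta / (a - 1.0)) ** a


def cgf(s, theta, lam, p):
    """Additive Tweedie compound Poisson-gamma cumulant generating function."""
    a = tweedie_alpha(p)
    return lam * cumulant_function(theta, p) * ((1.0 + s / theta) ** a - 1.0)


def mean_and_variance(theta, lam, p, h=1e-4):
    """First and second derivatives of the CGF at s = 0 by central differences."""
    up, mid, down = cgf(h, theta, lam, p), cgf(0.0, theta, lam, p), cgf(-h, theta, lam, p)
    return (up - down) / (2.0 * h), (up - 2.0 * mid + down) / h ** 2


P, LAM = 1.5, 2.0  # assumed power-law exponent and index parameter
ratios = []
for theta in (-0.5, -1.0, -2.0, -4.0):
    mean, var = mean_and_variance(theta, LAM, P)
    ratios.append(var / mean ** P)
    print(f"theta={theta:5.1f}  mean={mean:9.4f}  var={var:9.4f}  var/mean^p={ratios[-1]:.6f}")
```

In this parameterisation the constant of proportionality works out to λ^{1-p}, which is what the printed ratios should approximate.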

Although this Tweedie compound Poisson–gamma CGF characterizes the probability distribution for certain self-similar stochastic processes, it carries no information about the long-range correlations inherent to the sequence Y.

Nonetheless, the Tweedie distributions provide a means to understand the possible origins of self-similar stochastic processes, because they act as foci for a central limit-like convergence effect known as the Tweedie convergence theorem. In nontechnical terms, this theorem tells us that any exponential dispersion model that asymptotically manifests a variance-to-mean power law is required to have a variance function that comes within the domain of attraction of a Tweedie model.

The Tweedie convergence theorem can be used to explain the origin of the variance to mean power law, 1/f noise and multifractality, features associated with self-similar processes.^[6]

Network performance

Network performance degrades gradually with increasing self-similarity. The more self-similar the traffic, the longer the queues. The queue length distribution of self-similar traffic decays more slowly than with Poisson sources. However, long-range dependence implies nothing about the short-term correlations of the traffic, which affect performance in small buffers. Additionally, aggregating streams of self-similar traffic typically intensifies the self-similarity ("burstiness") rather than smoothing it, compounding the problem.^{[ citation needed ]}
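The effect of heavy tails on queueing can be illustrated with a very small simulation. The sketch below is not a model of self-similar arrivals; as a simpler stand-in it keeps Poisson arrivals and compares exponential service times with Pareto service times of the same mean in a single-server queue, using the Lindley recursion for successive waiting times. The load of 0.5, the Pareto shape of 1.5, the seed and all names are assumptions chosen for illustration.

```python
import random


def simulate_waits(service_sampler, arrival_rate, n_customers, rng):
    """Waiting times in a single-server FIFO queue via the Lindley recursion."""
    waits = []
    wait = 0.0  # the first customer finds an empty queue
    for _ in range(n_customers):
        waits.append(wait)
        service = service_sampler(rng)
        gap = rng.expovariate(arrival_rate)  # time until the next arrival
        wait = max(0.0, wait + service - gap)
    return waits


def percentile(values, q):
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


ALPHA = 1.5                      # assumed Pareto shape (infinite variance)
X_MIN = (ALPHA - 1.0) / ALPHA    # scale chosen so the mean service time is 1


def exponential_service(rng):
    return rng.expovariate(1.0)  # mean 1


def pareto_service(rng):
    return X_MIN * rng.paretovariate(ALPHA)  # mean 1, heavy tail


rng = random.Random(1)
exp_waits = simulate_waits(exponential_service, 0.5, 200_000, rng)
par_waits = simulate_waits(pareto_service, 0.5, 200_000, rng)
p99_exp = percentile(exp_waits, 0.99)
p99_par = percentile(par_waits, 0.99)
print(f"99th percentile wait: exponential {p99_exp:.1f}, Pareto {p99_par:.1f}")
```

Even at the same mean load, the heavy-tailed case shows a much longer upper tail of waiting times, which is the qualitative behaviour described above.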

Self-similar traffic exhibits the persistence of clustering which has a negative impact on network performance.

With Poisson traffic (found in conventional telephony networks), clustering occurs in the short term but smooths out over the long term.
With self-similar traffic, the bursty behaviour may itself be bursty, which exacerbates the clustering and degrades network performance.

Many aspects of network quality of service depend on coping with traffic peaks that might cause network failures, such as

Cell/packet loss and queue overflow
Violation of delay bounds e.g. in video
Worst cases in statistical multiplexing

Poisson processes are well-behaved because they are memoryless: peak loading is not sustained, so queues do not fill. With long-range dependence, peaks last longer and have greater impact: the equilibrium shifts for a while.^[8]


References

[1] Park, Kihong; Willinger, Walter (2000). Self-Similar Network Traffic and Performance Evaluation. New York, NY, USA: John Wiley & Sons, Inc. ISBN 0471319740.
[2] "Appendix: Heavy-tailed Distributions". Cs.bu.edu. 2001-04-12. Retrieved 2012-06-25.
[3] "The Self-Similarity and Long Range Dependence in Networks Web site". Cs.bu.edu. Retrieved 2012-06-25.
[4] Leland, W. E.; Taqqu, M. S.; Willinger, W.; Wilson, D. V. (1994). "On the self-similar nature of Ethernet traffic". IEEE/ACM Trans. Netw. 2: 1–15. doi:10.1109/90.282603. S2CID 6011907.
[5] Tsybakov, B.; Georganas, N. D. (1997). "On self-similar traffic in ATM queues: definitions, overflow probability bound, and cell delay distribution". IEEE/ACM Trans. Netw. 5: 397–409.
[6] Kendal, Wayne S.; Jørgensen, Bent (2011-12-27). "Tweedie convergence: A mathematical basis for Taylor's power law, 1/f noise, and multifractality". Physical Review E. American Physical Society (APS). 84 (6): 066120. Bibcode:2011PhRvE..84f6120K. doi:10.1103/physreve.84.066120. ISSN 1539-3755. PMID 22304168.
[7] Jørgensen, Bent (1997). The theory of dispersion models. Chapman & Hall. ISBN 978-0412997112.
[8] "Everything you always wanted to know about Self-Similar Network Traffic and Long-Range Dependency, but were ashamed to ask*". Cs.kent.ac.uk. Retrieved 2012-06-25.

External links

A site offering numerous links to articles written on the effect of self-similar traffic on network performance.

