Self-similar process

Self-similar processes are stochastic processes that exhibit self-similarity. A self-similar phenomenon behaves the same when viewed at different degrees of magnification, or at different scales on a dimension (space or time). Self-similar processes can sometimes be described using heavy-tailed distributions, also known as long-tailed distributions. Examples of such processes include traffic processes, such as packet inter-arrival times and burst lengths. Self-similar processes can exhibit long-range dependence.

Overview

The design of robust and reliable networks and network services has become an increasingly challenging task, and understanding the characteristics of Internet traffic plays an increasingly critical role in meeting it. Empirical studies of measured traffic traces have led to wide recognition of self-similarity in network traffic.[1]

Self-similar Ethernet traffic exhibits dependencies over a long range of time scales. This contrasts with telephone traffic, which is Poisson in its arrival and departure processes.[2]

In traditional Poisson traffic, short-term fluctuations average out, so a graph of traffic aggregated over long time intervals approaches a constant value.
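A minimal Python sketch of this averaging effect, assuming independent Poisson counts per time slot (the rate, seed and window sizes are illustrative assumptions): summing the counts over windows of m slots makes the relative fluctuation, the standard deviation divided by the mean, shrink roughly as 1/sqrt(m).

```python
import numpy as np


def coefficient_of_variation(counts: np.ndarray, m: int) -> float:
    """Sum counts over non-overlapping windows of m slots; return std/mean of the sums."""
    n_windows = len(counts) // m  # drop any incomplete final window
    windowed = counts[: n_windows * m].reshape(n_windows, m).sum(axis=1)
    return float(windowed.std() / windowed.mean())


rng = np.random.default_rng(seed=1)
rate = 5.0  # assumed mean arrivals per slot (illustrative)
counts = rng.poisson(rate, size=1_000_000)

for m in (1, 10, 100, 1000):
    # For independent Poisson slots the expected value is 1 / sqrt(rate * m)
    theory = 1.0 / np.sqrt(rate * m)
    print(f"m={m:5d}  CV={coefficient_of_variation(counts, m):.4f}  theory={theory:.4f}")
```

Self-similar traffic does not average out this way: if the variance of window means scales as m^(-d) with d < 1, the relative fluctuation falls only as m^(-d/2).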

Heavy-tailed distributions have been observed in many natural phenomena, both physical and sociological. Mandelbrot established the use of heavy-tailed distributions to model real-world fractal phenomena such as stock markets, earthquakes, climate and weather.[citation needed] Ethernet, WWW, SS7, TCP, FTP, TELNET and VBR video (digitised video of the type transmitted over ATM networks) traffic is self-similar.

Self-similarity in packetised data networks can be caused by the distribution of file sizes, human interactions and/or Ethernet dynamics. Self-similar and long-range dependent characteristics in computer networks present a fundamentally different set of problems to those analysing and designing networks, and many of the assumptions on which earlier systems were built are no longer valid in the presence of self-similarity.[3]

The Poisson distribution

Before the heavy-tailed distribution is introduced mathematically, the Poisson process with a memoryless waiting-time distribution, used to model (among many things) traditional telephony networks, is briefly reviewed below.

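Assuming pure-chance arrivals and pure-chance terminations leads to the following.

A pure-chance arrival process has a Poisson arrival distribution:

$$P(a) = \frac{\mu^a}{a!}\,e^{-\mu},$$

where a is the number of call arrivals in time T, and μ is the mean number of call arrivals in time T. For this reason, pure-chance traffic is also known as Poisson traffic.

A pure-chance departure process has a Poisson departure distribution:

$$P(d) = \frac{\lambda^d}{d!}\,e^{-\lambda},$$

where d is the number of call departures in time T and λ is the mean number of call departures in time T.

The intervals T between call arrivals and departures are intervals between independent, identically distributed random events, and they have a negative exponential distribution:

$$P[T \ge t] = e^{-t/h},$$

where h is the mean holding time (MHT).[citation needed]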
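A minimal Python sketch of these formulas, with an assumed mean of μ = 3 arrivals in T and an assumed mean holding time h = 120 s (both illustrative): it evaluates the Poisson probabilities, the probability that an interval lasts at least 2h, and a Monte Carlo check of the exponential interval law. The departure distribution is the same function with λ in place of μ.

```python
import math

import numpy as np


def poisson_pmf(k: int, mean: float) -> float:
    """P(k events in time T) when the mean number of events in T is `mean`."""
    return mean**k * math.exp(-mean) / math.factorial(k)


def interval_survival(t: float, h: float) -> float:
    """P[T >= t] for negative-exponential intervals with mean holding time h."""
    return math.exp(-t / h)


mu = 3.0   # assumed mean number of call arrivals in T (illustrative)
h = 120.0  # assumed mean holding time in seconds (illustrative)

print([round(poisson_pmf(a, mu), 4) for a in range(6)])
print(round(interval_survival(2 * h, h), 4))  # probability an interval lasts at least 2h

# Monte Carlo check: the fraction of simulated intervals >= 2h should be close to e**-2
rng = np.random.default_rng(seed=0)
intervals = rng.exponential(scale=h, size=200_000)
print(round(float(np.mean(intervals >= 2 * h)), 3))
```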

The heavy-tail distribution

A distribution is said to have a heavy tail if

$$P[X > x] \sim x^{-\alpha}, \qquad x \to \infty, \qquad 0 < \alpha < 2.$$

One simple example of a heavy-tailed distribution is the Pareto distribution.
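A minimal sketch, assuming the classical Pareto form P[X > x] = (x_m/x)^α for x ≥ x_m with illustrative parameters α = 1.5 and x_m = 1 (inside the 0 < α < 2 range of the definition): it draws Pareto samples by inverse-transform sampling and compares their empirical tail with the power law and with an exponential distribution of the same mean, α·x_m/(α − 1) = 3.

```python
import numpy as np


def sample_pareto(alpha: float, x_m: float, size: int, rng: np.random.Generator) -> np.ndarray:
    """Inverse-transform sampling: if U ~ Uniform(0, 1], then x_m * U**(-1/alpha) is Pareto."""
    u = 1.0 - rng.random(size)  # maps [0, 1) to (0, 1] so U is never zero
    return x_m * u ** (-1.0 / alpha)


def empirical_tail(samples: np.ndarray, x: float) -> float:
    """Fraction of samples strictly greater than x, an estimate of P[X > x]."""
    return float(np.mean(samples > x))


rng = np.random.default_rng(seed=2)
alpha, x_m = 1.5, 1.0  # assumed parameters (illustrative)
pareto = sample_pareto(alpha, x_m, 1_000_000, rng)
# Exponential samples with the same mean as the Pareto samples
expo = rng.exponential(scale=alpha * x_m / (alpha - 1), size=1_000_000)

for x in (10.0, 100.0, 1000.0):
    print(f"x={x:6.0f}  Pareto tail={empirical_tail(pareto, x):.2e}  "
          f"theory={(x_m / x) ** alpha:.2e}  exponential tail={empirical_tail(expo, x):.2e}")
```

The exponential tail vanishes long before the Pareto tail does, which is the practical meaning of "heavy" here: very large values stay rare but never become negligible.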

Modelling self-similar traffic

Since packetised traffic, unlike traditional telephony traffic, exhibits self-similar or fractal characteristics, conventional traffic models do not apply to networks that carry self-similar traffic.[citation needed]

With the convergence of voice and data, future multi-service networks will be based on packetised traffic, and models that accurately reflect the nature of self-similar traffic will be required to develop, design and dimension them.[citation needed]

Earlier analytic work in Internet studies adopted assumptions such as exponentially distributed packet inter-arrival times, and conclusions reached under such assumptions may be misleading or incorrect in the presence of heavy-tailed distributions.[2]

Deriving mathematical models which accurately represent long-range dependent traffic is a fertile area of research.

Self-similar stochastic processes modeled by Tweedie distributions

Leland et al. provided a mathematical formalism to describe self-similar stochastic processes.[4] For the sequence of numbers

$$Y = (Y_i : i = 0, 1, 2, \ldots, N)$$

with mean

$$\hat{\mu} = E(Y_i),$$

deviations

$$y_i = Y_i - \hat{\mu},$$

variance

$$\hat{\sigma}^2 = E(y_i^2),$$

and autocorrelation function

$$r(k) = \frac{E(y_i\,y_{i+k})}{E(y_i^2)}$$

with lag k, if the autocorrelation of this sequence has the long-range behavior

$$r(k) \sim k^{-d} L(k)$$

as k → ∞, where L(k) is a slowly varying function at large values of k, this sequence is called a self-similar process.
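A minimal sketch of the sample autocorrelation r(k) = E(y_i y_{i+k}) / E(y_i²), with the expectations replaced by sample averages (an assumption of this sketch). As illustrative input it uses an AR(1) sequence, a standard short-range-dependent process whose theoretical autocorrelation rho**k decays geometrically; for a self-similar sequence r(k) would instead decay like k^(-d), appearing as an approximately straight line of slope −d on log–log axes.

```python
import numpy as np


def autocorrelation(Y: np.ndarray, k: int) -> float:
    """Sample estimate of r(k) = E(y_i y_{i+k}) / E(y_i^2), with deviations y_i = Y_i - mean."""
    y = Y - Y.mean()  # deviations from the sample mean
    if k == 0:
        return 1.0
    numerator = np.mean(y[:-k] * y[k:])  # average lag-k product
    denominator = np.mean(y * y)         # sample variance
    return float(numerator / denominator)


# Illustrative short-range-dependent input: an AR(1) sequence, whose
# theoretical autocorrelation rho**k decays geometrically, not as k**-d.
rng = np.random.default_rng(seed=3)
rho, n = 0.9, 200_000  # assumed AR(1) coefficient and length
noise = rng.standard_normal(n)
ar = np.empty(n)
ar[0] = noise[0] / np.sqrt(1.0 - rho**2)  # start from the stationary distribution
for i in range(1, n):
    ar[i] = rho * ar[i - 1] + noise[i]

for k in (1, 10, 50):
    print(f"k={k:3d}  r(k)={autocorrelation(ar, k):.4f}  rho**k={rho**k:.4f}")
```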

The method of expanding bins can be used to analyze self-similar processes. Consider a set of equal-sized, non-overlapping bins that divides the original sequence of N elements into N/m segments of m elements each (N/m an integer), so that new reproductive sequences, based on the bin means, can be defined:

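$$Y_i^{(m)} = \frac{Y_{im-m+1} + \cdots + Y_{im}}{m}.$$

The variance determined from this sequence will scale as the bin size changes such that

$$\operatorname{var}\left[Y^{(m)}\right] = \hat{\sigma}^2 m^{-d}$$

if and only if the autocorrelation has the limiting form[5]

$$\lim_{k \to \infty} \frac{r(k)}{k^{-d}} = \frac{(2-d)(1-d)}{2}.$$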
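A minimal sketch of the expanding-bins method (often called the aggregated-variance method), assuming d is read off as the negated least-squares slope of log var[Y^(m)] against log m over bin sizes 1 to 512; the bin sizes, sample length and seed are illustrative. Applied to independent Gaussian noise, for which var[Y^(m)] = σ̂²/m exactly, the estimate of d should be close to 1.

```python
import numpy as np


def bin_means(Y: np.ndarray, m: int) -> np.ndarray:
    """Reproductive sequence Y^(m): means of non-overlapping bins of m elements."""
    n_bins = len(Y) // m  # discard an incomplete final bin, if any
    return Y[: n_bins * m].reshape(n_bins, m).mean(axis=1)


def estimate_d(Y: np.ndarray, bin_sizes) -> float:
    """Least-squares fit of log var[Y^(m)] = log sigma^2 - d log m; returns d."""
    log_m = np.log(np.asarray(bin_sizes, dtype=float))
    log_var = np.log([bin_means(Y, m).var() for m in bin_sizes])
    slope, _intercept = np.polyfit(log_m, log_var, 1)
    return float(-slope)


rng = np.random.default_rng(seed=4)
white = rng.standard_normal(2**20)  # independent samples: d = 1 is expected
sizes = [2**j for j in range(10)]   # bin sizes m = 1, 2, 4, ..., 512
print(round(estimate_d(white, sizes), 3))
```

An estimate of d noticeably below 1 would indicate that bin means fluctuate more persistently than independent data allow.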

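One can also construct a set of corresponding additive sequences

$$Z_i^{(m)} = m\,Y_i^{(m)},$$

based on the expanding bins,

$$Z^{(m)} = \left(Z_i^{(m)} : i = 1, 2, \ldots, N/m\right).$$

Provided the autocorrelation function exhibits the same behavior, the additive sequences will obey the relationship

$$\operatorname{var}\left[Z^{(m)}\right] = m^2 \operatorname{var}\left[Y^{(m)}\right] = \frac{\hat{\sigma}^2}{\hat{\mu}^{2-d}}\,E\left[Z^{(m)}\right]^{2-d}.$$

Since μ̂ and σ̂² are constants, this relationship constitutes a variance-to-mean power law (Taylor's law), with p = 2 − d.[6]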
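The power law follows in two lines from the scaling of the reproductive sequence, var[Y^(m)] = σ̂² m^(−d), together with E[Y^(m)] = μ̂:

$$
\begin{aligned}
E\left[Z^{(m)}\right] &= m\,E\left[Y^{(m)}\right] = m\hat{\mu},\\
\operatorname{var}\left[Z^{(m)}\right] &= m^2\,\hat{\sigma}^2 m^{-d}
  = \hat{\sigma}^2 m^{2-d}
  = \frac{\hat{\sigma}^2}{\hat{\mu}^{2-d}}\,(m\hat{\mu})^{2-d}
  = \frac{\hat{\sigma}^2}{\hat{\mu}^{2-d}}\,E\left[Z^{(m)}\right]^{2-d}.
\end{aligned}
$$

Matching this with a variance-to-mean power law var = a·E^p identifies a = σ̂²/μ̂^(2−d) and p = 2 − d.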

Tweedie distributions are a special case of exponential dispersion models, a class of models used to describe error distributions for the generalized linear model. [7]

These Tweedie distributions are characterized by an inherent scale invariance, and thus for any random variable Y that obeys a Tweedie distribution, the variance var(Y) relates to the mean E(Y) by the power law

$$\operatorname{var}(Y) = a\,[E(Y)]^p,$$

where a and p are positive constants. The exponent p for the variance-to-mean power law associated with certain self-similar stochastic processes ranges between 1 and 2, and thus may be modeled in part by a Tweedie compound Poisson–gamma distribution.[6]
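A minimal Monte Carlo sketch of the power law var(Y) = a[E(Y)]^p for 1 < p < 2, assuming the standard construction of a compound Poisson–gamma Tweedie variable with mean μ and dispersion φ as a Poisson number of gamma-distributed jumps (jump-count mean μ^(2−p)/(φ(2−p)), gamma shape (2−p)/(p−1), gamma scale φ(p−1)μ^(p−1)). The values p = 1.5 and φ = 2 are illustrative; fitting log variance against log mean should recover an exponent close to p and a constant a close to φ.

```python
import numpy as np


def sample_tweedie_cpg(mu: float, phi: float, p: float, size: int,
                       rng: np.random.Generator) -> np.ndarray:
    """Draw Tweedie compound Poisson-gamma variates for 1 < p < 2.

    Each variate is a sum of N ~ Poisson(lam) independent Gamma(shape, scale)
    jumps, with parameters chosen so that E(Y) = mu and var(Y) = phi * mu**p.
    """
    lam = mu ** (2 - p) / (phi * (2 - p))  # Poisson mean of the jump count
    shape = (2 - p) / (p - 1)              # gamma shape of each jump
    scale = phi * (p - 1) * mu ** (p - 1)  # gamma scale of each jump
    counts = rng.poisson(lam, size)
    out = np.zeros(size)  # N = 0 gives the exact zeros typical of this family
    positive = counts > 0
    # The sum of n i.i.d. Gamma(shape, scale) jumps is Gamma(n * shape, scale)
    out[positive] = rng.gamma(shape * counts[positive], scale)
    return out


rng = np.random.default_rng(seed=5)
p, phi = 1.5, 2.0  # assumed exponent and dispersion (illustrative)
means = np.array([0.5, 1.0, 2.0, 5.0, 10.0])
variances = np.array([sample_tweedie_cpg(m, phi, p, 400_000, rng).var() for m in means])
slope, intercept = np.polyfit(np.log(means), np.log(variances), 1)
print(f"fitted p = {slope:.3f}, fitted a = {np.exp(intercept):.3f}")
```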

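The additive form of the Tweedie compound Poisson–gamma model has the cumulant generating function (CGF)

$$K^*_b(s;\theta,\lambda) = \log E\left(e^{sY}\right) = \lambda\,\kappa_b(\theta)\left[\left(1 + \frac{s}{\theta}\right)^{\alpha} - 1\right],$$

where

$$\kappa_b(\theta) = \frac{\alpha - 1}{\alpha}\left(\frac{\theta}{\alpha - 1}\right)^{\alpha}$$

is the cumulant function, α is the Tweedie exponent

$$\alpha = \frac{p - 2}{p - 1},$$

s is the generating function variable, θ is the canonical parameter and λ is the index parameter.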

The first and second derivatives of the CGF, evaluated at s = 0, yield the mean and variance, respectively. One can thus confirm that for the additive models the variance relates to the mean by the power law

$$\operatorname{var}(Y) = a\,[E(Y)]^p.$$
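A numerical check of this claim, assuming illustrative values p = 1.5 and λ = 3. It uses the additive CGF K*(s) = λκ(θ)[(1 + s/θ)^α − 1] with κ(θ) = ((α − 1)/α)(θ/(α − 1))^α and α = (p − 2)/(p − 1), and approximates its derivatives at s = 0 by central finite differences, a shortcut chosen for brevity rather than a method from the source. Differentiating by hand gives E(Y) = λ[θ/(α − 1)]^(α−1) and var(Y) = λ[θ/(α − 1)]^(α−2), so for fixed λ the constant in the power law works out to a = λ^(1−p); the printed ratio var/mean^p should match that value for every θ.

```python
def cgf(s: float, theta: float, lam: float, p: float) -> float:
    """Additive compound Poisson-gamma CGF: lam * kappa(theta) * ((1 + s/theta)**alpha - 1)."""
    alpha = (p - 2) / (p - 1)  # Tweedie exponent
    kappa = (alpha - 1) / alpha * (theta / (alpha - 1)) ** alpha  # cumulant function
    return lam * kappa * ((1 + s / theta) ** alpha - 1)


def moments_from_cgf(theta: float, lam: float, p: float) -> tuple[float, float]:
    """Mean and variance as the first and second derivatives of the CGF at s = 0,
    approximated by central finite differences."""
    h = 1e-4 * abs(theta)  # step size scaled to theta
    k_plus, k_zero, k_minus = (cgf(s, theta, lam, p) for s in (h, 0.0, -h))
    mean = (k_plus - k_minus) / (2 * h)
    variance = (k_plus - 2 * k_zero + k_minus) / h**2
    return mean, variance


p, lam = 1.5, 3.0  # assumed exponent (1 < p < 2) and index parameter
for theta in (-0.5, -1.0, -2.0, -4.0):  # theta is negative for 1 < p < 2
    mean, variance = moments_from_cgf(theta, lam, p)
    print(f"theta={theta:5.1f}  mean={mean:9.4f}  var/mean**p={variance / mean**p:.6f}")
print(f"lam**(1 - p) = {lam ** (1 - p):.6f}")
```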

Although this Tweedie compound Poisson–gamma CGF can represent the probability distribution of certain self-similar stochastic processes, it carries no information about the long-range correlations inherent in the sequence Y.

Nonetheless, the Tweedie distributions offer a way to understand the possible origins of self-similar stochastic processes, because they act as foci for a central-limit-like convergence effect known as the Tweedie convergence theorem. In nontechnical terms, this theorem states that any exponential dispersion model that asymptotically manifests a variance-to-mean power law must have a variance function that falls within the domain of attraction of a Tweedie model.

The Tweedie convergence theorem can be used to explain the origin of the variance-to-mean power law, 1/f noise and multifractality, features associated with self-similar processes.[6]

Network performance

Network performance degrades gradually as self-similarity increases: the more self-similar the traffic, the longer the queues. The queue-length distribution of self-similar traffic decays more slowly than that of Poisson sources. However, long-range dependence says nothing about short-term correlations, which affect performance in small buffers. Additionally, aggregating streams of self-similar traffic typically intensifies the self-similarity ("burstiness") rather than smoothing it, compounding the problem.[citation needed]

Self-similar traffic exhibits persistent clustering, which has a negative impact on network performance.
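A minimal discrete-time sketch of the queueing effect, under assumptions of our own: a buffer served at 1 unit per slot is fed either by Poisson arrivals or by a single ON/OFF source whose ON and OFF period lengths are Pareto distributed (α = 1.5), both with a long-run mean load of 0.8. Superposing many such heavy-tailed ON/OFF sources is one standard way of generating self-similar traffic, so a single source is only a simplified stand-in. The queue follows the Lindley recursion Q[t] = max(0, Q[t−1] + A[t] − capacity).

```python
import numpy as np


def queue_lengths(arrivals: np.ndarray, capacity: float) -> np.ndarray:
    """Discrete-time Lindley recursion: Q[t] = max(0, Q[t-1] + A[t] - capacity)."""
    q = np.empty(len(arrivals))
    level = 0.0
    for t, a in enumerate(arrivals):
        level = max(0.0, level + a - capacity)
        q[t] = level
    return q


def on_off_arrivals(n_slots: int, peak: float, alpha: float,
                    rng: np.random.Generator) -> np.ndarray:
    """One ON/OFF source: ON and OFF period lengths are Pareto(alpha) with a
    1-slot minimum (rounded up to whole slots); `peak` units arrive per ON slot."""
    out = np.zeros(n_slots)
    t, on = 0, True
    while t < n_slots:
        length = int(np.ceil((1.0 - rng.random()) ** (-1.0 / alpha)))
        if on:
            out[t : t + length] = peak  # slicing past the end is safely truncated
        t += length
        on = not on
    return out


rng = np.random.default_rng(seed=6)
n, capacity = 200_000, 1.0  # assumed horizon (slots) and service capacity per slot
poisson = rng.poisson(0.8, n).astype(float)                # mean load 0.8
bursty = on_off_arrivals(n, peak=1.6, alpha=1.5, rng=rng)  # long-run mean load 0.8

for name, arrivals in (("Poisson", poisson), ("Pareto ON/OFF", bursty)):
    q = queue_lengths(arrivals, capacity)
    print(f"{name:14s} load={arrivals.mean():.3f}  mean Q={q.mean():8.2f}  "
          f"99.9th pct={np.percentile(q, 99.9):8.2f}  max Q={q.max():8.2f}")
```

Because the ON source's peak rate exceeds the capacity, every long ON period builds a backlog that grows linearly with its length, so heavy-tailed periods translate directly into rare but very large queue excursions that Poisson arrivals at the same mean load essentially never produce.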

Many aspects of network quality of service depend on coping with traffic peaks that might cause network failures.

Poisson processes are well-behaved because they are stateless, and peak loading is not sustained, so queues do not fill. With long-range order, peaks last longer and have greater impact: the equilibrium shifts for a while. [8]


References

  1. Park, Kihong; Willinger, Walter (2000). Self-Similar Network Traffic and Performance Evaluation. New York, NY, USA: John Wiley & Sons, Inc. ISBN 0471319740.
  2. "Appendix: Heavy-tailed Distributions". Cs.bu.edu. 2001-04-12. Retrieved 2012-06-25.
  3. "The Self-Similarity and Long Range Dependence in Networks Web site". Cs.bu.edu. Retrieved 2012-06-25.
  4. Leland, W. E.; Taqqu, M. S.; Willinger, W.; Wilson, D. V. (1994). "On the self-similar nature of ethernet traffic". IEEE/ACM Trans. Netw. 2: 1–15. doi:10.1109/90.282603. S2CID 6011907.
  5. Tsybakov, B.; Georganas, N. D. (1997). "On self-similar traffic in ATM queues: definitions, overflow probability bound, and cell delay distribution". IEEE/ACM Trans. Netw. 5: 397–409.
  6. Kendal, Wayne S.; Jørgensen, Bent (2011-12-27). "Tweedie convergence: A mathematical basis for Taylor's power law, 1/f noise, and multifractality". Physical Review E. American Physical Society (APS). 84 (6): 066120. Bibcode:2011PhRvE..84f6120K. doi:10.1103/physreve.84.066120. ISSN 1539-3755. PMID 22304168.
  7. Jørgensen, Bent (1997). The theory of dispersion models. Chapman & Hall. ISBN 978-0412997112.
  8. "Everything you always wanted to know about Self-Similar Network Traffic and Long-Range Dependency, but were ashamed to ask*". Cs.kent.ac.uk. Retrieved 2012-06-25.