Self-similar process

Self-similar processes are stochastic processes that exhibit self-similarity: they behave the same when viewed at different degrees of magnification, or at different scales of a dimension (space or time). Self-similar processes can sometimes be described using heavy-tailed distributions, also known as long-tailed distributions. Examples of such processes include traffic processes, such as packet inter-arrival times and burst lengths. Self-similar processes can exhibit long-range dependence.

Overview

The design of robust and reliable networks and network services has become an increasingly challenging task in today's Internet world. To achieve this goal, understanding the characteristics of Internet traffic plays an increasingly critical role. Empirical studies of measured traffic traces have led to the wide recognition of self-similarity in network traffic. [1]

Self-similar Ethernet traffic exhibits dependencies over a long range of time scales. This contrasts with telephone traffic, which is Poisson in its arrival and departure processes. [2]

In traditional Poisson traffic, the short-term fluctuations would average out, and a graph covering a large amount of time would approach a constant value.
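The averaging-out behaviour of Poisson traffic can be checked numerically. The sketch below is only an illustration under assumed parameters: the arrival rate of 10 per slot, the 20,000 slots and the helper names `poisson_counts` and `aggregate` are all hypothetical choices, not taken from the cited sources. It counts Poisson arrivals per unit time slot, averages the counts over non-overlapping blocks of m slots, and records the variance of the block averages.

```python
import random
import statistics

def poisson_counts(rate, n_slots, rng):
    """Count arrivals per unit-length slot, using exponential inter-arrival times."""
    counts = [0] * n_slots
    t = rng.expovariate(rate)
    while t < n_slots:
        counts[int(t)] += 1
        t += rng.expovariate(rate)
    return counts

def aggregate(series, m):
    """Average the series over non-overlapping blocks of length m."""
    return [sum(series[i:i + m]) / m for i in range(0, len(series) - m + 1, m)]

rng = random.Random(1)  # fixed seed so the run is repeatable
counts = poisson_counts(rate=10.0, n_slots=20000, rng=rng)

# Variance of the block averages at three aggregation levels (assumed values of m).
variance_by_scale = {m: statistics.pvariance(aggregate(counts, m)) for m in (1, 10, 100)}
```

For Poisson traffic the variance of the block averages should fall roughly in proportion to 1/m, which is why a plot over a long time span looks flat. A self-similar trace run through the same `aggregate` step would show a markedly slower decay.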

Heavy-tailed distributions have been observed in many natural phenomena, both physical and sociological. Mandelbrot established the use of heavy-tailed distributions to model real-world fractal phenomena, e.g. stock markets, earthquakes, climate, and the weather.[citation needed] Ethernet, WWW, SS7, TCP, FTP, TELNET and VBR video (digitised video of the type that is transmitted over ATM networks) traffic is self-similar.

Self-similarity in packetised data networks can be caused by the distribution of file sizes, human interactions and/or Ethernet dynamics. Self-similar and long-range dependent characteristics in computer networks present a fundamentally different set of problems to people analysing and/or designing networks, and many of the assumptions upon which systems have previously been built are no longer valid in the presence of self-similarity. [3]

The Poisson distribution

Before the heavy-tailed distribution is introduced mathematically, the Poisson process with a memoryless waiting-time distribution, used to model (among many things) traditional telephony networks, is briefly reviewed below.

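Assuming pure-chance arrivals and pure-chance terminations leads to the following:

The number of call arrivals in a given time has a Poisson distribution:

$$P(a) = \frac{\mu^{a}}{a!} e^{-\mu}$$

where a is the number of call arrivals in time T, and $\mu$ is the mean number of call arrivals in time T. For this reason, pure-chance traffic is also known as Poisson traffic.

The number of call departures in a given time also has a Poisson distribution:

$$P(d) = \frac{\lambda^{d}}{d!} e^{-\lambda}$$

where d is the number of call departures in time T and $\lambda$ is the mean number of call departures in time T.

The holding time $\tau$ of a call, that is, the interval between its arrival and its departure, has a negative exponential distribution:

$$P(\tau \geq t) = e^{-t/h}$$

where h is the mean holding time (MHT).[citation needed]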
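The two distributions reviewed here are easy to evaluate directly. The following minimal sketch assumes the standard Poisson probability, mean^count · e^(−mean) / count!, and the exponential survival probability e^(−t/h). The function names and the example values (a mean of 3 calls per interval, a mean holding time of 2 minutes) are hypothetical. It also checks the memoryless property numerically.

```python
import math

def poisson_pmf(count, mean):
    """Probability of exactly `count` events when the mean number of events is `mean`."""
    return mean ** count * math.exp(-mean) / math.factorial(count)

def holding_time_survival(t, mean_holding_time):
    """Probability that an exponentially distributed holding time is at least t."""
    return math.exp(-t / mean_holding_time)

# Assumed example: on average 3 calls per interval; the probabilities sum to 1.
total_probability = sum(poisson_pmf(k, 3.0) for k in range(60))

# Memorylessness with an assumed mean holding time of 2 minutes: the chance of
# lasting a further 1.5 minutes is the same for a new call and for a call that
# has already lasted 4 minutes.
fresh = holding_time_survival(1.5, 2.0)
after_waiting = holding_time_survival(4.0 + 1.5, 2.0) / holding_time_survival(4.0, 2.0)
```

The equality of `fresh` and `after_waiting` is what "memoryless" means in practice: the elapsed duration of a call carries no information about its remaining duration.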

The heavy-tail distribution

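A distribution is said to have a heavy tail if its tail probability decays more slowly than any exponential:

$$\lim_{x \to \infty} e^{\lambda x} \Pr[X > x] = \infty \quad \text{for all } \lambda > 0.$$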

One simple example of a heavy-tailed distribution is the Pareto distribution.
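A common way to state heavy-tailedness is that e^(λx) · P[X > x] grows without bound as x increases, for every λ > 0. The sketch below, with assumed parameters (Pareto shape 1.5 and scale 1, exponential rate 1, λ = 0.1) and hypothetical function names, evaluates that product at a few points for a Pareto tail and, for contrast, an exponential tail.

```python
import math

def pareto_survival(x, alpha, x_min=1.0):
    """P[X > x] for a Pareto distribution with shape alpha and scale x_min."""
    return 1.0 if x < x_min else (x_min / x) ** alpha

def exponential_survival(x, rate):
    """P[X > x] for an exponential distribution with the given rate."""
    return math.exp(-rate * x)

lam = 0.1  # assumed value; the criterion must hold for every positive lam
points = (10.0, 100.0, 1000.0)

# Grows without bound: the power-law tail loses to any exponential factor.
pareto_scaled = [math.exp(lam * x) * pareto_survival(x, alpha=1.5) for x in points]

# Shrinks towards zero because lam is smaller than the rate, so the
# exponential distribution fails the criterion and is not heavy-tailed.
exponential_scaled = [math.exp(lam * x) * exponential_survival(x, rate=1.0) for x in points]
```

A finite set of points does not prove a limit, so this is only a numerical illustration of the criterion, not a demonstration of it.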

Modelling self-similar traffic

Since (unlike traditional telephony traffic) packetised traffic exhibits self-similar or fractal characteristics, conventional traffic models do not apply to networks which carry self-similar traffic.[citation needed]

With the convergence of voice and data, the future multi-service network will be based on packetised traffic, and models which accurately reflect the nature of self-similar traffic will be required to develop, design and dimension future multi-service networks.[citation needed]

Earlier analytic work in Internet studies adopted assumptions such as exponentially distributed packet inter-arrival times, and conclusions reached under such assumptions may be misleading or incorrect in the presence of heavy-tailed distributions. [2]

Deriving mathematical models which accurately represent long-range dependent traffic is a fertile area of research.

Self-similar stochastic processes modeled by Tweedie distributions

The Tweedie convergence theorem can be used to explain the origin of the variance-to-mean power law, 1/f noise and multifractality, all features associated with self-similar processes. [4]

Network performance

Network performance degrades gradually with increasing self-similarity. The more self-similar the traffic, the longer the queues. The queue length distribution of self-similar traffic decays more slowly than with Poisson sources. However, long-range dependence by itself implies nothing about the short-term correlations that affect performance in small buffers. Additionally, aggregating streams of self-similar traffic typically intensifies the self-similarity ("burstiness") rather than smoothing it, compounding the problem.[citation needed]

Self-similar traffic exhibits persistent clustering, which has a negative impact on network performance.

Many aspects of network quality of service depend on coping with traffic peaks that might cause network failures.

Poisson processes are well-behaved because they are memoryless: peak loading is not sustained, so queues do not fill. With long-range dependence, peaks last longer and have greater impact: the equilibrium shifts for a while. [5]
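The effect of heavy tails on queueing can be illustrated with a small simulation. The sketch below is not a model of self-similar arrivals: it keeps Poisson arrivals and swaps only the job-size distribution, comparing exponential service times with Pareto service times of the same mean, which is one of the mechanisms (heavy-tailed file sizes) named earlier as a cause of self-similarity. Waiting times follow the Lindley recursion for a single-server FIFO queue. The shape 1.5, arrival rate 0.5, job count, seed and all function names are assumptions chosen for illustration.

```python
import random

def mean_waiting_time(service_sampler, arrival_rate, n_jobs, rng):
    """Average waiting time in a single-server FIFO queue (Lindley recursion)."""
    wait = 0.0
    total = 0.0
    for _ in range(n_jobs):
        total += wait
        service = service_sampler(rng)
        gap = rng.expovariate(arrival_rate)  # time until the next arrival
        # The next job waits for the current backlog minus the idle gap, never below zero.
        wait = max(0.0, wait + service - gap)
    return total / n_jobs

ALPHA = 1.5                    # assumed Pareto shape: finite mean, infinite variance
X_MIN = (ALPHA - 1.0) / ALPHA  # scale chosen so that the mean service time is 1

def exponential_service(rng):
    """Exponential service time with mean 1."""
    return rng.expovariate(1.0)

def pareto_service(rng):
    """Pareto service time with mean 1 and shape ALPHA."""
    return X_MIN * rng.paretovariate(ALPHA)

# Both queues run at 50% utilisation; only the service-time distribution differs.
wait_exponential = mean_waiting_time(exponential_service, 0.5, 200_000, random.Random(7))
wait_pareto = mean_waiting_time(pareto_service, 0.5, 200_000, random.Random(7))
```

With exponential service the average wait should settle near the M/M/1 value of 1 time unit. With Pareto service, occasional very large jobs create backlogs that take a long time to clear, so the average wait is expected to be noticeably larger and to vary more between runs.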

References

  1. Park, Kihong; Willinger, Walter (2000), Self-Similar Network Traffic and Performance Evaluation, New York, NY, USA: John Wiley & Sons, Inc., ISBN 0471319740.
  2. "Appendix: Heavy-tailed Distributions". Cs.bu.edu. 2001-04-12. Retrieved 2012-06-25.
  3. "The Self-Similarity and Long Range Dependence in Networks Web site". Cs.bu.edu. Retrieved 2012-06-25.
  4. Kendal, Wayne S.; Jørgensen, Bent (2011-12-27). "Tweedie convergence: A mathematical basis for Taylor's power law, 1/f noise, and multifractality". Physical Review E. 84 (6). American Physical Society (APS): 066120. Bibcode:2011PhRvE..84f6120K. doi:10.1103/physreve.84.066120. ISSN 1539-3755. PMID 22304168.
  5. "Everything you always wanted to know about Self-Similar Network Traffic and Long-Range Dependency, but were ashamed to ask*". Cs.kent.ac.uk. Retrieved 2012-06-25.