Range (statistics)

In statistics, the range of a set of data is the difference between the largest and smallest values; "difference" here is specific: the range is the result of subtracting the smallest value from the largest value. [1] It can give a rough idea of how spread out the data are before the data set is examined more closely.
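As a minimal sketch (the data values are made up for illustration), computing a range takes a single subtraction:

    def data_range(values):
        # Range of a data set: largest value minus smallest value
        return max(values) - min(values)

    print(data_range([3, 7, 1, 9, 4]))  # prints 8, i.e. 9 - 1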

In descriptive statistics, however, this concept of range has a more precise meaning. The range is the size of the smallest interval which contains all the data and provides an indication of statistical dispersion. It is measured in the same units as the data. Since it depends on only two of the observations, it is most useful in representing the dispersion of small data sets. [2]

For continuous IID random variables

For n independent and identically distributed continuous random variables X1, X2, ..., Xn with cumulative distribution function G(x) and probability density function g(x), let T denote the range of a sample of size n from a population with distribution function G(x).

Distribution

The range has cumulative distribution function [3] [4]

    F(t) = n \int_{-\infty}^{\infty} g(x) \left[ G(x + t) - G(x) \right]^{n-1} dx .

Gumbel notes that the "beauty of this formula is completely marred by the facts that, in general, we cannot express G(x + t) by G(x), and that the numerical integration is lengthy and tiresome." [3]
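Gumbel's complaint predates cheap numerical quadrature; today the integral is routine to evaluate. Below is a minimal sketch (the standard normal population and sample size n = 5 are illustrative choices, not taken from the sources) that computes F(t) from the formula above and checks it against simulated ranges:

    import numpy as np
    from scipy import stats
    from scipy.integrate import quad

    n = 5  # illustrative sample size

    def range_cdf(t, n):
        # F(t) = n * integral of g(x) * [G(x + t) - G(x)]^(n-1) dx
        if t <= 0:
            return 0.0
        integrand = lambda x: stats.norm.pdf(x) * (stats.norm.cdf(x + t) - stats.norm.cdf(x)) ** (n - 1)
        value, _ = quad(integrand, -np.inf, np.inf)
        return n * value

    # Monte Carlo check: empirical CDF of simulated sample ranges
    rng = np.random.default_rng(0)
    samples = rng.standard_normal((100_000, n))
    simulated = samples.max(axis=1) - samples.min(axis=1)
    for t in (1.0, 2.0, 3.0):
        print(t, range_cdf(t, n), (simulated <= t).mean())

The quadrature and Monte Carlo columns should agree to two or three decimal places.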

If the distribution of each Xi is limited to the right (or left) then the asymptotic distribution of the range is equal to the asymptotic distribution of the largest (smallest) value. For more general distributions the asymptotic distribution can be expressed as a Bessel function. [3]

Moments

The mean range is given by [5]

    E[T] = n \int_0^1 x(G) \left[ G^{n-1} - (1 - G)^{n-1} \right] dG

where x(G) is the inverse of the distribution function (the quantile function). In the case where each of the Xi has a standard normal distribution, the mean range is given by [6]

    E[T] = \int_{-\infty}^{\infty} \left( 1 - \left[ 1 - \Phi(x) \right]^n - \left[ \Phi(x) \right]^n \right) dx .
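As a quick numerical sanity check (again with the illustrative choice n = 5), the integral can be evaluated by quadrature and compared with a simulated mean range; for n = 5 both should come out near 2.33, the value tabulated for the mean range of five standard normal observations:

    import numpy as np
    from scipy import stats
    from scipy.integrate import quad

    n = 5  # illustrative sample size

    # E[T] = integral of 1 - (1 - Phi(x))^n - Phi(x)^n dx
    integrand = lambda x: 1.0 - (1.0 - stats.norm.cdf(x)) ** n - stats.norm.cdf(x) ** n
    mean_range, _ = quad(integrand, -np.inf, np.inf)

    rng = np.random.default_rng(1)
    samples = rng.standard_normal((200_000, n))
    print(mean_range, (samples.max(axis=1) - samples.min(axis=1)).mean())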

For continuous non-IID random variables

For n nonidentically distributed independent continuous random variables X1, X2, ..., Xn with cumulative distribution functions G1(x), G2(x), ..., Gn(x) and probability density functions g1(x), g2(x), ..., gn(x), the range has cumulative distribution function [4]

    F(t) = \int_{-\infty}^{\infty} \sum_{i=1}^{n} g_i(x) \prod_{j=1, j \neq i}^{n} \left[ G_j(x + t) - G_j(x) \right] dx .
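The same quadrature-plus-simulation check works here. In this sketch the three normal distributions with different means and scales are purely illustrative:

    import numpy as np
    from scipy import stats
    from scipy.integrate import quad

    # Illustrative, non-identical component distributions
    dists = [stats.norm(0, 1), stats.norm(1, 2), stats.norm(-0.5, 0.5)]

    def range_cdf(t):
        # F(t) = integral of sum_i g_i(x) * prod_{j != i} [G_j(x + t) - G_j(x)] dx
        def integrand(x):
            total = 0.0
            for i, di in enumerate(dists):
                term = di.pdf(x)
                for j, dj in enumerate(dists):
                    if j != i:
                        term *= dj.cdf(x + t) - dj.cdf(x)
                total += term
            return total
        value, _ = quad(integrand, -np.inf, np.inf)
        return value

    rng = np.random.default_rng(2)
    draws = np.column_stack([d.rvs(size=100_000, random_state=rng) for d in dists])
    simulated = draws.max(axis=1) - draws.min(axis=1)
    for t in (2.0, 4.0, 6.0):
        print(t, range_cdf(t), (simulated <= t).mean())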

For discrete IID random variables

For n independent and identically distributed discrete random variables X1, X2, ..., Xn with cumulative distribution function G(x) and probability mass function g(x), the range of the Xi is the range of a sample of size n from a population with distribution function G(x). We can assume without loss of generality that the support of each Xi is {1, 2, 3, ..., N}, where N is a positive integer or infinity. [7] [8]

Distribution

The range has probability mass function [7] [9] [10]

    f(t) = \begin{cases}
        \sum_{x=1}^{N} \left[ g(x) \right]^n & t = 0 \\
        \sum_{x=1}^{N-t} \left( \left[ G(x+t) - G(x-1) \right]^n - \left[ G(x+t) - G(x) \right]^n - \left[ G(x+t-1) - G(x-1) \right]^n + \left[ G(x+t-1) - G(x) \right]^n \right) & t = 1, 2, \ldots, N-1 .
    \end{cases}

Example

If we suppose that g(x) = 1/N, the discrete uniform distribution for all x, then we find [9] [11]

    f(t) = \begin{cases}
        \dfrac{1}{N^{n-1}} & t = 0 \\
        \dfrac{N-t}{N^n} \left[ (t+1)^n - 2t^n + (t-1)^n \right] & t = 1, 2, \ldots, N-1 .
    \end{cases}
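The closed form is easy to confirm by brute force. In this sketch the choices N = 6 and n = 3 (three rolls of a fair six-sided die) are illustrative; the enumeration visits all N^n equally likely samples:

    from itertools import product
    from fractions import Fraction

    N, n = 6, 3  # illustrative: three rolls of a fair six-sided die

    # Exact PMF of the range by enumerating every possible sample
    counts = {}
    for sample in product(range(1, N + 1), repeat=n):
        t = max(sample) - min(sample)
        counts[t] = counts.get(t, 0) + 1
    enumerated = {t: Fraction(c, N ** n) for t, c in counts.items()}

    # PMF from the closed form above
    def pmf(t):
        if t == 0:
            return Fraction(N, N ** n)
        return Fraction((N - t) * ((t + 1) ** n - 2 * t ** n + (t - 1) ** n), N ** n)

    for t in range(N):
        print(t, enumerated[t], pmf(t), enumerated[t] == pmf(t))  # all True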

Derivation

For the continuous IID case, the probability density of a specific range value, t, can be determined by adding the probabilities of having two samples differing by t, with every other sample having a value between the two extremes. The probability of one sample having a value of x is n g(x). The probability of another having a value t greater than x is:

    (n - 1) \, g(x + t)

The probability of all other values lying between these two extremes is:

    \left( \int_x^{x+t} g(u) \, du \right)^{n-2} = \left[ G(x + t) - G(x) \right]^{n-2}

Combining the three together and integrating over all locations x of the minimum yields:

    f(t) = n (n - 1) \int_{-\infty}^{\infty} g(x) \, g(x + t) \left[ G(x + t) - G(x) \right]^{n-2} dx
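As a concrete worked instance (the standard uniform distribution is an illustrative choice), take g(x) = 1 and G(x) = x on [0, 1]. The integrand is nonzero only for 0 ≤ x ≤ 1 − t, so

    f(t) = n (n - 1) \int_0^{1-t} t^{n-2} \, dx = n (n - 1) (1 - t) \, t^{n-2}, \qquad 0 \le t \le 1,

which is the Beta(n − 1, 2) density, the standard result for the range of a uniform sample.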

The range is a simple function of the sample maximum and minimum and these are specific examples of order statistics. In particular, the range is a linear function of order statistics, which brings it into the scope of L-estimation.

References

  1. George Woodbury (2001). An Introduction to Statistics. Cengage Learning. p. 74. ISBN 0534377556.
  2. Carin Viljoen (2000). Elementary Statistics: Vol 2. Pearson South Africa. pp. 7–27. ISBN 186891075X.
  3. E. J. Gumbel (1947). "The Distribution of the Range". The Annals of Mathematical Statistics. 18 (3): 384–412. doi:10.1214/aoms/1177730387. JSTOR 2235736.
  4. Tsimashenka, I.; Knottenbelt, W.; Harrison, P. (2012). "Controlling Variability in Split-Merge Systems". Analytical and Stochastic Modeling Techniques and Applications (PDF). Lecture Notes in Computer Science. 7314. p. 165. doi:10.1007/978-3-642-30782-9_12. ISBN 978-3-642-30781-2.
  5. H. O. Hartley; H. A. David (1954). "Universal Bounds for Mean Range and Extreme Observation". The Annals of Mathematical Statistics. 25 (1): 85–99. doi:10.1214/aoms/1177728848. JSTOR 2236514.
  6. L. H. C. Tippett (1925). "On the Extreme Individuals and the Range of Samples Taken from a Normal Population". Biometrika. 17 (3/4): 364–387. doi:10.1093/biomet/17.3-4.364. JSTOR 2332087.
  7. Evans, D. L.; Leemis, L. M.; Drew, J. H. (2006). "The Distribution of Order Statistics for Discrete Random Variables with Applications to Bootstrapping". INFORMS Journal on Computing. 18: 19. doi:10.1287/ijoc.1040.0105.
  8. Irving W. Burr (1955). "Calculation of Exact Sampling Distribution of Ranges from a Discrete Population". The Annals of Mathematical Statistics. 26 (3): 530–532. doi:10.1214/aoms/1177728500. JSTOR 2236482.
  9. Abdel-Aty, S. H. (1954). "Ordered variables in discontinuous distributions". Statistica Neerlandica. 8 (2): 61–82. doi:10.1111/j.1467-9574.1954.tb00442.x.
  10. Siotani, M. (1956). "Order statistics for discrete case with a numerical application to the binomial distribution". Annals of the Institute of Statistical Mathematics. 8: 95–96. doi:10.1007/BF02863574.
  11. Paul R. Rider (1951). "The Distribution of the Range in Samples from a Discrete Rectangular Population". Journal of the American Statistical Association. 46 (255): 375–378. doi:10.1080/01621459.1951.10500796. JSTOR 2280515.