Misconceptions about the normal distribution

Students of statistics and probability theory sometimes develop misconceptions about the normal distribution, ideas that may seem plausible but are mathematically untrue. For example, it is sometimes mistakenly thought that two linearly uncorrelated, normally distributed random variables must be statistically independent. However, this is untrue, as can be demonstrated by counterexample. Likewise, it is sometimes mistakenly thought that any linear combination of normally distributed random variables must itself be normally distributed; this holds when the variables are jointly normally distributed, but counterexamples show it can fail when they are merely marginally normal. [1] [2]

To say that the pair (X, Y) of random variables has a bivariate normal distribution means that every linear combination aX + bY of X and Y for constant (i.e. not random) coefficients a and b (not both equal to zero) has a univariate normal distribution. In that case, if X and Y are uncorrelated then they are independent. [3] However, it is possible for two random variables X and Y to be so distributed jointly that each one alone is marginally normally distributed, and they are uncorrelated, but they are not independent; examples are given below.
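The positive statement for the jointly normal case can be seen directly from the density. For standard normal marginals with correlation coefficient ρ, setting ρ = 0 makes the bivariate normal density factor into the product of the marginal densities, which is precisely independence (a standard calculation, shown here for the standardized case only):

\[
f_{X,Y}(x,y)=\frac{1}{2\pi\sqrt{1-\rho^{2}}}\exp\!\left(-\frac{x^{2}-2\rho xy+y^{2}}{2(1-\rho^{2})}\right)
\;\overset{\rho=0}{=}\;
\frac{1}{\sqrt{2\pi}}e^{-x^{2}/2}\cdot\frac{1}{\sqrt{2\pi}}e^{-y^{2}/2}
=f_{X}(x)\,f_{Y}(y).
\]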

Examples

A symmetric example

Figure: Joint range of X and Y. Darker indicates a higher value of the density function.

Suppose X has a normal distribution with expected value 0 and variance 1. Let W have the Rademacher distribution, so that W = 1 or W = −1, each with probability 1/2, and assume W is independent of X. Let Y = WX. Then X and Y are uncorrelated, as can be verified by calculating their covariance (the short calculation is given below). Moreover, both have the same normal distribution. And yet, X and Y are not independent. [4] [1] [5]
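Both claims follow in a couple of lines from the independence of W and X and the symmetry of the standard normal distribution (Φ denotes the standard normal cumulative distribution function):

\[
\operatorname{cov}(X,Y)=\operatorname{E}[XY]-\operatorname{E}[X]\operatorname{E}[Y]=\operatorname{E}[WX^{2}]=\operatorname{E}[W]\,\operatorname{E}[X^{2}]=0\cdot 1=0,
\]
\[
\Pr(Y\le y)=\Pr(X\le y)\Pr(W=1)+\Pr(-X\le y)\Pr(W=-1)=\tfrac12\,\Phi(y)+\tfrac12\,\Phi(y)=\Phi(y),
\]

so Y is standard normal, just as X is.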

To see that X and Y are not independent, observe that |Y| = |X|, or that Pr(Y > 1 | −1 < X < 1) = 0, whereas the unconditional probability Pr(Y > 1) = 1 − Φ(1) > 0.

Finally, the distribution of the simple linear combination X + Y concentrates positive probability at 0: Pr(X + Y = 0) = 1/2, since X + Y = (1 + W)X is zero whenever W = −1, an event of probability 1/2. Therefore, the random variable X + Y is not normally distributed, and so also X and Y are not jointly normally distributed (by the definition above). [4]
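A quick Monte Carlo check makes all three properties concrete. The following sketch is only illustrative (the sample size, seed, and variable names are arbitrary choices, not from the cited sources); it simulates Y = WX and estimates the correlation, the marginal moments of Y, and the point mass of X + Y at zero.

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.standard_normal(n)               # X ~ N(0, 1)
w = rng.choice([-1.0, 1.0], size=n)      # W ~ Rademacher, independent of X
y = w * x                                # Y = W X

print("corr(X, Y):", np.corrcoef(x, y)[0, 1])             # ~ 0: uncorrelated
print("mean, var of Y:", y.mean(), y.var())               # ~ 0 and ~ 1: same normal marginal
print("P(|Y| = |X|):", np.mean(np.abs(y) == np.abs(x)))   # exactly 1: clearly dependent
print("P(X + Y = 0):", np.mean(x + y == 0.0))             # ~ 0.5: X + Y is not normal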

An asymmetric example

Figure: The joint density of X and Y. Darker indicates a higher value of the density.

Suppose X has a normal distribution with expected value 0 and variance 1. Let

\[
Y=\begin{cases}X&\text{if }|X|\le c,\\-X&\text{if }|X|>c,\end{cases}
\]

where c is a positive number to be specified below. If c is very small, then the correlation corr(X, Y) is near −1; if c is very large, then corr(X, Y) is near 1. Since the correlation is a continuous function of c, the intermediate value theorem implies there is some particular value of c that makes the correlation 0. That value is approximately 1.54. [2] [note 1] In that case, X and Y are uncorrelated, but they are clearly not independent, since X completely determines Y.
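The special value of c can be computed numerically. Since both variables are standard normal, corr(X, Y) = E[X²; |X| ≤ c] − E[X²; |X| > c], which vanishes exactly when the truncated second moment E[X²; |X| ≤ c] equals 1/2; this is also why c² is the median of a chi-squared distribution with 3 degrees of freedom (the content of note 1 below). The sketch that follows is one possible way to do the computation, not the method of the cited sources.

import numpy as np
from scipy import integrate, optimize, stats

def truncated_second_moment(c):
    # E[X^2 ; |X| <= c] for X ~ N(0, 1)
    value, _ = integrate.quad(lambda x: x**2 * stats.norm.pdf(x), -c, c)
    return value

# corr(X, Y) = 0  <=>  E[X^2 ; |X| <= c] = 1/2
c_star = optimize.brentq(lambda c: truncated_second_moment(c) - 0.5, 0.1, 5.0)
print("c making X and Y uncorrelated:", c_star)                        # ~ 1.538

# Cross-check against the closed form: c^2 is the median of chi-squared(3)
print("sqrt of chi2(3) median:", np.sqrt(stats.chi2.ppf(0.5, df=3)))   # ~ 1.538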

To see that Y is normally distributed (indeed, that its distribution is the same as that of X), one may compute its cumulative distribution function: [6]

\[
\begin{aligned}
\Pr(Y\le y)&=\Pr(\{|X|\le c\text{ and }X\le y\}\text{ or }\{|X|>c\text{ and }-X\le y\})\\
&=\Pr(|X|\le c\text{ and }X\le y)+\Pr(|X|>c\text{ and }-X\le y)\\
&=\Pr(|X|\le c\text{ and }X\le y)+\Pr(|X|>c\text{ and }X\le y)\\
&=\Pr(X\le y),
\end{aligned}
\]

where the next-to-last equality follows from the symmetry of the distribution of X and the symmetry of the condition that |X| > c.

In this example, the difference X − Y is nowhere near being normally distributed, since it has a substantial probability (about 0.88) of being equal to 0. By contrast, the normal distribution, being a continuous distribution, has no discrete part; that is, it does not concentrate more than zero probability at any single point. Consequently X and Y are not jointly normally distributed, even though they are separately normally distributed. [2]
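The figure 0.88 follows in one line from the construction: X − Y = 0 exactly when |X| ≤ c, so with c ≈ 1.54,

\[
\Pr(X-Y=0)=\Pr(|X|\le c)=2\,\Phi(c)-1\approx 2\,\Phi(1.54)-1\approx 0.88 .
\]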

Examples with support almost everywhere in the plane

Suppose that the coordinates (X, Y) of a random point in the plane are chosen according to the probability density function

\[
p(x,y)=\frac{1}{2\pi\sqrt{3}}\left[\exp\!\left(-\frac{2}{3}\left(x^{2}-xy+y^{2}\right)\right)+\exp\!\left(-\frac{2}{3}\left(x^{2}+xy+y^{2}\right)\right)\right].
\]

Then the random variables X and Y are uncorrelated, and each of them is normally distributed (with mean 0 and variance 1), but they are not independent. [7]: 93
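This density is an equal-weight mixture of two bivariate normal densities with unit variances and correlations +1/2 and −1/2, which gives a simple way to sample from it and check the stated properties numerically. The sketch below relies on that mixture reading; the sample size and the use of E[X²Y²] as a witness of dependence are choices made here, not taken from the source.

import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Equal-weight mixture of bivariate normals with correlations +1/2 and -1/2
rho = rng.choice([0.5, -0.5], size=n)
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)

print("corr(X, Y):", np.corrcoef(x, y)[0, 1])   # ~ 0: uncorrelated
print("var(X), var(Y):", x.var(), y.var())      # ~ 1, ~ 1: standard normal marginals
print("E[X^2 Y^2]:", np.mean(x**2 * y**2))      # ~ 1.5, not E[X^2]E[Y^2] = 1: dependent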

It is well-known that the ratio C of two independent standard normal random deviates X and Y has a Cauchy distribution. [8] [9] [7]: 122 One can equally well start with the Cauchy random variable C and derive the conditional distribution of Y to satisfy the requirement that X = CY with X and Y independent and standard normal. It follows that

\[
Y=\frac{W}{\sqrt{1+C^{2}}}\,\sqrt{\chi^{2}_{(2)}}
\]

in which W is a Rademacher random variable and χ²(2) is a chi-squared random variable with two degrees of freedom.

Consider two sets of such pairs, (X_i, Y_i) for i ∈ {1, 2}, each built from its own Rademacher and chi-squared variables but from the same C, with X_i = C Y_i. Note that C is not indexed by i – that is, the same Cauchy random variable C is used in the definition of both (X_1, Y_1) and (X_2, Y_2). This sharing of C results in dependences across indices: neither X_1 nor Y_1 is independent of Y_2. Nevertheless all of the X_i and Y_i are uncorrelated, as the bivariate distributions all have reflection symmetry across the axes.[ citation needed ]
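The construction is easy to simulate. The sketch below (illustrative only; the seed, sample size, and the choice of squared magnitudes as a dependence witness are not from the source) draws a shared Cauchy variable C, builds the two pairs from it, and checks that the marginals are standard normal, that the pairs are uncorrelated, and that Y_1 and Y_2 are nevertheless dependent through the shared C.

import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

c = rng.standard_cauchy(n)   # the shared Cauchy variable C

def make_pair(c, rng):
    # Y = W * sqrt(chi-squared with 2 df) / sqrt(1 + C^2),  X = C * Y
    w = rng.choice([-1.0, 1.0], size=c.shape)
    q = rng.chisquare(df=2, size=c.shape)
    y = w * np.sqrt(q) / np.sqrt(1.0 + c**2)
    return c * y, y

x1, y1 = make_pair(c, rng)
x2, y2 = make_pair(c, rng)

print("var(X1), var(Y1):", x1.var(), y1.var())                # ~ 1, ~ 1: standard normal marginals
print("corr(X1, Y1):", np.corrcoef(x1, y1)[0, 1])             # ~ 0
print("corr(Y1, Y2):", np.corrcoef(y1, y2)[0, 1])             # ~ 0
print("corr(Y1^2, Y2^2):", np.corrcoef(y1**2, y2**2)[0, 1])   # clearly positive: dependent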

Figure: Non-normal joint distributions with normal marginals.

The figure shows scatterplots of samples drawn from the above distribution. This furnishes two examples of bivariate distributions that are uncorrelated and have normal marginal distributions but are not independent. The left panel shows the joint distribution of X_1 and Y_2; the distribution has support everywhere but at the origin. The right panel shows the joint distribution of Y_1 and Y_2; the distribution has support everywhere except along the axes and has a discontinuity at the origin: the density diverges when the origin is approached along any straight path except along the axes.

References

  1. Rosenthal, Jeffrey S. (2005). "A Rant About Uncorrelated Normal Random Variables".
  2. Melnick, Edward L.; Tenenbein, Aaron (November 1982). "Misspecifications of the Normal Distribution". The American Statistician. 36 (4): 372–373. doi:10.1080/00031305.1982.10483052.
  3. Hogg, Robert; Tanis, Elliot (2001). "Chapter 5.4 The Bivariate Normal Distribution". Probability and Statistical Inference (6th ed.). Prentice Hall. pp. 258–259. ISBN 0130272949.
  4. Ash, Robert B. "Lecture 21. The Multivariate Normal Distribution" (PDF). Lectures on Statistics. Archived from the original (PDF) on 2007-07-14.
  5. Romano, Joseph P.; Siegel, Andrew F. (1986). Counterexamples in Probability and Statistics. Wadsworth & Brooks/Cole. pp. 65–66. ISBN 0-534-05568-0.
  6. Wise, Gary L.; Hall, Eric B. (1993). Counterexamples in Probability and Real Analysis. Oxford University Press. pp. 140–141. ISBN 0-19-507068-2.
  7. Stoyanov, Jordan M. (2013). Counterexamples in Probability (3rd ed.). Dover. ISBN 978-0-486-49998-7.
  8. Patel, Jagdish K.; Read, Campbell B. (1996). Handbook of the Normal Distribution (2nd ed.). Taylor and Francis. p. 113. ISBN 978-0-824-79342-5.
  9. Krishnamoorthy, K. (2006). Handbook of Statistical Distributions with Applications. CRC Press. p. 278. ISBN 978-1-420-01137-1.
Notes
  1. More precisely, 1.53817..., the square root of the median of a chi-squared distribution with 3 degrees of freedom.