**Nonparametric statistics** is the branch of statistics that is not based solely on parametrized families of probability distributions (common examples of parameters are the mean and variance). Nonparametric statistics is based on either being distribution-free or having a specified distribution but with the distribution's parameters unspecified. Nonparametric statistics includes both descriptive statistics and statistical inference. Nonparametric tests are often used when the assumptions of parametric tests are violated.^{ [1] }

The term "nonparametric statistics" has been imprecisely defined in the following two ways, among others.

- The first meaning of *nonparametric* covers techniques that do not rely on data belonging to any particular parametric family of probability distributions. These include, among others:
  - *distribution-free* methods, which do not rely on assumptions that the data are drawn from a given parametric family of probability distributions. As such, they are the opposite of parametric statistics.
  - *nonparametric statistics* (a statistic is defined to be a function on a sample; no dependency on a parameter).

  Order statistics, which are based on the ranks of observations, are one example of such statistics.

  The following discussion is taken from *Kendall's*.^{ [2] }

  > Statistical hypotheses concern the behavior of observable random variables.... For example, the hypothesis (a) that a normal distribution has a specified mean and variance is statistical; so is the hypothesis (b) that it has a given mean but unspecified variance; so is the hypothesis (c) that a distribution is of normal form with both mean and variance unspecified; finally, so is the hypothesis (d) that two unspecified continuous distributions are identical.
  >
  > It will have been noticed that in the examples (a) and (b) the distribution underlying the observations was taken to be of a certain form (the normal) and the hypothesis was concerned entirely with the value of one or both of its parameters. Such a hypothesis, for obvious reasons, is called *parametric*.
  >
  > Hypothesis (c) was of a different nature, as no parameter values are specified in the statement of the hypothesis; we might reasonably call such a hypothesis *non-parametric*. Hypothesis (d) is also non-parametric but, in addition, it does not even specify the underlying form of the distribution and may now be reasonably termed *distribution-free*. Notwithstanding these distinctions, the statistical literature now commonly applies the label "non-parametric" to test procedures that we have just termed "distribution-free", thereby losing a useful classification.

- The second meaning of *non-parametric* covers techniques that do not assume that the *structure* of a model is fixed. Typically, the model grows in size to accommodate the complexity of the data. In these techniques, individual variables *are* typically assumed to belong to parametric distributions, and assumptions about the types of connections among variables are also made. These techniques include, among others:
  - *non-parametric regression*, which is modeling whereby the structure of the relationship between variables is treated non-parametrically, but where nevertheless there may be parametric assumptions about the distribution of model residuals.
  - *non-parametric hierarchical Bayesian models*, such as models based on the Dirichlet process, which allow the number of latent variables to grow as necessary to fit the data, but where individual variables still follow parametric distributions and even the process controlling the rate of growth of latent variables follows a parametric distribution.

Non-parametric methods are widely used for studying populations that take on a ranked order (such as movie reviews receiving one to four stars). The use of non-parametric methods may be necessary when data have a ranking but no clear numerical interpretation, such as when assessing preferences. In terms of levels of measurement, non-parametric methods are appropriate for ordinal data.

As non-parametric methods make fewer assumptions, their applicability is much wider than that of the corresponding parametric methods. In particular, they may be applied in situations where less is known about the application in question. Also, because they rely on fewer assumptions, non-parametric methods are more robust.

Another justification for the use of non-parametric methods is simplicity. In certain cases, even when the use of parametric methods is justified, non-parametric methods may be easier to use. Due both to this simplicity and to their greater robustness, non-parametric methods are seen by some statisticians as leaving less room for improper use and misunderstanding.

The wider applicability and increased robustness of non-parametric tests come at a cost: in cases where a parametric test would be appropriate, non-parametric tests have less statistical power. In other words, a larger sample size can be required to draw conclusions with the same degree of confidence.
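The power trade-off can be illustrated with a small Monte Carlo sketch. The setup is an assumption chosen for simplicity, not something taken from the text: data are drawn from a normal distribution with known standard deviation 1, the parametric test is a one-sample z-test, and the non-parametric test is an exact sign test. The function names `z_test_p`, `sign_test_p` and `estimate_power` are hypothetical.

```python
import random
from math import comb, sqrt
from statistics import NormalDist, fmean


def z_test_p(sample, sigma=1.0):
    """Two-sided p-value of a z-test of H0: mean = 0, with sigma assumed known."""
    z = fmean(sample) * sqrt(len(sample)) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


def sign_test_p(sample):
    """Two-sided exact p-value of a sign test of H0: median = 0."""
    nonzero = [x for x in sample if x != 0]
    n = len(nonzero)
    k = sum(x > 0 for x in nonzero)
    # Probability of a split at least as lopsided as the observed one.
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def estimate_power(test, n, shift, trials=2000, alpha=0.05, seed=0):
    """Fraction of simulated normal samples (true mean = shift) that reject H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(shift, 1.0) for _ in range(n)]
        if test(sample) < alpha:
            rejections += 1
    return rejections / trials


power_z = estimate_power(z_test_p, n=30, shift=0.5)
power_sign = estimate_power(sign_test_p, n=30, shift=0.5)
print(f"z-test power:    {power_z:.2f}")
print(f"sign test power: {power_sign:.2f}")
```

Under these assumptions the z-test rejects the false null hypothesis more often than the sign test at the same sample size. The comparison only holds when the parametric assumptions are actually met, which is the case the paragraph above describes.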

*Non-parametric models* differ from parametric models in that the model structure is not specified *a priori* but is instead determined from data. The term *non-parametric* is not meant to imply that such models completely lack parameters but that the number and nature of the parameters are flexible and not fixed in advance.

- A histogram is a simple nonparametric estimate of a probability distribution.
- Kernel density estimation yields a smooth estimate of the density and, unlike a histogram, does not depend on the placement of bin edges.
- Nonparametric regression and semiparametric regression methods have been developed based on kernels, splines, and wavelets.
- Data envelopment analysis provides efficiency coefficients similar to those obtained by multivariate analysis without any distributional assumption.
- The *k*-nearest neighbors (KNN) algorithm classifies an unseen instance based on the K points in the training set that are nearest to it.
- A support vector machine (with a Gaussian kernel) is a nonparametric large-margin classifier.
- The method of moments with polynomial probability distributions.
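The first two items in the list can be sketched in a few lines. This is a minimal illustration, assuming a Gaussian kernel and a hand-picked bandwidth and bin width; the function names and the sample `data` are hypothetical.

```python
from math import exp, pi, sqrt


def histogram_density(data, x, lo, width):
    """Histogram density estimate at x, using bins [lo + j*width, lo + (j+1)*width)."""
    j = int((x - lo) // width)
    count = sum(1 for v in data if int((v - lo) // width) == j)
    return count / (len(data) * width)


def gaussian_kde(data, x, bandwidth):
    """Kernel density estimate at x: the average of Gaussian bumps centred on the data."""
    total = sum(exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in data)
    return total / (len(data) * bandwidth * sqrt(2 * pi))


data = [1.1, 1.9, 2.0, 2.3, 2.8, 3.5, 4.2]

for x in (1.5, 2.1, 3.0):
    h = histogram_density(data, x, lo=1.0, width=1.0)
    k = gaussian_kde(data, x, bandwidth=0.5)
    print(f"x={x}: histogram={h:.3f}, kde={k:.3f}")
```

Neither estimator assumes a parametric family for the data. The bin width and the bandwidth are tuning choices that control how smooth the estimate is.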

**Non-parametric** (or **distribution-free**) **inferential statistical methods** are mathematical procedures for statistical hypothesis testing which, unlike parametric statistics, make no assumptions about the probability distributions of the variables being assessed. The most frequently used methods include:

- Analysis of similarities
- Anderson–Darling test: tests whether a sample is drawn from a given distribution
- Statistical bootstrap methods: estimate the accuracy/sampling distribution of a statistic
- Cochran's Q: tests whether *k* treatments in randomized block designs with 0/1 outcomes have identical effects
- Cohen's kappa: measures inter-rater agreement for categorical items
- Friedman two-way analysis of variance by ranks: tests whether *k* treatments in randomized block designs have identical effects
- Kaplan–Meier: estimates the survival function from lifetime data, modeling censoring
- Kendall's tau: measures statistical dependence between two variables
- Kendall's W: a measure between 0 and 1 of inter-rater agreement
- Kolmogorov–Smirnov test: tests whether a sample is drawn from a given distribution, or whether two samples are drawn from the same distribution
- Kruskal–Wallis one-way analysis of variance by ranks: tests whether > 2 independent samples are drawn from the same distribution
- Kuiper's test: tests whether a sample is drawn from a given distribution, sensitive to cyclic variations such as day of the week
- Logrank test: compares survival distributions of two right-skewed, censored samples
- Mann–Whitney U or Wilcoxon rank sum test: tests whether two samples are drawn from the same distribution, as compared to a given alternative hypothesis.
- McNemar's test: tests whether, in 2 × 2 contingency tables with a dichotomous trait and matched pairs of subjects, row and column marginal frequencies are equal
- Median test: tests whether two samples are drawn from distributions with equal medians
- Pitman's permutation test: a statistical significance test that yields exact *p* values by examining all possible rearrangements of labels
- Rank products: detects differentially expressed genes in replicated microarray experiments
- Siegel–Tukey test: tests for differences in scale between two groups
- Sign test: tests whether matched pair samples are drawn from distributions with equal medians
- Spearman's rank correlation coefficient: measures statistical dependence between two variables using a monotonic function
- Squared ranks test: tests equality of variances in two or more samples
- Tukey–Duckworth test: tests equality of two distributions by using ranks
- Wald–Wolfowitz runs test: tests whether the elements of a sequence are mutually independent/random
- Wilcoxon signed-rank test: tests whether matched pair samples are drawn from populations with different mean ranks
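As one concrete instance from the list, the permutation test can be sketched by enumerating every reassignment of group labels. This is a minimal sketch, assuming the test statistic is the absolute difference in group means and that the samples are small enough to enumerate; `permutation_test` is a hypothetical name.

```python
from itertools import combinations
from statistics import fmean


def permutation_test(a, b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(fmean(a) - fmean(b))
    count = extreme = 0
    # Every way of choosing which pooled values carry the label "a".
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        count += 1
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / count


p = permutation_test([1, 2, 3], [4, 5, 6])
print(p)  # 2 of the 20 relabelings are as extreme as the data, so p = 0.1
```

The p-value is exact because it counts relabelings rather than relying on an approximating distribution. The number of relabelings grows quickly with sample size, so larger problems usually sample relabelings at random instead of enumerating them.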

Early nonparametric statistics include the median (13th century or earlier, use in estimation by Edward Wright, 1599; see Median § History) and the sign test by John Arbuthnot (1710) in analyzing the human sex ratio at birth (see Sign test § History).^{ [3] }^{ [4] }

1. Pearce, J; Derrick, B (2019). "Preliminary testing: The devil of statistics?". *Reinvention: An International Journal of Undergraduate Research*. **12** (2). doi:10.31273/reinvention.v12i2.339.
2. Stuart A., Ord J.K, Arnold S. (1999), *Kendall's Advanced Theory of Statistics: Volume 2A—Classical Inference and the Linear Model*, sixth edition, §20.2–20.3 (Arnold).
3. Conover, W.J. (1999), "Chapter 3.4: The Sign Test", *Practical Nonparametric Statistics* (Third ed.), Wiley, pp. 157–176, ISBN 0-471-16068-7
4. Sprent, P. (1989), *Applied Nonparametric Statistical Methods* (Second ed.), Chapman & Hall, ISBN 0-412-44980-3

- Bagdonavicius, V., Kruopis, J., Nikulin, M.S. (2011). "Non-parametric tests for complete data", ISTE & WILEY: London & Hoboken. ISBN 978-1-84821-269-5.
- Corder, G. W.; Foreman, D. I. (2014). *Nonparametric Statistics: A Step-by-Step Approach*. Wiley. ISBN 978-1118840313.
- Gibbons, Jean Dickinson; Chakraborti, Subhabrata (2003). *Nonparametric Statistical Inference*, 4th Ed. CRC Press. ISBN 0-8247-4052-1.
- Hettmansperger, T. P.; McKean, J. W. (1998). *Robust Nonparametric Statistical Methods*. Kendall's Library of Statistics. **5** (First ed.). London: Edward Arnold. New York: John Wiley & Sons. ISBN 0-340-54937-8. MR 1604954. also ISBN 0-471-19479-4.
- Hollander M., Wolfe D.A., Chicken E. (2014). *Nonparametric Statistical Methods*, John Wiley & Sons.
- Sheskin, David J. (2003) *Handbook of Parametric and Nonparametric Statistical Procedures*. CRC Press. ISBN 1-58488-440-1
- Wasserman, Larry (2007). *All of Nonparametric Statistics*, Springer. ISBN 0-387-25145-6.

**Analysis of variance** (**ANOVA**) is a collection of statistical models and their associated estimation procedures used to analyze the differences among means. ANOVA was developed by the statistician Ronald Fisher. ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the *t*-test beyond two means.

A **statistical model** is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data. A statistical model represents, often in considerably idealized form, the data-generating process.

**Statistical inference** is the process of using data analysis to infer properties of an underlying distribution of probability. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population.

**Statistics** is a field of inquiry that studies the collection, analysis, interpretation, and presentation of data. It is applicable to a wide variety of academic disciplines, from the physical and social sciences to the humanities; it is also used and misused for making informed decisions in all areas of business and government.

**Parametric statistics** is a branch of statistics which assumes that sample data come from a population that can be adequately modeled by a probability distribution that has a fixed set of parameters. Conversely, a **non-parametric model** differs precisely in that it makes no assumptions about a parametric distribution when modeling the data.

In statistics, **Spearman's rank correlation coefficient** or **Spearman's ρ**, named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as $r_s$, is a nonparametric measure of rank correlation. It assesses how well the relationship between two variables can be described using a monotonic function.
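A minimal sketch of the coefficient, assuming the common definition as the Pearson correlation of the ranks, with tied values given the average of their ranks. The helper names are hypothetical.

```python
from math import sqrt
from statistics import fmean


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear
print(spearman_rho(x, y), pearson(x, y))
```

Because `y` increases whenever `x` does, the ranks agree perfectly and the rank correlation reaches its maximum, while the Pearson correlation of the raw values is lower because the relationship is not linear.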

In statistics, the **Mann–Whitney U test** is a nonparametric test of the null hypothesis that, for randomly selected values

The ***t*-test** is any statistical hypothesis test in which the test statistic follows a Student's

**Mathematical statistics** is the application of probability theory, a branch of mathematics, to statistics, as opposed to techniques for collecting statistical data. Specific mathematical techniques which are used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure theory.

The **Kruskal–Wallis test** by ranks, **Kruskal–Wallis H test**, or

The following is a glossary of terms used in the mathematical sciences statistics and probability.

The **Friedman test** is a non-parametric statistical test developed by Milton Friedman. Similar to the parametric repeated measures ANOVA, it is used to detect differences in treatments across multiple test attempts. The procedure involves ranking each row together, then considering the values of ranks by columns. Applicable to complete block designs, it is thus a special case of the Durbin test.
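The ranking procedure can be sketched as follows. This is a minimal sketch, assuming the usual form of the statistic without a correction for ties: with $n$ blocks (rows), $k$ treatments (columns) and column rank sums $R_j$, it computes $Q = \frac{12}{nk(k+1)}\sum_j R_j^2 - 3n(k+1)$. The function names are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def friedman_statistic(table):
    """table[i][j] is the measurement for block (row) i under treatment (column) j."""
    n, k = len(table), len(table[0])
    col_rank_sums = [0.0] * k
    for row in table:
        # Rank each row on its own, then accumulate the ranks by column.
        for j, r in enumerate(average_ranks(row)):
            col_rank_sums[j] += r
    sum_sq = sum(r * r for r in col_rank_sums)
    return 12.0 * sum_sq / (n * k * (k + 1)) - 3.0 * n * (k + 1)


consistent = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # same ordering in every block
balanced = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]    # every treatment gets each rank once
print(friedman_statistic(consistent), friedman_statistic(balanced))
```

The statistic is largest when every block orders the treatments the same way and is zero when the column rank sums are equal. For a p-value, it is commonly compared with a chi-squared distribution with $k - 1$ degrees of freedom when $n$ is large; that step is omitted here.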

The **sign test** is a statistical method to test for consistent differences between pairs of observations, such as the weight of subjects before and after treatment. Given pairs of observations for each subject, the sign test determines if one member of the pair tends to be greater than the other member of the pair.
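A minimal sketch of the test for before-and-after measurements, assuming an exact two-sided binomial p-value and the common convention of dropping pairs with no difference. The function name and the weights are illustrative and hypothetical.

```python
from math import comb


def sign_test(before, after):
    """Return (number of increases, number of non-tied pairs, two-sided exact p-value)."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # ties are dropped
    n = len(diffs)
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # Under H0 each non-tied pair is equally likely to go up or down.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return positives, n, min(1.0, 2 * tail)


before = [80, 75, 90, 68, 77, 85, 92, 70]
after = [78, 74, 91, 65, 75, 83, 90, 69]
positives, n, p = sign_test(before, after)
print(positives, n, p)
```

Only the direction of each difference is used, not its size, which is why the test needs no assumption about the distribution of the differences.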

The **Wald–Wolfowitz runs test**, named after statisticians Abraham Wald and Jacob Wolfowitz, is a non-parametric statistical test that checks a randomness hypothesis for a two-valued data sequence. More precisely, it can be used to test the hypothesis that the elements of the sequence are mutually independent.
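A minimal sketch of the runs test, assuming the large-sample normal approximation for the number of runs rather than exact tables. The function name `runs_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def runs_test(sequence):
    """Return (number of runs, z-score, two-sided p-value) for a two-valued sequence."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct values")
    n1 = sum(1 for s in sequence if s == symbols[0])
    n2 = len(sequence) - n1
    n = n1 + n2
    # A new run starts wherever two neighbouring elements differ.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    z = (runs - mean) / sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return runs, z, p


runs, z, p = runs_test("HTHTHTHTHTHTHTHTHTHT")
print(runs, round(z, 2), p)
```

A strictly alternating sequence has far more runs than independence would suggest, so the z-score is large and positive. A sequence with long blocks of the same value gives a large negative z-score instead.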

In statistics, **resampling** is any of a variety of methods for doing one of the following:

- Estimating the precision of sample statistics by using subsets of available data (**jackknifing**) or drawing randomly with replacement from a set of data points (**bootstrapping**)
- Exchanging labels on data points when performing significance tests
- Validating models by using random subsets
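The first item can be sketched with two standard-error estimates. This is a minimal sketch, assuming a fixed random seed and 2000 bootstrap resamples; the function names and the sample `data` are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean, median, stdev


def bootstrap_se(data, statistic, n_resamples=2000, seed=0):
    """Standard error from resamples drawn with replacement (bootstrapping)."""
    rng = random.Random(seed)
    n = len(data)
    estimates = [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return stdev(estimates)


def jackknife_se(data, statistic):
    """Standard error from leave-one-out subsets (jackknifing)."""
    n = len(data)
    loo = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = fmean(loo)
    return sqrt((n - 1) / n * sum((v - mean_loo) ** 2 for v in loo))


data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.3]
print("bootstrap SE of the mean:  ", bootstrap_se(data, fmean))
print("jackknife SE of the mean:  ", jackknife_se(data, fmean))
print("bootstrap SE of the median:", bootstrap_se(data, median))
```

For the sample mean, the jackknife estimate coincides with the textbook formula $s/\sqrt{n}$, which gives a quick sanity check. The bootstrap also applies to statistics such as the median, for which no simple formula is available.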

In statistics, **Levene's test** is an inferential statistic used to assess the equality of variances for a variable calculated for two or more groups. Some common statistical procedures assume that variances of the populations from which different samples are drawn are equal. Levene's test assesses this assumption. It tests the null hypothesis that the population variances are equal. If the resulting *p*-value of Levene's test is less than some significance level (typically 0.05), the obtained differences in sample variances are unlikely to have occurred based on random sampling from a population with equal variances. Thus, the null hypothesis of equal variances is rejected and it is concluded that there is a difference between the variances in the population.

**Exact statistics**, such as that described in exact test, is a branch of statistics that was developed to provide more accurate results pertaining to statistical testing and interval estimation by eliminating procedures based on asymptotic and approximate statistical methods. The main characteristic of exact methods is that statistical tests and confidence intervals are based on exact probability statements that are valid for any sample size. Exact statistical methods help avoid some of the unreasonable assumptions of traditional statistical methods, such as the assumption of equal variances in classical ANOVA. They also allow exact inference on variance components of mixed models.

In statistics, the **Goldfeld–Quandt test** checks for homoscedasticity in regression analyses. It does this by dividing a dataset into two parts or groups, and hence the test is sometimes called a two-group test. The Goldfeld–Quandt test is one of two tests proposed in a 1965 paper by Stephen Goldfeld and Richard Quandt. Both a parametric and nonparametric test are described in the paper, but the term "Goldfeld–Quandt test" is usually associated only with the former.

In statistics, one purpose for the analysis of variance (ANOVA) is to analyze differences in means between groups. The test statistic, *F*, assumes independence of observations, homogeneous variances, and population normality. **ANOVA on ranks** is a statistic designed for situations when the normality assumption has been violated.

This page is based on this Wikipedia article

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.
