Friedman test

Last updated January 29, 2025

The Friedman test is a non-parametric statistical test developed by Milton Friedman.^[1]^[2]^[3] Similar to the parametric repeated measures ANOVA, it is used to detect differences in treatments across multiple test attempts. The procedure involves ranking each row (or block) together, then considering the values of ranks by columns. Applicable to complete block designs, it is thus a special case of the Durbin test.

Classic examples of use are:

${\textstyle n}$ wine judges each rate ${\textstyle k}$ different wines. Are any of the ${\textstyle k}$ wines ranked consistently higher or lower than the others?
${\textstyle n}$ welders each use ${\textstyle k}$ welding torches, and the ensuing welds are rated on quality. Do any of the ${\textstyle k}$ torches produce consistently better or worse welds?

The Friedman test is used for one-way repeated measures analysis of variance by ranks. In its use of ranks it is similar to the Kruskal–Wallis one-way analysis of variance by ranks.

The Friedman test is supported by many statistical software packages.

Method

Given data $\{x_{ij}\}_{n\times k}$, that is, a matrix with $n$ rows (the blocks), $k$ columns (the treatments) and a single observation at the intersection of each block and treatment, calculate the ranks within each block. If there are tied values, assign to each tied value the average of the ranks that would have been assigned without ties. Replace the data with a new matrix $\{r_{ij}\}_{n\times k}$ where the entry $r_{ij}$ is the rank of $x_{ij}$ within block $i$.
Find the values ${\bar {r}}_{\cdot j}={\frac {1}{n}}\sum _{i=1}^{n}{r_{ij}}$.
The test statistic is given by $Q={\frac {12n}{k(k+1)}}\sum _{j=1}^{k}\left({\bar {r}}_{\cdot j}-{\frac {k+1}{2}}\right)^{2}$. Note that the value of ${\textstyle Q}$ does need to be adjusted for tied values in the data.^[4]
Finally, when ${\textstyle n}$ or ${\textstyle k}$ is large (i.e. ${\textstyle n>15}$ or ${\textstyle k>4}$), the probability distribution of ${\textstyle Q}$ can be approximated by that of a chi-squared distribution. In this case the $p$-value is given by $\mathbf {P} (\chi _{k-1}^{2}\geq Q)$. If ${\textstyle n}$ or ${\textstyle k}$ is small, the approximation to chi-square becomes poor and the $p$-value should be obtained from tables of ${\textstyle Q}$ specially prepared for the Friedman test. If the $p$-value is significant, appropriate post-hoc multiple comparisons tests would be performed.
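
The steps above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not a reference implementation: the function names (`average_ranks`, `chi2_sf`, `friedman_test`) and the example scores are made up for this illustration, and the tie correction shown (dividing $Q$ by $1-\sum (t^{3}-t)/(nk(k^{2}-1))$, where $t$ runs over the sizes of the tie groups within each block) is the commonly used adjustment, assumed here because the text above does not spell one out.

```python
import math
from collections import Counter

def average_ranks(row):
    """Rank one block's values 1..k; tied values share the average rank."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for t in range(pos, end + 1):
            ranks[order[t]] = avg
        pos = end + 1
    return ranks

def chi2_sf(x, df):
    """P(chi-squared with integer df >= x), built up by the recurrence
    Q(a+1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a+1), with z = x/2."""
    if x <= 0:
        return 1.0
    z = x / 2
    even = df % 2 == 0
    a = 1.0 if even else 0.5
    q = math.exp(-z) if even else math.erfc(math.sqrt(z))
    term = z ** a * math.exp(-z) / math.gamma(a + 1)
    while a < df / 2:
        q += term
        a += 1
        term *= z / a
    return q

def friedman_test(data, correct_ties=True):
    """data: n blocks (rows), each with k treatment values (columns).
    Returns (mean ranks per treatment, Q, chi-squared p-value)."""
    n, k = len(data), len(data[0])
    ranks = [average_ranks(row) for row in data]
    mean_ranks = [sum(r[j] for r in ranks) / n for j in range(k)]
    q = 12 * n / (k * (k + 1)) * sum((m - (k + 1) / 2) ** 2 for m in mean_ranks)
    if correct_ties:
        # Assumed tie correction: t is the size of each tie group in a block.
        ties = sum(t ** 3 - t for row in data for t in Counter(row).values())
        c = 1 - ties / (n * k * (k * k - 1))
        if c == 0:
            raise ValueError("all values are tied within every block")
        q /= c
    return mean_ranks, q, chi2_sf(q, k - 1)

# Hypothetical scores: 4 blocks (rows) by 3 treatments (columns).
scores = [[10, 20, 30], [12, 25, 31], [11, 33, 22], [15, 14, 40]]
mean_ranks, q, p = friedman_test(scores)
print(mean_ranks, q, round(p, 4))  # [1.25, 2.0, 2.75] 4.5 0.1054
```

With only four blocks and three treatments the chi-squared approximation is poor, as noted above, so the $p$-value in this example only illustrates the arithmetic; for small designs the exact tables remain the appropriate reference.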

Related tests

When using this kind of design for a binary response, one instead uses Cochran's Q test.
The sign test (with a two-sided alternative) is equivalent to a Friedman test on two groups.
Kendall's W is a normalization of the Friedman statistic between ${\textstyle 0}$ and ${\textstyle 1}$.
The Wilcoxon signed-rank test is a nonparametric test of nonindependent data from only two groups.
The Skillings–Mack test is a general Friedman-type statistic that can be used in almost any block design with an arbitrary missing-data structure.
The Wittkowski test is a general Friedman-type statistic similar to the Skillings–Mack test. When the data do not contain any missing values, it gives the same result as the Friedman test. If the data contain missing values, it is both more precise and more sensitive than the Skillings–Mack test.^[5]
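
Two of these relations can be written out in the notation of the Method section. The following is a sketch that assumes no ties within blocks; the symbol $s$ is introduced here for illustration and denotes the number of blocks in which the first treatment receives the larger rank. Kendall's W divides $Q$ by its largest possible value, $n(k-1)$, which is attained when every block produces the same ranking:

$$W={\frac {Q}{n(k-1)}},\qquad 0\leq W\leq 1.$$

For $k=2$ the mean ranks are ${\bar {r}}_{\cdot 1}=1+s/n$ and ${\bar {r}}_{\cdot 2}=2-s/n$, so

$$Q={\frac {12n}{2\cdot 3}}\left[\left({\frac {s}{n}}-{\frac {1}{2}}\right)^{2}+\left({\frac {1}{2}}-{\frac {s}{n}}\right)^{2}\right]={\frac {(2s-n)^{2}}{n}},$$

which is the square of the normal-approximation statistic of the sign test, $z=(s-n/2)/({\sqrt {n}}/2)$.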

Post hoc analysis

Post-hoc tests were proposed by Schaich and Hamerle (1984)^[6] as well as Conover (1971, 1980)^[7] in order to decide which groups are significantly different from each other, based upon the mean rank differences of the groups. These procedures are detailed in Bortz, Lienert and Boehnke (2000, p. 275).^[8] Eisinga, Heskes, Pelzer and Te Grotenhuis (2017)^[9] provide an exact test for pairwise comparison of Friedman rank sums, implemented in R. The exact test of Eisinga et al. offers a substantial improvement over available approximate tests, especially if the number of groups ($k$) is large and the number of blocks ($n$) is small.

Not all statistical packages support post-hoc analysis for Friedman's test, but user-contributed code exists that provides these facilities (for example in SPSS^[10] and in R^[11]). The R package PMCMRplus contains numerous non-parametric methods for post-hoc analysis after the Friedman test,^[12] including support for the Nemenyi test.

References

1. Friedman, Milton (December 1937). "The use of ranks to avoid the assumption of normality implicit in the analysis of variance". Journal of the American Statistical Association. 32 (200): 675–701. doi:10.1080/01621459.1937.10503522. JSTOR 2279372.
2. Friedman, Milton (March 1939). "A correction: The use of ranks to avoid the assumption of normality implicit in the analysis of variance". Journal of the American Statistical Association. 34 (205): 109. doi:10.1080/01621459.1939.10502372. JSTOR 2279169.
3. Friedman, Milton (March 1940). "A comparison of alternative tests of significance for the problem of m rankings". The Annals of Mathematical Statistics. 11 (1): 86–92. doi:10.1214/aoms/1177731944. JSTOR 2235971.
4. "FRIEDMAN TEST in NIST Dataplot". August 20, 2018.
5. Wittkowski, Knut M. (1988). "Friedman-Type statistics and consistent multiple comparisons for unbalanced designs with missing data". Journal of the American Statistical Association. 83 (404): 1163–1170. CiteSeerX 10.1.1.533.1948. doi:10.1080/01621459.1988.10478715. JSTOR 2290150.
6. Schaich, E. & Hamerle, A. (1984). Verteilungsfreie statistische Prüfverfahren. Berlin: Springer. ISBN 3-540-13776-9.
7. Conover, W. J. (1971, 1980). Practical nonparametric statistics. New York: Wiley. ISBN 0-471-16851-3.
8. Bortz, J., Lienert, G. & Boehnke, K. (2000). Verteilungsfreie Methoden in der Biostatistik. Berlin: Springer. ISBN 3-540-67590-6.
9. Eisinga, R.; Heskes, T.; Pelzer, B.; Te Grotenhuis, M. (2017). "Exact p-values for pairwise comparison of Friedman rank sums, with application to comparing classifiers". BMC Bioinformatics. 18 (1): 68. doi:10.1186/s12859-017-1486-2. PMC 5267387. PMID 28122501.
10. "Post-hoc comparisons for Friedman test". Archived from the original on 2012-11-03. Retrieved 2010-02-22.
11. "Post hoc analysis for Friedman's Test (R code)". February 22, 2010.
12. "PMCMRplus: Calculate Pairwise Multiple Comparisons of Mean Rank Sums Extended". 17 August 2022.