Friedman test

Last updated

The Friedman test is a non-parametric statistical test developed by Milton Friedman. [1] [2] [3] Similar to the parametric repeated measures ANOVA, it is used to detect differences in treatments across multiple test attempts. The procedure involves ranking each row (or block) together, then considering the values of ranks by columns. Applicable to complete block designs, it is thus a special case of the Durbin test.

Contents

Classic examples of use are:

• n wine judges each rate k different wines. Are any of the k wines ranked consistently higher or lower than the others?
• n welders each use k welding torches, and the ensuing welds were rated on quality. Do any of the k torches produce consistently better or worse welds?

The Friedman test is used for one-way repeated measures analysis of variance by ranks. In its use of ranks it is similar to the Kruskal–Wallis one-way analysis of variance by ranks.

The Friedman test is widely supported by many statistical software packages.

Method

1. Given data ${\displaystyle \{x_{ij}\}_{n\times k}}$, that is, a matrix with ${\displaystyle n}$ rows (the blocks), ${\displaystyle k}$ columns (the treatments) and a single observation at the intersection of each block and treatment, calculate the ranks within each block. If there are tied values, assign to each tied value the average of the ranks that would have been assigned without ties. Replace the data with a new matrix ${\displaystyle \{r_{ij}\}_{n\times k}}$ where the entry ${\displaystyle r_{ij}}$ is the rank of ${\displaystyle x_{ij}}$ within block ${\displaystyle i}$.
2. Find the values ${\displaystyle {\bar {r}}_{\cdot j}={\frac {1}{n}}\sum _{i=1}^{n}{r_{ij}}}$
3. The test statistic is given by ${\displaystyle Q={\frac {12n}{k(k+1)}}\sum _{j=1}^{k}\left({\bar {r}}_{\cdot j}-{\frac {k+1}{2}}\right)^{2}}$. Note that the value of Q does need to be adjusted for tied values in the data. [4]
4. Finally, when n or k is large (i.e. n > 15 or k > 4), the probability distribution of Q can be approximated by that of a chi-squared distribution. In this case the p-value is given by ${\displaystyle \mathbf {P} (\chi _{k-1}^{2}\geq Q)}$. If n or k is small, the approximation to chi-square becomes poor and the p-value should be obtained from tables of Q specially prepared for the Friedman test. If the p-value is significant, appropriate post-hoc multiple comparisons tests would be performed.
• When using this kind of design for a binary response, one instead uses the Cochran's Q test.
• The Sign test (with a two-sided alternative) is equivalent to a Friedman test on two groups.
• Kendall's W is a normalization of the Friedman statistic between 0 and 1.
• The Wilcoxon signed-rank test is a nonparametric test of nonindependent data from only two groups.
• The Skillings–Mack test is a general Friedman-type statistic that can be used in almost any block design with an arbitrary missing-data structure.
• The Wittkowski test is a general Friedman-Type statistics similar to Skillings-Mack test. When the data do not contain any missing value, it gives the same result as Friedman test. But if the data contain missing values, it is both, more precise and sensitive than Skillings-Mack test. [5] An implementation of the test exists in R. [6]

Post hoc analysis

Post-hoc tests were proposed by Schaich and Hamerle (1984) [7] as well as Conover (1971, 1980) [8] in order to decide which groups are significantly different from each other, based upon the mean rank differences of the groups. These procedures are detailed in Bortz, Lienert and Boehnke (2000, p. 275). [9] Eisinga, Heskes, Pelzer and Te Grotenhuis (2017) [10] provide an exact test for pairwise comparison of Friedman rank sums, implemented in R. The Eisinga c.s. exact test offers a substantial improvement over available approximate tests, especially if the number of groups (${\displaystyle k}$) is large and the number of blocks (${\displaystyle n}$) is small.

Not all statistical packages support post-hoc analysis for Friedman's test, but user-contributed code exists that provides these facilities (for example in SPSS, [11] and in R. [12] ). Also, there is a specialized package available in R containing numerous non-parametric methods for post-hoc analysis after Friedman. [13]

Related Research Articles

Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures used to analyze the differences among means. ANOVA was developed by the statistician Ronald Fisher. ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means.

Nonparametric statistics is the branch of statistics that is not based solely on parametrized families of probability distributions. Nonparametric statistics is based on either being distribution-free or having a specified distribution but with the distribution's parameters unspecified. Nonparametric statistics includes both descriptive statistics and statistical inference. Nonparametric tests are often used when the assumptions of parametric tests are violated.

In statistics, Spearman's rank correlation coefficient or Spearman's ρ, named after Charles Spearman and often denoted by the Greek letter (rho) or as , is a nonparametric measure of rank correlation. It assesses how well the relationship between two variables can be described using a monotonic function.

In statistics, the Mann–Whitney U test is a nonparametric test of the null hypothesis that, for randomly selected values X and Y from two populations, the probability of X being greater than Y is equal to the probability of Y being greater than X.

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables. The most common form of regression analysis is linear regression, in which one finds the line that most closely fits the data according to a specific mathematical criterion. For example, the method of ordinary least squares computes the unique line that minimizes the sum of squared differences between the true data and that line. For specific mathematical reasons, this allows the researcher to estimate the conditional expectation of the dependent variable when the independent variables take on a given set of values. Less common forms of regression use slightly different procedures to estimate alternative location parameters or estimate the conditional expectation across a broader collection of non-linear models.

The Kruskal–Wallis test by ranks, Kruskal–Wallis H test, or one-way ANOVA on ranks is a non-parametric method for testing whether samples originate from the same distribution. It is used for comparing two or more independent samples of equal or different sample sizes. It extends the Mann–Whitney U test, which is used for comparing only two groups. The parametric equivalent of the Kruskal–Wallis test is the one-way analysis of variance (ANOVA).

The Wilcoxon signed-rank test is a non-parametric statistical hypothesis test used to compare two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ. It can be used as an alternative to the paired Student's t-test when the distribution of the difference between two samples' means cannot be assumed to be normally distributed. A Wilcoxon signed-rank test is a nonparametric test that can be used to determine whether two dependent samples were selected from populations having the same distribution.

In statistics, a rank correlation is any of several statistics that measure an ordinal association—the relationship between rankings of different ordinal variables or different rankings of the same variable, where a "ranking" is the assignment of the ordering labels "first", "second", "third", etc. to different observations of a particular variable. A rank correlation coefficient measures the degree of similarity between two rankings, and can be used to assess the significance of the relation between them. For example, two common nonparametric methods of significance that use rank correlation are the Mann–Whitney U test and the Wilcoxon signed-rank test.

In statistics, resampling is any of a variety of methods for doing one of the following:

1. Estimating the precision of sample statistics by using subsets of available data (jackknifing) or drawing randomly with replacement from a set of data points (bootstrapping)
2. Exchanging labels on data points when performing significance tests
3. Validating models by using random subsets

In statistics, Levene's test is an inferential statistic used to assess the equality of variances for a variable calculated for two or more groups. Some common statistical procedures assume that variances of the populations from which different samples are drawn are equal. Levene's test assesses this assumption. It tests the null hypothesis that the population variances are equal. If the resulting p-value of Levene's test is less than some significance level (typically 0.05), the obtained differences in sample variances are unlikely to have occurred based on random sampling from a population with equal variances. Thus, the null hypothesis of equal variances is rejected and it is concluded that there is a difference between the variances in the population.

Omnibus tests are a kind of statistical test. They test whether the explained variance in a set of data is significantly greater than the unexplained variance, overall. One example is the F-test in the analysis of variance. There can be legitimate significant effects within a model even if the omnibus test is not significant. For instance, in a model with two independent variables, if only one variable exerts a significant effect on the dependent variable and the other does not, then the omnibus test may be non-significant. This fact does not affect the conclusions that may be drawn from the one significant variable. In order to test effects within an omnibus test, researchers often use contrasts.

In the analysis of designed experiments, the Friedman test is the most common non-parametric test for complete block designs. The Durbin test is a nonparametric test for balanced incomplete designs that reduces to the Friedman test in the case of a complete block design.

Named after the Dutch mathematician Bartel Leendert van der Waerden, the Van der Waerden test is a statistical test that k population distribution functions are equal. The Van der Waerden test converts the ranks from a standard Kruskal-Wallis one-way analysis of variance to quantiles of the standard normal distribution. These are called normal scores and the test is computed from these normal scores.

In statistics, in the analysis of two-way randomized block designs where the response variable can take only two possible outcomes, Cochran's Q test is a non-parametric statistical test to verify whether k treatments have identical effects. It is named after William Gemmell Cochran. Cochran's Q test should not be confused with Cochran's C test, which is a variance outlier test. Put in simple technical terms, Cochran's Q test requires that there only be a binary response and that there be more than 2 groups of the same size. The test assesses whether the proportion of successes is the same between groups. Often it is used to assess if different observers of the same phenomenon have consistent results.

In statistics, an additive model (AM) is a nonparametric regression method. It was suggested by Jerome H. Friedman and Werner Stuetzle (1981) and is an essential part of the ACE algorithm. The AM uses a one-dimensional smoother to build a restricted class of nonparametric regression models. Because of this, it is less affected by the curse of dimensionality than e.g. a p-dimensional smoother. Furthermore, the AM is more flexible than a standard linear model, while being more interpretable than a general regression surface at the cost of approximation errors. Problems with AM include model selection, overfitting, and multicollinearity.

The Newman–Keuls or Student–Newman–Keuls (SNK)method is a stepwise multiple comparisons procedure used to identify sample means that are significantly different from each other. It was named after Student (1927), D. Newman, and M. Keuls. This procedure is often used as a post-hoc test whenever a significant difference between three or more sample means has been revealed by an analysis of variance (ANOVA). The Newman–Keuls method is similar to Tukey's range test as both procedures use studentized range statistics. Unlike Tukey's range test, the Newman–Keuls method uses different critical values for different pairs of mean comparisons. Thus, the procedure is more likely to reveal significant differences between group means and to commit type I errors by incorrectly rejecting a null hypothesis when it is true. In other words, the Neuman-Keuls procedure is more powerful but less conservative than Tukey's range test.

In statistics, Tukey's test of additivity, named for John Tukey, is an approach used in two-way ANOVA to assess whether the factor variables are additively related to the expected value of the response variable. It can be applied when there are no replicated values in the data set, a situation in which it is impossible to directly estimate a fully general non-additive regression structure and still have information left to estimate the error variance. The test statistic proposed by Tukey has one degree of freedom under the null hypothesis, hence this is often called "Tukey's one-degree-of-freedom test."

In statistics, one purpose for the analysis of variance (ANOVA) is to analyze differences in means between groups. The test statistic, F, assumes independence of observations, homogeneous variances, and population normality. ANOVA on ranks is a statistic designed for situations when the normality assumption has been violated.

In statistics, the two-way analysis of variance (ANOVA) is an extension of the one-way ANOVA that examines the influence of two different categorical independent variables on one continuous dependent variable. The two-way ANOVA not only aims at assessing the main effect of each independent variable but also if there is any interaction between them.

Ordinal data is a categorical, statistical data type where the variables have natural, ordered categories and the distances between the categories is not known. These data exist on an ordinal scale, one of four levels of measurement described by S. S. Stevens in 1946. The ordinal scale is distinguished from the nominal scale by having a ranking. It also differs from interval and ratio scales by not having category widths that represent equal increments of the underlying attribute.

References

1. Friedman, Milton (December 1937). "The use of ranks to avoid the assumption of normality implicit in the analysis of variance". Journal of the American Statistical Association. 32 (200): 675–701. doi:10.1080/01621459.1937.10503522. JSTOR   2279372.
2. Friedman, Milton (March 1939). "A correction: The use of ranks to avoid the assumption of normality implicit in the analysis of variance". Journal of the American Statistical Association. 34 (205): 109. doi:10.1080/01621459.1939.10502372. JSTOR   2279169.
3. Friedman, Milton (March 1940). "A comparison of alternative tests of significance for the problem of m rankings". The Annals of Mathematical Statistics. 11 (1): 86–92. doi:. JSTOR   2235971.
4. "FRIEDMAN TEST in NIST Dataplot". August 20, 2018.
5. Wittkowski, Knut M. (1988). "Friedman-Type statistics and consistent multiple comparisons for unbalanced designs with missing data". Journal of the American Statistical Association. 83 (404): 1163–1170. CiteSeerX  . doi:10.1080/01621459.1988.10478715. JSTOR   2290150.
6. "muStat package (R code)". August 23, 2012.
7. Schaich, E. & Hamerle, A. (1984). Verteilungsfreie statistische Prüfverfahren. Berlin: Springer. ISBN   3-540-13776-9.
8. Conover, W. J. (1971, 1980). Practical nonparametric statistics. New York: Wiley. ISBN   0-471-16851-3.
9. Bortz, J., Lienert, G. & Boehnke, K. (2000). Verteilungsfreie Methoden in der Biostatistik. Berlin: Springer. ISBN   3-540-67590-6.
10. Eisinga, R.; Heskes, T.; Pelzer, B.; Te Grotenhuis, M. (2017). "Exact p-values for pairwise comparison of Friedman rank sums, with application to comparing classifiers". BMC Bioinformatics. 18: 68. doi:10.1186/s12859-017-1486-2. PMC  . PMID   28122501.
11. "Post-hoc comparisons for Friedman test". Archived from the original on 2012-11-03. Retrieved 2010-02-22.
12. "Post hoc analysis for Friedman's Test (R code)". February 22, 2010.