Cucconi test

In statistics, the Cucconi test is a nonparametric test for jointly comparing central tendency and variability (detecting location and scale changes) in two samples. Many rank tests have been proposed for the two-sample location-scale problem. Nearly all of them are Lepage-type tests, that is, combinations of a location test and a scale test. The Cucconi test was first proposed by Odoardo Cucconi in 1968. [1]

The Cucconi test is not as well known as other location-scale tests, but it is of interest for several reasons. First, from a historical point of view, it was proposed some years before the Lepage test, the standard rank test for the two-sample location-scale problem. Secondly, as opposed to other location-scale tests, the Cucconi test is not a combination of a location test and a scale test. Thirdly, it compares favorably with Lepage-type tests in terms of power and type-one error probability, [2] and, very importantly, it is easier to compute because it requires only the ranks of one sample in the combined sample, whereas the other tests also require scores of various types as well as permutational estimation of the mean and variance of the test statistics, because their analytic formulae are not available. [3]

The Cucconi test is based on the following statistic:

$$ C = \frac{U^2 + V^2 - 2\rho UV}{2\,(1 - \rho^2)}, $$

where $U$ is based on the standardized sum of the squared ranks of the first-sample elements in the pooled sample, $V$ is based on the standardized sum of the squared contrary-ranks of the first-sample elements in the pooled sample, and $\rho$ is the correlation coefficient between $U$ and $V$. The test rejects for large values of $C$; a table of critical values is available. [4] The p-value can easily be computed via permutations.
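
For illustration, the following is a minimal Python sketch of the statistic and of a Monte Carlo permutation p-value. It assumes the standard standardizations of the sums of squared ranks and contrary-ranks and the usual expression for the correlation coefficient used in the location-scale literature; the function names are illustrative and do not belong to any library, and ties are not handled.

```python
# Hedged sketch of the Cucconi statistic C and a permutation p-value.
# Assumed (standard) standardizations:
#   U   = (6*sum(S_i^2)       - n1*(N+1)*(2N+1)) / sqrt(n1*n2*(N+1)*(2N+1)*(8N+11)/5)
#   V   = (6*sum((N+1-S_i)^2) - n1*(N+1)*(2N+1)) / sqrt(n1*n2*(N+1)*(2N+1)*(8N+11)/5)
#   rho = 2*(N^2-4)/((2N+1)*(8N+11)) - 1
# where S_i are the ranks of the first sample in the pooled sample and N = n1 + n2.
import numpy as np

def cucconi_statistic(x, y):
    """Cucconi C statistic for two samples x and y (no tie handling)."""
    n1, n2 = len(x), len(y)
    N = n1 + n2
    pooled = np.concatenate([x, y])
    ranks = np.argsort(np.argsort(pooled)) + 1   # ranks 1..N in the pooled sample
    s = ranks[:n1].astype(float)                 # ranks of the first sample
    denom = np.sqrt(n1 * n2 * (N + 1) * (2 * N + 1) * (8 * N + 11) / 5.0)
    u = (6.0 * np.sum(s ** 2) - n1 * (N + 1) * (2 * N + 1)) / denom
    v = (6.0 * np.sum((N + 1 - s) ** 2) - n1 * (N + 1) * (2 * N + 1)) / denom
    rho = 2.0 * (N ** 2 - 4) / ((2 * N + 1) * (8 * N + 11)) - 1.0
    return (u ** 2 + v ** 2 - 2.0 * rho * u * v) / (2.0 * (1.0 - rho ** 2))

def cucconi_permutation_pvalue(x, y, n_perm=10000, seed=0):
    """Monte Carlo permutation p-value: the test rejects for large C."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    n1 = len(x)
    c_obs = cucconi_statistic(x, y)
    count = sum(
        cucconi_statistic(perm[:n1], perm[n1:]) >= c_obs
        for perm in (rng.permutation(pooled) for _ in range(n_perm))
    )
    return (count + 1) / (n_perm + 1)
```

For example, calling cucconi_permutation_pvalue(x, y) on two numeric arrays returns an approximate p-value; small values indicate a difference in location and/or scale between the two samples.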

Interest in this test has recently increased, spanning applications in many different fields such as hydrology, applied psychology and industrial quality control. [5]

See also

  * Lepage test

Related Research Articles

<span class="mw-page-title-main">Kolmogorov–Smirnov test</span> Non-parametric statistical test between two distributions

In statistics, the Kolmogorov–Smirnov test is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to test whether a sample came from a given reference probability distribution, or to test whether two samples came from the same distribution. Intuitively, the test provides a method to qualitatively answer the question "How likely is it that we would see a collection of samples like this if they were drawn from that probability distribution?" or, in the second case, "How likely is it that we would see two sets of samples like this if they were drawn from the same probability distribution?". It is named after Andrey Kolmogorov and Nikolai Smirnov.

In fluid mechanics, the Rayleigh number (Ra, after Lord Rayleigh) for a fluid is a dimensionless number associated with buoyancy-driven flow, also known as free (or natural) convection. It characterises the fluid's flow regime: a value in a certain lower range denotes laminar flow; a value in a higher range, turbulent flow. Below a certain critical value, there is no fluid motion and heat transfer is by conduction rather than convection. For most engineering purposes, the Rayleigh number is large, somewhere around 10⁶ to 10⁸.

<span class="mw-page-title-main">Pearson correlation coefficient</span> Measure of linear correlation

In statistics, the Pearson correlation coefficient (PCC) is a correlation coefficient that measures linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1. As with covariance itself, the measure can only reflect a linear correlation of variables, and ignores many other types of relationships or correlations. As a simple example, one would expect the age and height of a sample of teenagers from a high school to have a Pearson correlation coefficient significantly greater than 0, but less than 1.
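
In symbols (a standard restatement of the definition above, for paired observations $(x_i, y_i)$, $i = 1, \dots, n$, with sample means $\bar{x}$ and $\bar{y}$):

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}. $$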

<span class="mw-page-title-main">Spearman's rank correlation coefficient</span> Nonparametric measure of rank correlation

In statistics, Spearman's rank correlation coefficient or Spearman's ρ, named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as r_s, is a nonparametric measure of rank correlation. It assesses how well the relationship between two variables can be described using a monotonic function.

Cronbach's alpha, also known as tau-equivalent reliability or coefficient alpha, is a reliability coefficient and a measure of the internal consistency of tests and measures.

The Mann–Whitney U test is a nonparametric test of the null hypothesis that, for randomly selected values X and Y from two populations, the probability of X being greater than Y is equal to the probability of Y being greater than X.

Hotelling's T-squared distribution: Type of probability distribution

In statistics, particularly in hypothesis testing, the Hotelling's T-squared distribution (T²), proposed by Harold Hotelling, is a multivariate probability distribution that is tightly related to the F-distribution and is most notable for arising as the distribution of a set of sample statistics that are natural generalizations of the statistics underlying the Student's t-distribution. The Hotelling's t-squared statistic (t²) is a generalization of Student's t-statistic that is used in multivariate hypothesis testing.

<span class="mw-page-title-main">Kruskal–Wallis test</span> Non-parametric method for testing whether samples originate from the same distribution

The Kruskal–Wallis test by ranks, Kruskal–Wallis test, or one-way ANOVA on ranks is a non-parametric method for testing whether samples originate from the same distribution. It is used for comparing two or more independent samples of equal or different sample sizes. It extends the Mann–Whitney U test, which is used for comparing only two groups. The parametric equivalent of the Kruskal–Wallis test is the one-way analysis of variance (ANOVA).

A permutation test is an exact statistical hypothesis test making use of the proof by contradiction. A permutation test involves two or more samples. The null hypothesis is that all samples come from the same distribution. Under the null hypothesis, the distribution of the test statistic is obtained by calculating all possible values of the test statistic under possible rearrangements of the observed data. Permutation tests are, therefore, a form of resampling.

In statistics, the Behrens–Fisher problem, named after Walter-Ulrich Behrens and Ronald Fisher, is the problem of interval estimation and hypothesis testing concerning the difference between the means of two normally distributed populations when the variances of the two populations are not assumed to be equal, based on two independent samples.

In statistics, a rank correlation is any of several statistics that measure an ordinal association—the relationship between rankings of different ordinal variables or different rankings of the same variable, where a "ranking" is the assignment of the ordering labels "first", "second", "third", etc. to different observations of a particular variable. A rank correlation coefficient measures the degree of similarity between two rankings, and can be used to assess the significance of the relation between them. For example, two common nonparametric methods of significance that use rank correlation are the Mann–Whitney U test and the Wilcoxon signed-rank test.

In statistics, M-estimators are a broad class of extremum estimators for which the objective function is a sample average. Both non-linear least squares and maximum likelihood estimation are special cases of M-estimators. The definition of M-estimators was motivated by robust statistics, which contributed new types of M-estimators. However, M-estimators are not inherently robust, as is clear from the fact that they include maximum likelihood estimators, which are in general not robust. The statistical procedure of evaluating an M-estimator on a data set is called M-estimation.

Kendall's W is a non-parametric statistic for rank correlation. It is a normalization of the statistic of the Friedman test, and can be used for assessing agreement among raters and in particular inter-rater reliability. Kendall's W ranges from 0 to 1.

The term kernel is used in statistical analysis to refer to a window function. The term "kernel" has several distinct meanings in different branches of statistics.

<span class="mw-page-title-main">Quantile regression</span> Statistics concept

Quantile regression is a type of regression analysis used in statistics and econometrics. Whereas the method of least squares estimates the conditional mean of the response variable across values of the predictor variables, quantile regression estimates the conditional median of the response variable. Quantile regression is an extension of linear regression used when the conditions of linear regression are not met.

In statistics, the Breusch–Godfrey test is used to assess the validity of some of the modelling assumptions inherent in applying regression-like models to observed data series. In particular, it tests for the presence of serial correlation that has not been included in a proposed model structure and which, if present, would mean that incorrect conclusions would be drawn from other tests or that sub-optimal estimates of model parameters would be obtained.

In statistics, one purpose for the analysis of variance (ANOVA) is to analyze differences in means between groups. The test statistic, F, assumes independence of observations, homogeneous variances, and population normality. ANOVA on ranks is a statistic designed for situations when the normality assumption has been violated.

In statistical inference, the concept of a confidence distribution (CD) has often been loosely referred to as a distribution function on the parameter space that can represent confidence intervals of all levels for a parameter of interest. Historically, it has typically been constructed by inverting the upper limits of lower sided confidence intervals of all levels, and it was also commonly associated with a fiducial interpretation, although it is a purely frequentist concept. A confidence distribution is NOT a probability distribution function of the parameter of interest, but may still be a function useful for making inferences.

Distribution-free (nonparametric) control charts are one of the most important tools of statistical process monitoring and control. Implementation techniques of distribution-free control charts do not require any knowledge about the underlying process distribution or its parameters. The main advantage of distribution-free control charts is their in-control robustness, in the sense that, irrespective of the nature of the underlying process distributions, the properties of these control charts remain the same when the process is operating smoothly without the presence of any assignable cause.

In statistics, the Lepage test is an exact distribution-free test for jointly monitoring the location and scale (variability) in two-sample treatment versus control comparisons. It is one of the most famous rank tests for the two-sample location-scale problem. The Lepage test statistic is the sum of the squares of the standardized Wilcoxon rank-sum statistic for location and the standardized Ansari–Bradley statistic for scale. The Lepage test was first introduced by Yves Lepage in 1971 in a paper in Biometrika. A large number of Lepage-type tests exist in the statistical literature for simultaneously testing location and scale shifts in case-control studies. The details may be found in the book Nonparametric Statistical Tests: A Computational Approach. Kössler in 2006 also introduced various Lepage-type tests using some alternative score functions optimal for various distributions. Amitava Mukherjee and Marco Marozzi introduced a class of percentile-modified versions of the Lepage test. An alternative to the Lepage-type tests is the Cucconi test, proposed by Odoardo Cucconi in 1968.
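
As a hedged illustration of this construction, the following Python sketch computes the Lepage statistic as the sum of the squared standardized Wilcoxon rank-sum and Ansari–Bradley score sums, standardizing each with its exact permutation moments (mean and variance of a score sum drawn without replacement). The helper names are illustrative, not part of any library, and ties are not handled.

```python
# Sketch of the Lepage statistic L = W^2 + A^2, with W and A the Wilcoxon
# rank-sum and Ansari-Bradley score sums standardized under the permutation null.
import numpy as np

def _null_moments(scores, n1):
    """Exact null mean/variance of the sum of n1 scores drawn without replacement."""
    N = len(scores)
    n2 = N - n1
    mean = n1 * scores.mean()
    var = n1 * n2 / (N - 1) * scores.var()   # scores.var() is the population variance
    return mean, var

def lepage_statistic(x, y):
    n1 = len(x)
    pooled = np.concatenate([x, y])
    N = len(pooled)
    ranks = np.argsort(np.argsort(pooled)) + 1             # ranks 1..N, no tie handling
    rank_scores = np.arange(1, N + 1, dtype=float)         # Wilcoxon rank-sum scores
    ansari_scores = np.minimum(rank_scores, N + 1 - rank_scores)  # Ansari-Bradley scores
    sample1 = ranks[:n1]
    lepage = 0.0
    for scores in (rank_scores, ansari_scores):
        t = scores[sample1 - 1].sum()                      # observed score sum, sample 1
        mean, var = _null_moments(scores, n1)
        lepage += (t - mean) ** 2 / var
    return lepage
```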

References

  1. Cucconi, Odoardo (1968). "Un nuovo test non parametrico per il confronto tra due gruppi campionari". Giornale Degli Economisti. 27 (3/4): 225–248. JSTOR 23241361.
  2. Marozzi, Marco (2013). "Nonparametric Simultaneous Tests for Location and Scale Testing: a Comparison of Several Methods". Communications in Statistics – Simulation and Computation. 42 (6): 1298–1317. doi:10.1080/03610918.2012.665546. S2CID 28146102.
  3. Marozzi, Marco (2014). "The multisample Cucconi test". Statistical Methods & Applications. 23 (2): 209–227. doi:10.1007/s10260-014-0255-x. S2CID 45130096.
  4. Marozzi, Marco (2009). "Some Notes on the Location-Scale Cucconi Test". Journal of Nonparametric Statistics. 21 (5): 629–647. doi:10.1080/10485250902952435. S2CID 120038970.
  5. Mukherjee, Amitava; Marozzi, Marco (2017-09-19). "A distribution-free phase-II CUSUM procedure for monitoring service quality". Total Quality Management & Business Excellence. 28 (11–12): 1227–1263. doi:10.1080/14783363.2015.1134266. ISSN 1478-3363. S2CID 155905572.
  6. Chowdhury, S.; Mukherjee, A.; Chakraborti, S. (March 2014). "A New Distribution-free Control Chart for Joint Monitoring of Unknown Location and Scale Parameters of Continuous Distributions". Quality and Reliability Engineering International. 30 (2): 191–204. doi:10.1002/qre.1488. S2CID 10932084.