List of unsolved problems in statistics

There are many longstanding unsolved problems in mathematics. The notable unsolved problems in statistics are generally of a different flavor; according to John Tukey,[1] "difficulties in identifying problems have delayed statistics far more than difficulties in solving problems." A list of "one or two open problems" (in fact 22 of them) was given by David Cox.[2]

Contents

Inference and testing

Experimental design

Problems of a more philosophical nature

Notes

  1. Tukey, John W. (1954). "Unsolved Problems of Experimental Statistics". Journal of the American Statistical Association. 49 (268): 706–731. doi:10.2307/2281535. JSTOR 2281535.
  2. Cox, D. R. (1984). "Present Position and Potential Developments: Some Personal Views: Design of Experiments and Regression". Journal of the Royal Statistical Society. Series A (General). 147 (2): 306–315. doi:10.2307/2981685. JSTOR 2981685.
  3. Pal, Nabendu; Lim, Wooi K. (1997). "A note on second-order admissibility of the Graybill-Deal estimator of a common mean of several normal populations". Journal of Statistical Planning and Inference. 63: 71–78. doi:10.1016/S0378-3758(96)00202-9.
  4. Fraser, D.A.S.; Rousseau, J. (2008). "Studentization and deriving accurate p-values" (PDF). Biometrika. 95: 1–16. doi:10.1093/biomet/asm093.
  5. Jordan, M. I. (2011). "What are the open problems in Bayesian statistics?" (PDF). The ISBA Bulletin. 18 (1): 1–5.
  6. Zabell, S. L. (1992). "Predicting the unpredictable". Synthese. 90 (2): 205. doi:10.1007/bf00485351. S2CID 9416747.

Related Research Articles

Estimator: Rule for calculating an estimate of a given quantity based on observed data

In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule, the quantity of interest and its result are distinguished. For example, the sample mean is a commonly used estimator of the population mean.

Statistics: Study of the collection, analysis, interpretation, and presentation of data

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.

Statistical inference: Process of using data analysis

Statistical inference is the process of using data analysis to infer properties of an underlying distribution of probability. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population.

The following outline is provided as an overview of and topical guide to statistics:

Statistical hypothesis test: Method of statistical inference

A statistical hypothesis test is a method of statistical inference used to decide whether the data sufficiently support a particular hypothesis. A statistical hypothesis test typically involves a calculation of a test statistic. Then a decision is made, either by comparing the test statistic to a critical value or equivalently by evaluating a p-value computed from the test statistic. Roughly 100 specialized statistical tests have been defined.
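The two equivalent decision routes described above can be sketched with a two-sided one-sample z-test. This is only an illustration under stated assumptions: the population standard deviation `sigma` is treated as known, and the sample values, `mu0`, and the function name `z_test` are hypothetical choices made for the example.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test, assuming the population sigma is known."""
    n = len(sample)
    # Test statistic: standardized distance of the sample mean from mu0.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    # Route 1: compare the statistic to a critical value.
    critical = std_normal.inv_cdf(1.0 - alpha / 2.0)
    reject_by_critical = abs(z) > critical
    # Route 2: compare the p-value computed from the statistic to alpha.
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    reject_by_p = p_value < alpha
    return z, p_value, reject_by_critical, reject_by_p


# Hypothetical measurements, tested against a null mean of 5.0.
sample = [5.1, 4.9, 5.6, 5.8, 5.3, 5.7, 5.2, 5.5]
z, p_value, reject_by_critical, reject_by_p = z_test(sample, mu0=5.0, sigma=0.4)
print(f"z = {z:.3f}, p = {p_value:.4f}, reject = {reject_by_p}")
```

Both routes lead to the same decision because the p-value is a monotone function of the absolute value of the test statistic.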

In statistics, point estimation involves the use of sample data to calculate a single value which is to serve as a "best guess" or "best estimate" of an unknown population parameter. More formally, it is the application of a point estimator to the data to obtain a point estimate.

Student's t-test is a statistical test used to test whether the difference between the responses of two groups is statistically significant. More generally, it is any statistical hypothesis test in which the test statistic follows a Student's t-distribution under the null hypothesis. It is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is estimated from the data, the test statistic, under certain conditions, follows a Student's t-distribution. The t-test's most common application is to test whether the means of two populations are significantly different. In many cases, a Z-test will yield very similar results to a t-test, since the latter converges to the former as the size of the dataset increases.

In statistics, a nuisance parameter is any parameter which is unspecified but which must be accounted for in the hypothesis testing of the parameters which are of interest.

This glossary of statistics and probability is a list of definitions of terms and concepts used in the mathematical sciences of statistics and probability, their sub-disciplines, and related fields. For additional related terms, see Glossary of mathematics and Glossary of experimental design.

In statistics, the Behrens–Fisher problem, named after Walter-Ulrich Behrens and Ronald Fisher, is the problem of interval estimation and hypothesis testing concerning the difference between the means of two normally distributed populations when the variances of the two populations are not assumed to be equal, based on two independent samples.
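One widely used approximate approach to this setting is Welch's t-test, which avoids pooling the two variances and uses the Welch–Satterthwaite approximation for the degrees of freedom. The sketch below only computes the statistic and the approximate degrees of freedom; it is not an exact solution to the Behrens–Fisher problem, and the data and the name `welch_statistic` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_statistic(x, y):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    # Estimated variances of the two sample means (no equal-variance assumption).
    vx = variance(x) / nx
    vy = variance(y) / ny
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Approximate degrees of freedom for the reference t-distribution.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical samples with visibly different spreads.
x = [20.1, 22.3, 19.8, 21.5, 20.9]
y = [18.2, 25.1, 16.4, 27.3, 21.0, 14.9, 23.8]
t, df = welch_statistic(x, y)
print(f"t = {t:.3f}, approximate df = {df:.2f}")
```

The approximate degrees of freedom always lie between the smaller of the two per-sample values (n - 1) and the pooled value (nx + ny - 2).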

In statistics, resampling is the creation of new samples based on one observed sample. Resampling methods are:

  1. Permutation tests
  2. Bootstrapping
  3. Cross validation
  4. Jackknife

Bootstrapping is any test or metric that uses random sampling with replacement, and falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods.
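As a minimal sketch of the idea, the following estimates the standard error of a sample median by resampling the observed data with replacement. The data, the number of resamples, the fixed seed, and the function name `bootstrap_se` are assumptions made for illustration.

```python
import random
from statistics import median, stdev


def bootstrap_se(data, statistic, n_resamples=2000, seed=0):
    """Estimate the standard error of `statistic` by nonparametric bootstrap."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Each replicate is the statistic computed on a resample of size n
    # drawn from the observed data with replacement.
    replicates = [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]
    # The spread of the replicates approximates the sampling variability.
    return stdev(replicates), replicates


# Hypothetical observations.
data = [2.3, 3.1, 3.8, 4.0, 4.4, 5.2, 5.9, 6.1, 7.4, 8.0, 9.6, 12.5]
se, replicates = bootstrap_se(data, median)
print(f"sample median = {median(data):.2f}, bootstrap SE = {se:.3f}")
```

The list of replicates is itself an approximation to the sampling distribution of the statistic, so the same output could also be used to form percentile intervals.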

In estimation theory and decision theory, a Bayes estimator or a Bayes action is an estimator or decision rule that minimizes the posterior expected value of a loss function. Equivalently, it maximizes the posterior expectation of a utility function. An alternative way of formulating an estimator within Bayesian statistics is maximum a posteriori estimation.
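For a concrete case, under squared-error loss the Bayes estimator is the posterior mean. The sketch below checks this numerically for a binomial proportion with a Beta prior, approximating the posterior on a grid. The prior parameters, the data (7 successes in 10 trials), the grid size, and the function names are all hypothetical choices for illustration.

```python
def posterior_grid(k, n, a=1.0, b=1.0, size=1001):
    """Discretized Beta(a + k, b + n - k) posterior for a binomial proportion."""
    # Midpoints of equal-width cells, so 0 and 1 themselves are never evaluated.
    thetas = [(i + 0.5) / size for i in range(size)]
    weights = [t ** (a + k - 1) * (1 - t) ** (b + n - k - 1) for t in thetas]
    total = sum(weights)
    return thetas, [w / total for w in weights]


def expected_loss(action, thetas, probs):
    """Posterior expected squared-error loss of reporting `action`."""
    return sum(p * (action - t) ** 2 for t, p in zip(thetas, probs))


thetas, probs = posterior_grid(k=7, n=10)
post_mean = sum(t * p for t, p in zip(thetas, probs))

# Compare the posterior mean with two nearby alternative actions.
for action in (post_mean - 0.01, post_mean, post_mean + 0.01):
    print(f"action = {action:.4f}, expected loss = {expected_loss(action, thetas, probs):.6f}")
```

With a uniform Beta(1, 1) prior the exact posterior is Beta(8, 4), whose mean is 8/12; the grid approximation should agree closely. Other loss functions lead to other Bayes estimators, for example the posterior median under absolute-error loss.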

Tukey's range test, also known as Tukey's test, Tukey method, Tukey's honest significance test, or Tukey's HSD test, is a single-step multiple comparison procedure and statistical test. It can be used to correctly interpret the statistical significance of the difference between means that have been selected for comparison because of their extreme values.

Frequentist inference is a type of statistical inference based on frequentist probability, which treats “probability” in equivalent terms to “frequency” and draws conclusions from sample data by emphasizing the frequency or proportion of findings in the data. Frequentist inference underlies frequentist statistics, on which the well-established methodologies of statistical hypothesis testing and confidence intervals are founded.

In statistical theory, a U-statistic is a class of statistics defined as the average over the application of a given function applied to all tuples of a fixed size. The letter "U" stands for unbiased. In elementary statistics, U-statistics arise naturally in producing minimum-variance unbiased estimators.
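A standard example is the unbiased sample variance, which equals the U-statistic of order 2 with kernel h(x, y) = (x - y)^2 / 2. The sketch below averages a symmetric kernel over all subsets of a fixed size and checks that identity on hypothetical data; the function names are illustrative.

```python
from itertools import combinations
from statistics import mean, variance


def u_statistic(data, kernel, order):
    """Average a symmetric kernel over all size-`order` subsets of the data."""
    values = [kernel(*subset) for subset in combinations(data, order)]
    return mean(values)


def half_squared_difference(x, y):
    """Kernel whose order-2 U-statistic is the unbiased sample variance."""
    return (x - y) ** 2 / 2.0


# Hypothetical observations.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
u_var = u_statistic(data, half_squared_difference, order=2)
print(f"U-statistic = {u_var:.6f}, sample variance = {variance(data):.6f}")
```

With the identity kernel and order 1, the same function reduces to the sample mean. Enumerating all subsets costs on the order of n^order kernel evaluations, so this direct form is only practical for small samples or low orders.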

Exact statistics, such as that described in exact test, is a branch of statistics that was developed to provide more accurate results pertaining to statistical testing and interval estimation by eliminating procedures based on asymptotic and approximate statistical methods. The main characteristic of exact methods is that statistical tests and confidence intervals are based on exact probability statements that are valid for any sample size. Exact statistical methods help avoid some of the unreasonable assumptions of traditional statistical methods, such as the assumption of equal variances in classical ANOVA. They also allow exact inference on variance components of mixed models.

In statistics, a generalized p-value is an extended version of the classical p-value, which, except in a limited number of applications, provides only approximate solutions.
