List of important publications in statistics

This is a list of important publications in statistics, organized by field.

Some reasons why a particular publication might be regarded as important:

Topic creator – a publication that created a new topic
Breakthrough – a publication that changed scientific knowledge significantly
Influence – a publication which has significantly influenced the field

Probability

Théorie analytique des probabilités
Author: Pierre-Simon Laplace
Publication data: 1820 (3rd ed.)
Online version: Internet Archive; CNRS, with more accurate character recognition; Gallica-Math, complete PDF and PDFs by section
Description: Introduced the Laplace transform, exponential families, and conjugate priors in Bayesian statistics. Pioneered asymptotic statistics, proving an early version of the Bernstein–von Mises theorem on the irrelevance of a (regular) prior distribution to the limiting posterior distribution and highlighting the asymptotic role of the Fisher information. Studied the influence of the median and of skewness in regression analysis, inspiring the field of robust regression. Proposed the Laplace distribution and was the first to provide alternatives to Carl Friedrich Gauss's work on statistics.
Importance: Topic creator, Breakthrough, Influence

Mathematical statistics

Mathematical Methods of Statistics

Author: Harald Cramér
Publication data: Princeton Mathematical Series, vol. 9. Princeton University Press, Princeton, N. J., 1946. xvi+575 pp. (A first version was published by Almqvist & Wiksell in Uppsala, Sweden, but had little circulation because of World War II.)
Description: Carefully written and extensive account of measure-theoretic probability for statisticians, along with careful mathematical treatment of classical statistics.
Importance: Made measure-theoretic probability the standard language for advanced statistics in the English-speaking world, following its earlier adoption in France and the USSR.

Statistical Decision Functions

Author: Abraham Wald
Publication data: 1950. John Wiley & Sons.
Description: Exposition of statistical decision theory as a foundation of statistics. Included Wald's earlier results on sequential analysis and the sequential probability ratio test, and his complete class theorem characterizing admissible decision rules as limits of Bayesian procedures.
Importance: Raised the mathematical status of statistical theory and attracted mathematical statisticians like John von Neumann, Aryeh Dvoretzky, Jacob Wolfowitz, Jack C. Kiefer, and David Blackwell, providing greater ties with economic theory and operations research. Spurred further work on decision theory.

Testing Statistical Hypotheses

Author: Erich Leo Lehmann
Publication data: 1959. John Wiley & Sons.
Description: Exposition of statistical hypothesis testing using the statistical decision theory of Abraham Wald, with some use of measure-theoretic probability.
Importance: Made Wald's ideas accessible. Collected and organized many results of statistical theory that had been scattered across journal articles, giving the field a systematic exposition.

Bayesian statistics

An Essay Towards Solving a Problem in the Doctrine of Chances

Author: Thomas Bayes
Publication data: 23 December 1763
Online version: "An Essay towards solving a Problem in the Doctrine of Chances. By the late Rev. Mr. Bayes, F.R.S. communicated by Mr. Price, in a Letter to John Canton, A.M. F.R.S." (PDF). Department of Mathematics, University of York.
Description: In this paper Bayes addresses the problem of using a sequence of identical "trials" to determine the per-trial probability of "success" – the so-called inverse probability problem. It later inspired the theorem that bears his name (Bayes' theorem). See also Pierre Simon de Laplace.
Importance: Topic creator, Breakthrough, Influence
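
The inverse probability problem Bayes posed can be sketched numerically: with a uniform prior on the per-trial success probability p, the posterior density after s successes and f failures is proportional to p^s(1−p)^f (a Beta density), and any posterior probability follows by integration. A minimal illustrative sketch, not taken from the essay (the function name and midpoint-rule integration are ours):

```python
def posterior_prob(successes, failures, a, b, grid=100_000):
    """P(a < p < b | data) under a uniform prior on the per-trial
    success probability p. The posterior density is proportional to
    p**successes * (1 - p)**failures, integrated by a midpoint rule."""
    step = 1.0 / grid

    def dens(p):
        return p ** successes * (1 - p) ** failures

    pts = [(i + 0.5) * step for i in range(grid)]
    total = sum(dens(p) for p in pts) * step          # normalizing constant
    part = sum(dens(p) for p in pts if a < p < b) * step
    return part / total
```

With no data the posterior is just the uniform prior, so `posterior_prob(0, 0, 0.25, 0.75)` is 0.5; after 7 successes in 10 trials, the posterior probability that p exceeds 1/2 is about 0.89.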

On Small Differences in Sensation

Author: Charles Sanders Peirce and Joseph Jastrow
Publication data: Peirce, Charles Sanders; Jastrow, Joseph (1885). "On Small Differences in Sensation". Memoirs of the National Academy of Sciences. 3: 73–83.
Online version: http://psychclassics.yorku.ca/Peirce/small-diffs.htm
Description: Peirce and Jastrow use logistic regression to estimate subjective probabilities of subjects' judgments of the heavier of two weights, following a randomized controlled repeated-measures design. [1] [2]
Importance: Pioneered elicitation of subjective probabilities. [1] [2]

Truth and Probability

Author: Frank P. Ramsey
Publication data: Ramsey, Frank Plumpton; "Truth and Probability" (PDF), Chapter VII in The Foundations of Mathematics and other Logical Essays (1931).
Online version: https://web.archive.org/web/20080227205205/http://cepa.newschool.edu/het//texts/ramsey/ramsess.pdf
Description: Ramsey proposes elucidating a person's subjective probability for a proposition using a sequence of bets. Ramsey described his work as an elaboration of some pragmatic ideas of C. S. Peirce, which were expressed in "How to Make Our Ideas Clear".
Importance: Popularized the "Ramsey test" for eliciting subjective probabilities.

Bayesian Inference in Statistical Analysis

Author: George E. P. Box and George C. Tiao
Publication data: Addison-Wesley Publishing Co., 1973. Reprinted 1992: Wiley, ISBN 0471574287
Description: The first complete analysis of Bayesian Inference for many statistical problems.
Importance: Includes a large body of research on Bayesian analysis for outlier problems, variance components, linear models and multivariate statistics.

Theory of Probability

Author: Bruno de Finetti
Publication data: Two volumes, A.F.M. Smith and A. Machi (trs.), New York: John Wiley & Sons, Inc., 1974, 1975.
Description: The first detailed statement of the operational subjective position, dating from the author's research in the 1920s and 30s.
Importance: Emphasizes exchangeable random variables which are often mixtures of independent random variables. Argues for finitely additive probability measures that need not be countably additive. Emphasizes expectations rather than probability measures.

Introduction to Statistical Decision Theory

Author: John W. Pratt, Howard Raiffa, and Robert Schlaifer
Publication data: preliminary edition, 1965. Cambridge, Mass.: MIT Press, 1995.
Description: Extensive exposition of statistical decision theory, statistics, and decision analysis from a Bayesian standpoint. Many examples and problems come from business and economics.
Importance: Greatly extended the scope of applied Bayesian statistics by using conjugate priors for exponential families. Extensive treatment of sequential decision making, for example mining decisions. For many years, it was required for all doctoral students at Harvard Business School.

Multivariate analysis

An Introduction to Multivariate Statistical Analysis

Author: Theodore W. Anderson
Publication data: 1958, John Wiley
Description:
Importance: This textbook educated a generation of theorists and applied statisticians, emphasizing hypothesis testing via likelihood ratio tests and the properties of power functions: admissibility, unbiasedness, and monotonicity. [3] [4]

Time series

Time Series Analysis: Forecasting and Control

Authors: George E.P. Box and Gwilym M. Jenkins
Publication data: Holden-Day, 1970
Description: Systematic approach to ARIMA and ARMAX modelling
Importance: This book introduces ARIMA and associated input-output models, studies how to fit them and develops a methodology for time series forecasting and control. It has changed econometrics, process control and forecasting.
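
As a taste of the Box–Jenkins family, the AR(1) model x_t = c + φ·x_{t−1} + ε_t (one of the simplest ARMA members) can be fit by conditional least squares, regressing each value on its predecessor, and then iterated forward to forecast. A hypothetical sketch in our own notation, not the book's algorithm:

```python
def fit_ar1(series):
    """Conditional least-squares fit of x_t = c + phi * x_{t-1} + e_t:
    ordinary regression of each value on its predecessor."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    phi = (sum((a - mx) * (b - my) for a, b in zip(x, y))
           / sum((a - mx) ** 2 for a in x))
    c = my - phi * mx
    return c, phi

def forecast(series, steps, c, phi):
    """Iterate the fitted recursion forward from the last observation."""
    out, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out
```

On a noise-free series generated by x_{t+1} = 2 + 0.5·x_t the fit recovers c = 2 and φ = 0.5 exactly.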

Applied statistics

Statistical Methods for Research Workers

Author: R.A. Fisher
Publication data: Edinburgh: Oliver & Boyd, 1925 (1st edition); London: Macmillan, 1970 (15th edition)
Online version: http://psychclassics.yorku.ca/Fisher/Methods/
Description: The original manual for researchers, especially biologists, on how to statistically evaluate numerical data.
Importance: Hugely influential text by the father of modern statistics that remained in print for more than 50 years. [5] Responsible for the widespread use of tests of statistical significance.

Statistical Methods

Author: George W. Snedecor
Publication data: 1937, Collegiate Press
Description: One of the first comprehensive texts on statistical methods. Reissued as Statistical Methods Applied to Experiments in Agriculture and Biology in 1940, and again as Statistical Methods, with W. G. Cochran as coauthor, in 1967. A classic text.
Importance: Influence

Principles and Procedures of Statistics with Special Reference to the Biological Sciences.

Authors: Steel, R.G.D., and Torrie, J.H.
Publication data: McGraw Hill (1960) 481 pages
Description: Excellent introductory text for analysis of variance (one-way, multi-way, factorial, split-plot, and unbalanced designs), as well as analysis of covariance, multiple and partial regression and correlation, non-linear regression, and non-parametric analyses. The book was written before computer programmes were available, so it gives the detail needed to carry out the calculations manually. Cited in more than 1,381 publications between 1961 and 1975. [6]
Importance: Influence

Biometry: The Principles and Practices of Statistics in Biological Research

Authors: Robert R. Sokal; F. J. Rohlf
Publication data: 1st ed. W. H. Freemann (1969); 2nd ed. W. H. Freemann (1981); 3rd ed. Freeman & Co. (1994)
Description: Key textbook on biometry: the application of statistical methods to the descriptive, experimental, and analytical study of biological phenomena.
Importance: Cited in more than 7,000 publications. [7]

Statistical learning theory

On the uniform convergence of relative frequencies of events to their probabilities

Authors: V. Vapnik, A. Chervonenkis
Publication data: Theory of Probability and Its Applications, 16(2):264–280, 1971 doi: 10.1137/1116025
Description: Computational learning theory, VC theory, statistical uniform convergence and the VC dimension.
Importance: Breakthrough, Influence
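
The notion of shattering behind the paper's combinatorial results can be illustrated concretely: closed intervals on the real line realize every labeling of two points but not of three (the labeling "outer two positive, middle negative" needs a non-contiguous set), so their VC dimension is 2. A small illustrative check, with helper names of our own choosing:

```python
from itertools import combinations

def interval_labelings(points):
    """All subsets of `points` that some closed interval [a, b] picks out:
    exactly the contiguous runs of the sorted points, plus the empty set."""
    pts = sorted(points)
    labs = {frozenset()}
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            labs.add(frozenset(pts[i:j + 1]))
    return labs

def shattered_by_intervals(points):
    """True iff intervals realize all 2^n labelings of `points`."""
    labs = interval_labelings(points)
    return all(frozenset(c) in labs
               for r in range(len(points) + 1)
               for c in combinations(points, r))
```

Any two points are shattered; no three points are, which is what bounds the growth function and, via the paper's theorem, yields uniform convergence for this class.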

Variance component estimation

On the mathematical foundations of theoretical statistics

Author: Fisher, RA
Publication data: 1922, Philosophical Transactions of the Royal Society of London, Series A, volume 222, pages 309–368
Description: First comprehensive treatise of estimation by maximum likelihood. [8]
Importance: Topic creator, Breakthrough, Influence
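
The maximum-likelihood principle the paper introduced can be sketched for a single parameter: for an i.i.d. exponential sample, the log-likelihood n·log λ − λ·Σx_i is maximized at λ̂ = 1/x̄, which a generic numerical search recovers. A hypothetical sketch (the function names and the ternary search are ours, not Fisher's):

```python
from math import log

def exp_loglik(lam, data):
    """Log-likelihood of rate lam for an i.i.d. exponential sample."""
    return len(data) * log(lam) - lam * sum(data)

def mle_rate(data, lo=1e-6, hi=100.0, iters=200):
    """Ternary search for the maximizer; the exponential
    log-likelihood is concave (hence unimodal) in lam."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if exp_loglik(m1, data) < exp_loglik(m2, data):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2
```

For the sample [0.5, 1.0, 1.5, 2.0] the sample mean is 1.25, so the search converges to λ̂ = 0.8, matching the closed form.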

Estimation of variance and covariance components

Author: Henderson, CR
Publication data: 1953, Biometrics, volume 9, pages 226–252
Description: First description of three methods of estimation of variance components in mixed linear models for unbalanced data. "One of the most frequently cited papers in the scientific literature." [9] [10]
Importance: Topic creator, Breakthrough, Influence

Maximum-likelihood estimation for the mixed analysis of variance model

Author: H. O. Hartley and J. N. K. Rao
Publication data: 1967, Biometrika, volume 54, pages 93–108 doi: 10.1093/biomet/54.1-2.93
Description: First description of maximum likelihood methods for variance component estimation in mixed models
Importance: Topic creator, Breakthrough, Influence

Recovery of inter-block information when block sizes are unequal

Author: Patterson, HD; Thompson, R
Publication data: 1971, Biometrika, volume 58, pages 545–554 doi: 10.1093/biomet/58.3.545
Description: First description of restricted maximum likelihood (REML)
Importance: Topic creator, Breakthrough, Influence

Estimation of Variance and Covariance Components in Linear Models

Author: Rao, CR
Publication data: 1972, Journal of the American Statistical Association, volume 67, pages 112–115
Description: First description of Minimum Variance Quadratic Unbiased Estimation (MIVQUE) and Minimum Norm Quadratic Unbiased Estimation (MINQUE) for unbalanced data
Importance: Topic creator, Breakthrough, Influence

Survival analysis

Nonparametric estimation from incomplete observations

Author: Kaplan, EL and Meier, P
Publication data: 1958, Journal of the American Statistical Association, volume 53, pages 457–481. JSTOR 2281868
Description: First description of the now ubiquitous Kaplan-Meier estimator of survival functions from data with censored observations
Importance: Breakthrough, Influence
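
The product-limit idea is simple to state: at each observed death time, multiply the running survival estimate by the fraction of the risk set that survives; censored observations leave the curve unchanged but shrink the risk set. A minimal sketch (function name and data conventions are ours):

```python
from itertools import groupby

def kaplan_meier(times, events):
    """Product-limit estimate of the survival function S(t).
    times: observed follow-up times; events: 1 for an observed
    event (death), 0 for a censored observation."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve = 1.0, []
    for t, grp in groupby(data, key=lambda pair: pair[0]):
        grp = list(grp)
        deaths = sum(e for _, e in grp)
        if deaths:                      # censorings alone leave S(t) unchanged
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= len(grp)             # everyone observed at t leaves the risk set
    return curve
```

For times [1, 2, 3, 4, 5] with events [1, 1, 0, 1, 0], the estimate steps to 0.8 at t = 1, 0.6 at t = 2, and 0.3 at t = 4; the censored times 3 and 5 contribute only by reducing the risk set.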

A generalized Wilcoxon test for comparing arbitrarily singly-censored samples

Author: Gehan, EA
Publication data: 1965, Biometrika, volume 52, pages 203–223. doi: 10.1093/biomet/52.1-2.203
Description: First presentation of the extension of the Wilcoxon rank-sum test to censored data
Importance: Influence

Evaluation of survival data and two new rank order statistics arising in its consideration

Author: Mantel, N
Publication data: 1966, Cancer Chemotherapy Reports, volume 50, pages 163–170. PMID 5910392
Description: Development of the logrank test for censored survival data. [11]
Importance: Topic creator, Breakthrough, Influence

Regression Models and Life Tables

Author: Cox, DR
Publication data: 1972, Journal of the Royal Statistical Society, Series B, volume 34, pages 187–220. JSTOR 2985181
Description: Seminal paper introducing semi-parametric proportional hazards models (Cox models) for survival data
Importance: Topic creator, Breakthrough, Influence

The Statistical Analysis of Failure Time Data

Author: Kalbfleisch, JD and Prentice, RL
Publication data: 1980, John Wiley & Sons, New York
Description: First comprehensive text covering the methods of estimation and inference for time to event analyses
Importance: Influence

Meta analysis

Report on Certain Enteric Fever Inoculation Statistics

Author: Pearson, K
Publication data: 1904, British Medical Journal, volume 2, pages 1243–1246 PMID 20761760
Description: Generally considered to be the first synthesis of results from separate studies, although no formal statistical methods for combining results are presented.
Importance: Breakthrough, Influence

The Probability Integral Transformation for Testing Goodness of Fit and Combining Independent Tests of Significance

Author: Pearson, ES
Publication data: 1938, Biometrika, volume 30, pages 134–148 doi: 10.1093/biomet/30.1-2.134
Description: One of the first published methods for formally combining results from different experiments
Importance: Breakthrough, Influence

Combining Independent Tests of Significance

Author: Fisher, RA
Publication data: 1948, The American Statistician, volume 2, page 30
Description: One of the first published methods for formally combining results from different experiments
Importance: Breakthrough, Influence
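
Fisher's procedure combines k independent p-values through the statistic X = −2 Σ ln p_i, which under the joint null is chi-square with 2k degrees of freedom; since the chi-square tail has a closed form for even degrees of freedom, the whole method fits in a few lines. A sketch (function name ours):

```python
from math import exp, log

def fisher_combined(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) is chi-square with 2k df
    under the null. For even df the survival function is
    exp(-x/2) * sum_{i<k} (x/2)**i / i!, so no special functions
    are needed. Returns (X, combined p-value)."""
    k = len(pvalues)
    x = -2.0 * sum(log(p) for p in pvalues)
    half = x / 2.0
    term, s = 1.0, 1.0
    for i in range(1, k):
        term *= half / i                # (x/2)**i / i! built incrementally
        s += term
    return x, exp(-half) * s
```

A single p-value is returned unchanged, a useful sanity check; combining [0.1, 0.1] gives a pooled p-value of about 0.056.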

The combination of estimates from different experiments

Author: Cochran, WG
Publication data: 1954, Biometrics, volume 10, pages 101–129
Description: A comprehensive treatment of the various methods for formally combining results from different experiments
Importance: Breakthrough, Influence

Experimental design

On Small Differences in Sensation

Author: Charles Sanders Peirce and Joseph Jastrow
Publication data: Peirce, Charles Sanders; Jastrow, Joseph (1885). "On Small Differences in Sensation". Memoirs of the National Academy of Sciences. 3: 73–83.
Online version: http://psychclassics.yorku.ca/Peirce/small-diffs.htm
Description: Peirce and Jastrow use logistic regression to estimate subjective probabilities of subjects' judgments of the heavier of two weights, following a randomized controlled repeated-measures design. [1] [2]
Importance: The first randomized experiment, which also used blinding; it seems also to have been the first experiment for estimating subjective probabilities. [1] [2]

The Design of Experiments

Author: Fisher, RA
Publication data: 1935, Oliver and Boyd, Edinburgh
Description: The first textbook on experimental design
Importance: Influence [12] [13] [14]

The Design and Analysis of Experiments

Author: Oscar Kempthorne
Publication data: 1950, John Wiley & Sons, New York (Reprinted with corrections in 1979 by Robert E. Krieger)
Description: Early exposition of the general linear model using matrix algebra (following lecture notes of George W. Brown). Bases inference on the randomization distribution objectively defined by the experimental protocol, rather than a so-called "statistical model" expressing the subjective beliefs of a statistician: The normal model is regarded as a convenient approximation to the randomization-distribution, whose quality is assessed by theorems about moments and simulation experiments.
Importance: The first and most extensive discussion of randomization-based inference in the design of experiments until the recent two-volume work by Hinkelmann and Kempthorne; randomization-based inference is called "design-based" inference in survey sampling of finite populations. Introduced the treatment-unit additivity hypothesis, which was discussed in chapter 2 of David R. Cox's book on experiments (1958) and which has influenced Donald Rubin's and Paul Rosenbaum's analyses of observational data.

On the Experimental Attainment of Optimum Conditions (with discussion)

Author: George E. P. Box and K. B. Wilson.
Publication data: (1951) Journal of the Royal Statistical Society Series B 13(1):1–45.
Description: Introduced Box-Wilson central composite design for fitting a quadratic polynomial in several variables to experimental data, when an initial affine model had failed to yield a direction of ascent. The design and analysis is motivated by a problem in chemical engineering.
Importance: Introduced response surface methodology for approximating local optima of systems with noisy observations of responses.
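
The geometry of the Box–Wilson design is easy to generate in coded units: a 2^k factorial "cube", 2k axial star points at distance α from the center, and center replicates. A hypothetical generator (names ours; α defaults to the rotatable choice (2^k)^{1/4}):

```python
from itertools import product

def central_composite(k, alpha=None, n_center=1):
    """Coded points of a Box-Wilson central composite design in k factors:
    a 2**k factorial cube, 2k axial (star) points, and center replicates."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25        # rotatable design
    cube = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    star = []
    for i in range(k):
        for s in (-alpha, alpha):
            pt = [0.0] * k
            pt[i] = s                   # axial point along factor i
            star.append(pt)
    center = [[0.0] * k for _ in range(n_center)]
    return cube + star + center
```

For two factors this gives the familiar nine-point design: four corners, four star points at distance √2, and one center point, enough to fit a full quadratic surface.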

See also

References

  1. Stigler, Stephen M. (March 1978). "Mathematical Statistics in the Early States". Annals of Statistics. 6 (2): 239–265. doi: 10.1214/aos/1176344123. JSTOR 2958876. MR 0483118.
  2. Stigler, Stephen M. (November 1992). "A Historical View of Statistical Concepts in Psychology and Educational Research". American Journal of Education. 101 (1): 60–70. doi: 10.1086/444032. S2CID 143685203.
  3. Pages 560–561 in Sen, Pranab Kumar; Anderson, T. W.; Arnold, S. F.; Eaton, M. L.; Giri, N. C.; Gnanadesikan, R.; Kendall, M. G.; Kshirsagar, A. M.; et al. (June 1986). "Review: Contemporary Textbooks on Multivariate Statistical Analysis: A Panoramic Appraisal and Critique". Journal of the American Statistical Association. 81 (394): 560–564. doi: 10.2307/2289251. ISSN 0162-1459. JSTOR 2289251.
  4. Schervish, Mark J. (November 1987). "A Review of Multivariate Analysis". Statistical Science. 2 (4): 396–413. doi: 10.1214/ss/1177013111. ISSN 0883-4237. JSTOR 2245530.
  5. "Statistical Methods for Research Workers". Encyclopædia Britannica, Inc.
  6. "Steel, Robert GD & Torrie, JH. Principles and procedures of statistics" (PDF). Current Contents/Life Sciences. 39: 20. 1977.
  7. "Sokal RR and Rohlf FJ. Biometry: the principles and practice of statistics in biological research" (PDF). Current Contents/Agriculture, Biology, Environment. 41: 22. 1982.
  8. Aldrich, John (1997). "R. A. Fisher and the making of maximum likelihood 1912–1922". Statistical Science. 12 (3): 162–176. doi: 10.1214/ss/1030037906.
  9. Searle, S. R. (November 1991). "C. R. Henderson, the statistician; and his contributions to variance components estimation". Journal of Dairy Science. 74 (11): 4035–4044. doi: 10.3168/jds.S0022-0302(91)78599-8. hdl: 1813/31657. ISSN 0022-0302. PMID 1757641.
  10. "Henderson, CR: Estimation of variance and covariance components" (PDF). Current Contents/Agriculture, Biology & Environmental Sciences. 24: 10. 1980.
  11. "Mantel N. Evaluation of survival data and two new rank order statistics arising in its consideration" (PDF). Current Contents/Life Sciences. 8: 19. 1983.
  12. Stanley, J. C. (1966). "The Influence of Fisher's "The Design of Experiments" on Educational Research Thirty Years Later". American Educational Research Journal. 3 (3): 223–229. doi: 10.3102/00028312003003223. S2CID 145725524.
  13. Box, J. F. (February 1980). "R. A. Fisher and the Design of Experiments, 1922–1926". The American Statistician. 34 (1): 1–7. doi: 10.2307/2682986. JSTOR 2682986.
  14. Yates, F. (June 1964). "Sir Ronald Fisher and the Design of Experiments". Biometrics. 20 (2): 307–321. doi: 10.2307/2528399. JSTOR 2528399.