Statistical theory


The theory of statistics provides a basis for the whole range of techniques, in both study design and data analysis, that are used within applications of statistics. [1] [2] The theory covers approaches to statistical decision problems and to statistical inference, and the actions and deductions that satisfy the basic principles stated for these different approaches. Within a given approach, statistical theory gives ways of comparing statistical procedures; it can find a best possible procedure within a given context for given statistical problems, or can provide guidance on the choice between alternative procedures. [2] [3]


Apart from philosophical considerations about how to make statistical inferences and decisions, much of statistical theory consists of mathematical statistics, and is closely linked to probability theory, to utility theory, and to optimization.

Scope

Statistical theory provides an underlying rationale and provides a consistent basis for the choice of methodology used in applied statistics.

Modelling

Statistical models describe the sources of data and can have different types of formulation corresponding to these sources and to the problem being studied. Such problems can be of various kinds:

  - sampling from a finite population;
  - measuring observational error and refining procedures;
  - studying statistical relations between variables.

Statistical models, once specified, can be tested to see whether they provide useful inferences for new data sets. [4]
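
A minimal sketch of this model-checking idea, assuming SciPy is available and using simulated data (the variable names and numbers are invented for illustration): a simple normal model is fitted to one data set and then compared against a new data set with a goodness-of-fit test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
old_data = rng.normal(loc=10.0, scale=2.0, size=50)   # data used to specify the model
new_data = rng.normal(loc=10.0, scale=2.0, size=30)   # fresh data for checking

# Specify a simple model from the old data: a normal distribution
# with the fitted mean and standard deviation.
mu_hat, sigma_hat = old_data.mean(), old_data.std(ddof=1)

# Check whether the new data are consistent with the fitted model
# (Kolmogorov-Smirnov goodness-of-fit test).
ks_stat, p_value = stats.kstest(new_data, "norm", args=(mu_hat, sigma_hat))
print(f"KS statistic = {ks_stat:.3f}, p = {p_value:.3f}")
```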

Data collection

Statistical theory provides a guide to comparing methods of data collection, where the problem is to generate informative data using optimization and randomization while measuring and controlling for observational error. [5] [6] [7] Optimization of data collection reduces the cost of data while satisfying statistical goals, [8] [9] while randomization allows reliable inferences. Statistical theory provides a basis for good data collection and the structuring of investigations in the topics of:

  - design of experiments, to estimate treatment effects, to test hypotheses, and to optimize responses; [10] [11]
  - survey sampling, to describe populations. [12] [13] [14]
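
As a hedged illustration of the role of randomization in data collection (a sketch, not taken from the cited sources; the unit names are invented), the following Python snippet randomly assigns experimental units to groups so that systematic differences between groups are not built in at the design stage.

```python
import random

def randomize_to_groups(units, n_groups=2, seed=None):
    """Randomly assign experimental units to treatment groups.

    The chance mechanism (a shuffled list) is what later justifies
    probability statements about comparisons between the groups.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units round-robin into the groups.
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Example: 8 hypothetical plots assigned to treatment and control.
groups = randomize_to_groups([f"plot{i}" for i in range(1, 9)], n_groups=2, seed=42)
print("treatment:", groups[0])
print("control:  ", groups[1])
```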

Summarising data

The task of summarising statistical data in conventional forms (also known as descriptive statistics) is considered in theoretical statistics as a problem of defining what aspects of statistical samples need to be described and how well they can be described from a typically limited sample of data. Thus the problems theoretical statistics considers include:

  - choosing summary statistics to describe a sample;
  - summarising probability distributions of sample data while making limited assumptions about the form of the distribution;
  - summarising the relationships between different quantities measured on the same items in a sample.
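
For instance, a handful of conventional summary statistics can be computed from a sample with the Python standard library; the measurements below are invented for illustration.

```python
import statistics

sample = [4.1, 3.8, 5.2, 4.7, 4.0, 6.3, 3.9, 4.4]  # invented measurements

summary = {
    "n": len(sample),
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "stdev": statistics.stdev(sample),              # sample standard deviation
    "quartiles": statistics.quantiles(sample, n=4), # three quartile cut points
}
for name, value in summary.items():
    print(name, value)
```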

Interpreting data

Besides the philosophy underlying statistical inference, statistical theory has the task of considering the types of questions that data analysts might want to ask about the problems they are studying and of providing data analytic techniques for answering them. Some of these tasks are:

  - summarising populations in the form of a fitted distribution or probability density function;
  - summarising the relationship between variables using some type of regression analysis;
  - providing ways of predicting the outcome of a random quantity given other related variables;
  - examining the possibility of reducing the number of variables being considered within a problem (the task of dimension reduction).
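
As one concrete example of the regression task listed above, a least-squares line can be fitted to paired data and used for prediction; the data are invented and NumPy is assumed to be available.

```python
import numpy as np

# Invented paired observations (x, y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.2])

# Fit y ≈ a*x + b by ordinary least squares.
a, b = np.polyfit(x, y, deg=1)
print(f"slope = {a:.3f}, intercept = {b:.3f}")

# Use the fitted relationship to predict the outcome at a new x.
x_new = 7.0
print("predicted y at x = 7:", a * x_new + b)
```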

When a statistical procedure has been specified in the study protocol, statistical theory provides well-defined probability statements for the method when applied to all populations that could have arisen from the randomization used to generate the data. This provides an objective basis for estimating parameters, constructing confidence intervals, testing hypotheses, and selecting the best. Even for observational data, statistical theory provides a way of calculating a value that can be used to interpret a sample of data from a population; it can indicate how well that value is determined by the sample, and thus whether corresponding values derived for different populations are as different as they might seem. However, the reliability of inferences from post-hoc observational data is often worse than for data generated by planned randomization.

Applied statistical inference

Statistical theory provides the basis for a number of data-analytic approaches that are common across scientific and social research. Interpreting data is done with one of the following approaches:

  - estimating parameters;
  - providing a range of values instead of a point estimate (interval estimation);
  - testing statistical hypotheses.
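
A compact sketch of all three approaches applied to one simulated sample, assuming SciPy is available (the population values and sample size are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.2, scale=1.0, size=30)  # simulated data

# 1. Point estimation: the sample mean estimates the population mean.
mean_hat = sample.mean()

# 2. Interval estimation: a 95% t-based confidence interval for the mean.
ci = stats.t.interval(0.95, df=len(sample) - 1,
                      loc=mean_hat, scale=stats.sem(sample))

# 3. Hypothesis testing: H0: population mean = 5, two-sided alternative.
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

print(f"point estimate: {mean_hat:.3f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```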

Many of the standard methods for those approaches rely on certain statistical assumptions (made in the derivation of the methodology) actually holding in practice. Statistical theory studies the consequences of departures from these assumptions. In addition, it provides a range of robust statistical techniques that are less dependent on assumptions, and it provides methods for checking whether particular assumptions are reasonable for a given data set.
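
A hedged illustration of both points, with invented data and SciPy assumed: the median (a robust summary) is far less affected by a single gross error than the mean, and a normality test is one simple way of checking a distributional assumption.

```python
import numpy as np
from scipy import stats

clean = np.array([9.8, 10.1, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1])
contaminated = np.append(clean, 55.0)  # one gross recording error

# A robust summary (median) barely moves; the mean is dragged upward.
print("mean:  ", clean.mean(), "->", contaminated.mean())
print("median:", np.median(clean), "->", np.median(contaminated))

# Checking an assumption: Shapiro-Wilk test of normality for the clean sample.
w_stat, p_value = stats.shapiro(clean)
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.3f}")
```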


Related Research Articles

Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures used to analyze the differences among means. ANOVA was developed by the statistician Ronald Fisher. ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means. In other words, ANOVA is used to test for differences among two or more means.
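
For example, a one-way ANOVA comparing three groups can be run with SciPy; this is a sketch with invented measurements, not material from the article above.

```python
from scipy import stats

# Invented measurements for three groups.
group_a = [23.1, 24.5, 22.8, 25.0, 23.7]
group_b = [26.2, 27.1, 25.8, 26.9, 27.4]
group_c = [23.9, 24.2, 25.1, 24.7, 23.5]

# H0: all three population means are equal.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```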

Bayesian probability is an interpretation of the concept of probability, in which, instead of frequency or propensity of some phenomenon, probability is interpreted as reasonable expectation representing a state of knowledge or as quantification of a personal belief.


Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.


Statistical inference is the process of using data analysis to infer properties of an underlying distribution of probability. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population.

In statistics, survey sampling describes the process of selecting a sample of elements from a target population to conduct a survey. The term "survey" may refer to many different types or techniques of observation; in survey sampling it most often involves a questionnaire used to measure the characteristics and/or attitudes of people. The different ways of contacting members of a sample once they have been selected are the subject of survey data collection. The purpose of sampling is to reduce the cost and/or the amount of work that it would take to survey the entire target population. A survey that measures the entire target population is called a census. A sample refers to a group or section of a population from which information is to be obtained.
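
As a minimal sketch (hypothetical sampling frame, Python standard library only), a simple random sample can be drawn from a frame of population units as follows.

```python
import random

# Hypothetical sampling frame: identifiers for every unit in the target population.
frame = [f"household_{i:04d}" for i in range(1, 1001)]

# Draw a simple random sample of 25 units without replacement.
rng = random.Random(2024)
sample = rng.sample(frame, k=25)
print(sample[:5], "...")
```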

Statistics, like all mathematical disciplines, does not infer valid conclusions from nothing. Inferring interesting conclusions about real statistical populations almost always requires some background assumptions. Those assumptions must be made carefully, because incorrect assumptions can generate wildly inaccurate conclusions.

A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis.

Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Bayesian inference is an important technique in statistics, and especially in mathematical statistics. Bayesian updating is particularly important in the dynamic analysis of a sequence of data. Bayesian inference has found application in a wide range of activities, including science, engineering, philosophy, medicine, sport, and law. In the philosophy of decision theory, Bayesian inference is closely related to subjective probability, often called "Bayesian probability".
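
As a worked sketch of Bayesian updating (not drawn from the article; the prior and data are invented), a Beta prior on a success probability is updated after observing binomial data, using the conjugacy of the Beta and binomial distributions.

```python
# Beta-Binomial updating: a conjugate example of Bayes' theorem.
# Prior: Beta(a, b) on the unknown success probability p.
a_prior, b_prior = 2.0, 2.0          # invented, mildly informative prior

# Observed data: 7 successes in 10 independent trials (invented).
successes, trials = 7, 10

# The posterior is again a Beta distribution with updated parameters.
a_post = a_prior + successes
b_post = b_prior + (trials - successes)

posterior_mean = a_post / (a_post + b_post)
print(f"posterior: Beta({a_post:.0f}, {b_post:.0f}), mean = {posterior_mean:.3f}")
```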

In statistics, interval estimation is the use of sample data to estimate an interval of plausible values of a parameter of interest. This is in contrast to point estimation, which gives a single value.

Nonparametric statistics is the branch of statistics that is not based solely on parametrized families of probability distributions. Nonparametric statistics is based on either being distribution-free or having a specified distribution but with the distribution's parameters unspecified. Nonparametric statistics includes both descriptive statistics and statistical inference. Nonparametric tests are often used when the assumptions of parametric tests are violated.
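
For instance, the Mann-Whitney U test compares two samples without assuming a particular distributional form; the data below are invented and SciPy is assumed.

```python
from scipy import stats

# Invented samples; no normality assumption is needed for this test.
x = [12, 15, 11, 19, 14, 13, 16]
y = [22, 18, 25, 21, 20, 24, 23]

u_stat, p_value = stats.mannwhitneyu(x, y, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```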

In inferential statistics, the null hypothesis is the default claim that there is no difference between the groups or quantities being compared, so that any observed difference is due to chance alone. Statistical tests are used to assess how compatible the observed data are with the null hypothesis.


Mathematical statistics is the application of probability theory, a branch of mathematics, to statistics, as opposed to techniques for collecting statistical data. Specific mathematical techniques which are used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure theory.

Random assignment or random placement is an experimental technique for assigning human participants or animal subjects to different groups in an experiment using randomization, such as by a chance procedure or a random number generator. This ensures that each participant or subject has an equal chance of being placed in any group. Random assignment of participants helps to ensure that any differences between and within the groups are not systematic at the outset of the experiment. Thus, any differences between groups recorded at the end of the experiment can be more confidently attributed to the experimental procedures or treatment.


In statistics, the multiple comparisons, multiplicity or multiple testing problem occurs when one considers a set of statistical inferences simultaneously or infers a subset of parameters selected based on the observed values.
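
A minimal sketch of one common adjustment, the Bonferroni correction, applied to invented p-values from a hypothetical family of four tests.

```python
# Bonferroni correction: control the family-wise error rate across m tests
# by comparing each p-value with alpha / m (equivalently, multiplying it by m).
p_values = [0.004, 0.03, 0.21, 0.047]   # invented results of four tests
alpha = 0.05
m = len(p_values)

adjusted = [min(p * m, 1.0) for p in p_values]
for p, p_adj in zip(p_values, adjusted):
    verdict = "reject H0" if p_adj < alpha else "fail to reject H0"
    print(f"raw p = {p:.3f}, adjusted p = {p_adj:.3f} -> {verdict}")
```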

Statistics, in the modern sense of the word, began evolving in the 18th century in response to the novel needs of industrializing sovereign states. The evolution of statistics was, in particular, intimately connected with the development of European states following the Peace of Westphalia (1648), and with the development of probability theory, which put statistics on a firm theoretical basis.

The foundations of statistics concern the epistemological debate in statistics over how one should conduct inductive inference from data. Among the issues considered in statistical inference are the question of Bayesian inference versus frequentist inference, the distinction between Fisher's "significance testing" and Neyman–Pearson "hypothesis testing", and whether the likelihood principle should be followed. Some of these issues have been debated for up to 200 years without resolution.

Frequentist inference is a type of statistical inference based on frequentist probability, which treats "probability" in equivalent terms to "frequency" and draws conclusions from sample data by emphasizing the frequency or proportion of findings in the data. Frequentist inference underlies frequentist statistics, in which the well-established methodologies of statistical hypothesis testing and confidence intervals are founded.

Statistical proof is the rational demonstration of degree of certainty for a proposition, hypothesis or theory that is used to convince others subsequent to a statistical test of the supporting evidence and the types of inferences that can be drawn from the test scores. Statistical methods are used to increase the understanding of the facts and the proof demonstrates the validity and logic of inference with explicit reference to a hypothesis, the experimental data, the facts, the test, and the odds. Proof has two essential aims: the first is to convince and the second is to explain the proposition through peer and public review.

Oscar Kempthorne was a British-born statistician and geneticist known for his research on randomization analysis and the design of experiments, which had wide influence on research in agriculture, genetics, and other areas of science.

References

Citations

  1. Cox & Hinkley (1974, p. 1)
  2. Rao, C. R. (1981). "Foreword". In Arthanari, T. S.; Dodge, Yadolah (eds.). Mathematical Programming in Statistics. New York: John Wiley & Sons. pp. vii–viii. ISBN 0-471-08073-X. MR 0607328.
  3. Lehmann & Romano (2005)
  4. Freedman (2009)
  5. Peirce, Charles Sanders; Jastrow, Joseph (1885). "On Small Differences in Sensation". Memoirs of the National Academy of Sciences. 3: 73–83. http://psychclassics.yorku.ca/Peirce/small-diffs.htm
  6. Hacking, Ian (September 1988). "Telepathy: Origins of Randomization in Experimental Design". Isis. 79 (3): 427–451. doi:10.1086/354775. JSTOR 234674. MR 1013489.
  7. Stigler, Stephen M. (November 1992). "A Historical View of Statistical Concepts in Psychology and Educational Research". American Journal of Education. 101 (1): 60–70. doi:10.1086/444032.
  8. Atkinson et al. (2007)
  9. Kiefer, Jack Carl (1985). Brown, Lawrence D.; Olkin, Ingram; Sacks, Jerome; et al. (eds.). Jack Carl Kiefer: Collected Papers III—Design of Experiments. Springer-Verlag and the Institute of Mathematical Statistics. pp. 718+xxv. ISBN 0-387-96004-X.
  10. Hinkelmann & Kempthorne (2008)
  11. Bailey (2008)
  12. Kish (1965)
  13. Cochran (1977)
  14. Särndal et al. (1992)

Sources

  - Atkinson, A. C.; Donev, A. N.; Tobias, R. D. (2007). Optimum Experimental Designs, with SAS. Oxford University Press.
  - Bailey, R. A. (2008). Design of Comparative Experiments. Cambridge University Press.
  - Cochran, W. G. (1977). Sampling Techniques (3rd ed.). Wiley.
  - Cox, D. R.; Hinkley, D. V. (1974). Theoretical Statistics. Chapman and Hall.
  - Freedman, D. A. (2009). Statistical Models: Theory and Practice (revised ed.). Cambridge University Press.
  - Hinkelmann, Klaus; Kempthorne, Oscar (2008). Design and Analysis of Experiments, Volume I: Introduction to Experimental Design (2nd ed.). Wiley.
  - Kish, L. (1965). Survey Sampling. Wiley.
  - Lehmann, E. L.; Romano, J. P. (2005). Testing Statistical Hypotheses (3rd ed.). Springer.
  - Särndal, C.-E.; Swensson, B.; Wretman, J. (1992). Model Assisted Survey Sampling. Springer-Verlag.
