This article does not cite any sources .(March 2019) (Learn how and when to remove this template message) |

Statistics |
---|

** Statistics ** is a field of inquiry that studies the collection, analysis, interpretation, and presentation of data. It is applicable to a wide variety of academic disciplines, from the physical and social sciences to the humanities; it is also used and misused for making informed decisions in all areas of business and government.

- Nature of statistics
- History of statistics
- Describing data
- Experiments and surveys
- Sampling
- Analysing data
- Filtering data
- Statistical inference
- Probability distributions
- Random variables
- Probability theory
- Computational statistics
- Statistics software
- Statistics organizations
- Statistics publications
- Persons influential in the field of statistics
- See also

Statistics can be described as all of the following:

- An academic discipline: one with academic departments, curricula and degrees; national and international societies; and specialized journals.
- A scientific field (a branch of science) – widely recognized category of specialized expertise within science, and typically embodies its own terminology and nomenclature. Such a field will usually be represented by one or more scientific journals, where peer reviewed research is published.
- A formal science – branch of knowledge concerned with formal systems.
- A mathematical science – field of science that is primarily mathematical in nature but may not be universally considered subfields of mathematics proper. Statistics, for example, is mathematical in its methods but grew out of political arithmetic which merged with inverse probability and grew through applications in the social sciences and some areas of physics and biometrics to become its own separate, though closely allied, field.

**Statistics** is the discipline that concerns the collection, organization, analysis, interpretation and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments. See glossary of probability and statistics.

**Statistical inference** is the process of using data analysis to deduce properties of an underlying distribution of probability. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population.

In statistics, **maximum likelihood estimation** (**MLE**) is a method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.

In statistics, **point estimation** involves the use of sample data to calculate a single value which is to serve as a "best guess" or "best estimate" of an unknown population parameter. More formally, it is the application of a point estimator to the data to obtain a point estimate.

**Nonparametric statistics** is the branch of statistics that is not based solely on parametrized families of probability distributions. Nonparametric statistics is based on either being distribution-free or having a specified distribution but with the distribution's parameters unspecified. Nonparametric statistics includes both descriptive statistics and statistical inference.

In statistics, a vector of random variables is heteroscedastic if the variability of the random disturbance is different across elements of the vector. Here, variability could be quantified by the variance or any other measure of statistical dispersion. Thus heteroscedasticity is the absence of homoscedasticity. A typical example is the set of observations of income in different cities.

**Mathematical statistics** is the application of probability theory, a branch of mathematics, to statistics, as opposed to techniques for collecting statistical data. Specific mathematical techniques which are used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure theory.

**Estimation theory** is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the measured data. An estimator attempts to approximate the unknown parameters using the measurements.

*Most of the terms listed in Wikipedia glossaries are already defined and explained within Wikipedia itself. However, glossaries like this one are useful for looking up, comparing and reviewing large numbers of terms together. You can help enhance this page by adding new terms or writing definitions for existing ones.*

**Robust statistics** are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. Robust statistical methods have been developed for many common problems, such as estimating location, scale, and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from parametric distribution. For example, robust methods work well for mixtures of two normal distributions with different standard-deviations; under this model, non-robust methods like a t-test work poorly.

In statistics, **resampling** is any of a variety of methods for doing one of the following:

- Estimating the precision of sample statistics by using subsets of available data (
**jackknifing**) or drawing randomly with replacement from a set of data points (**bootstrapping**) - Exchanging labels on data points when performing significance tests
- Validating models by using random subsets

In statistics, **M-estimators** are a broad class of extremum estimators for which the objective function is a sample average. Both non-linear least squares and maximum likelihood estimation are special cases of M-estimators. The definition of M-estimators was motivated by robust statistics, which contributed new types of M-estimators. The statistical procedure of evaluating an M-estimator on a data set is called **M-estimation**.

**Gauss Moutinho Cordeiro** is a Brazilian engineer, mathematician and statistician who has made significant contributions to the theory of statistical inference, mainly through asymptotic theory and applied probability.

In statistics, **bootstrapping** is any test or metric that relies on random sampling with replacement. Bootstrapping allows assigning measures of accuracy to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods. Generally, it falls in the broader class of resampling methods.

The term **kernel** is used in statistical analysis to refer to a window function. The term "kernel" has several distinct meanings in different branches of statistics.

**Approximate Bayesian computation** (**ABC**) constitutes a class of computational methods rooted in Bayesian statistics that can be used to estimate the posterior distributions of model parameters.

The **Heckman correction** is a statistical technique to correct bias from non-randomly selected samples or otherwise incidentally truncated dependent variables, a pervasive issue in quantitative social sciences when using observational data. Conceptually, this is achieved by explicitly modelling the individual sampling probability of each observation together with the conditional expectation of the dependent variable. The resulting likelihood function is mathematically similar to the Tobit model for censored dependent variables, a connection first drawn by James Heckman in 1976. Heckman also developed a two-step control function approach to estimate this model, which reduced the computional burden of having to estimate both equations jointly, albeit at the cost of inefficiency. Heckman received the Nobel Memorial Prize in Economic Sciences in 2000 for his work in this field.

In statistics, **linear regression** is a linear approach to modeling the relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called **multiple linear regression**. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.

This page is based on this Wikipedia article

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.