Reification (statistics)

In statistics, reification is the use of an idealized model of a statistical process. The model is then used to make inferences connecting model results, which imperfectly represent the actual process, with experimental observations.

It also refers to a process whereby model-derived quantities such as principal components, factors and latent variables are identified, named and treated as if they were directly measurable quantities.[1]
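As a rough illustration of the second sense, the following Python/NumPy sketch extracts the first principal component of a small, simulated data matrix and then names and handles it as if it were a directly measured quantity; the data and the "overall size" label are invented for the example.

```python
import numpy as np

# Simulated data: 6 observations of 3 correlated measurements.
rng = np.random.default_rng(0)
driver = rng.normal(size=(6, 1))
X = np.hstack([driver + 0.1 * rng.normal(size=(6, 1)) for _ in range(3)])

# First principal component via SVD of the centred data matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1_scores = Xc @ Vt[0]            # projection onto the first component

# "Reification": the model-derived scores are given a name and treated
# as though they were an observed variable in their own right.
overall_size = pc1_scores
print("PC1 treated as a measured 'overall size' score:", overall_size.round(2))
```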

Notes

  1. Everitt, 2002

Related Research Articles

Observational error is the difference between a measured value of a quantity and its unknown true value. Such errors are inherent in the measurement process; for example lengths measured with a ruler calibrated in whole centimeters will have a measurement error of several millimeters. The error or uncertainty of a measurement can be estimated, and is specified with the measurement as, for example, 32.3 ± 0.5 cm.
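A minimal numerical version of the ruler example (Python; the "true" length is an assumed value, since in practice it is unknown):

```python
# Ruler calibrated in whole centimetres: the reading is the nearest centimetre.
true_length_cm = 32.34            # assumed, unknowable true value
measured_cm = round(true_length_cm)
error_cm = measured_cm - true_length_cm

print(f"measured: {measured_cm} cm, observational error: {error_cm * 10:+.1f} mm")
print(f"reported with uncertainty: {measured_cm} ± 0.5 cm")
```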

A statistic (singular) or sample statistic is any quantity computed from values in a sample which is considered for a statistical purpose. Statistical purposes include estimating a population parameter, describing a sample, or evaluating a hypothesis. The average of sample values is a statistic. The term statistic is used both for the function and for the value of the function on a given sample. When a statistic is being used for a specific purpose, it may be referred to by a name indicating its purpose.

Accuracy and precision are two measures of observational error. Accuracy is how close a given set of measurements are to their true value, while precision is how close the measurements are to each other.
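The distinction can be made concrete with a small simulation (Python/NumPy; the bias and scatter values are arbitrary assumptions): one simulated instrument is precise but inaccurate, the other accurate but imprecise.

```python
import numpy as np

rng = np.random.default_rng(1)
true_value = 10.0

# Instrument A: biased by +0.5 but with little scatter (precise, not accurate).
a = true_value + 0.5 + rng.normal(scale=0.05, size=1000)
# Instrument B: unbiased but with large scatter (accurate, not precise).
b = true_value + rng.normal(scale=0.8, size=1000)

for name, x in [("A", a), ("B", b)]:
    print(f"{name}: mean error = {x.mean() - true_value:+.3f} (accuracy), "
          f"spread = {x.std(ddof=1):.3f} (precision)")
```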

Standard score: the number of standard deviations an observed datum is above or below the mean

In statistics, the standard score is the number of standard deviations by which the value of a raw score is above or below the mean value of what is being observed or measured. Raw scores above the mean have positive standard scores, while those below the mean have negative standard scores.
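A short worked example of the definition, z = (x − mean) / standard deviation (Python/NumPy; the raw scores are made up):

```python
import numpy as np

scores = np.array([55.0, 60.0, 65.0, 70.0, 75.0])    # illustrative raw scores
z = (scores - scores.mean()) / scores.std()          # standard scores

print(z)   # scores above the mean get positive z, below the mean negative z
```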

Statistical parameter: quantity that indexes a parametrized family of probability distributions

In statistics, as opposed to its general use in mathematics, a parameter is any quantity of a statistical population that summarizes or describes an aspect of the population, such as a mean or a standard deviation. If a population exactly follows a known and defined distribution, for example the normal distribution, then a small set of parameters can be measured which completely describes the population, and can be considered to define a probability distribution for the purposes of extracting samples from this population.

Dependent and independent variables: concept in mathematical modeling, statistical modeling and experimental sciences

A variable is considered dependent if it depends on an independent variable. Dependent variables are studied under the supposition or demand that they depend, by some law or rule, on the values of other variables. Independent variables, in turn, are not seen as depending on any other variable within the scope of the experiment in question. Common independent variables in this sense include time, space, density, mass, fluid flow rate, and previous values of an observed quantity of interest used to predict its future values.

The Akaike information criterion (AIC) is an estimator of prediction error and thereby relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model, relative to each of the other models. Thus, AIC provides a means for model selection.
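AIC is defined as AIC = 2k − 2 ln(L̂), where k is the number of estimated parameters and L̂ is the maximised likelihood. The sketch below (Python/NumPy, simulated data; counting the error variance as a parameter is one common convention) compares an intercept-only and a straight-line Gaussian regression model:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.uniform(-1, 1, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)    # simulated linear truth

def aic_gaussian(y, fitted, k):
    """AIC for a least-squares fit with Gaussian errors (k includes the variance)."""
    m = len(y)
    rss = np.sum((y - fitted) ** 2)
    loglik = -0.5 * m * (np.log(2 * np.pi * rss / m) + 1)
    return 2 * k - 2 * loglik

fit_const = np.full(n, y.mean())                 # intercept-only model, k = 2
fit_line = np.polyval(np.polyfit(x, y, 1), x)    # intercept + slope,    k = 3

print("AIC, intercept only:", round(aic_gaussian(y, fit_const, 2), 1))
print("AIC, straight line: ", round(aic_gaussian(y, fit_line, 3), 1))   # lower is better
```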

Mathematical statistics: branch of statistics

Mathematical statistics is the application of probability theory, a branch of mathematics, to statistics, as opposed to techniques for collecting statistical data. Specific mathematical techniques which are used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure theory.

In statistical data analysis the total sum of squares is a quantity that appears as part of a standard way of presenting results of such analyses. For a set of observations $y_i$, $i = 1, \ldots, n$, it is defined as the sum over all squared differences between the observations and their overall mean $\bar{y}$:

$$\text{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2.$$
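A minimal computation for a small set of observations (Python/NumPy; the numbers are illustrative):

```python
import numpy as np

y = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
tss = np.sum((y - y.mean()) ** 2)    # sum of squared deviations from the mean

print(tss)    # 32.0 for these values (mean is 5.0)
```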

In probability theory and statistics, coherence can have several different meanings. Coherence in statistics is an indication of the quality of the information, either within a single data set, or between similar but not identical data sets. Fully coherent data are logically consistent and can be reliably combined for analysis.

A latent variable model is a statistical model that relates a set of observable variables to a set of latent variables.
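A hedged generative sketch of a one-factor latent variable model (Python/NumPy; the loadings and noise level are arbitrary assumptions): each observable variable is a noisy linear function of a single unobserved factor, which induces the correlations among the observables.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

factor = rng.normal(size=n)              # latent variable, never observed directly
loadings = np.array([0.9, 0.7, 0.5])     # assumed factor loadings
noise = rng.normal(scale=0.4, size=(n, 3))

observed = factor[:, None] * loadings + noise   # the observable variables

print(np.corrcoef(observed, rowvar=False).round(2))   # correlations induced by the factor
```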

In weather forecasting, model output statistics (MOS) is a multiple linear regression technique in which predictands, often near-surface quantities, are related statistically to one or more predictors. The predictors are typically forecasts from a numerical weather prediction (NWP) model, climatic data, and, if applicable, recent surface observations. Thus, output from NWP models can be transformed by the MOS technique into sensible weather parameters that are familiar to a layperson.
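A toy version of the MOS idea (Python/NumPy; the "NWP forecast" and verifying observations are simulated, and a single predictor is used for brevity): observed temperature is regressed on the raw model forecast, and the fitted relation is then applied to a new forecast.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200

nwp_temp = rng.normal(15.0, 5.0, size=n)                          # raw NWP forecast
observed = 0.8 * nwp_temp + 2.0 + rng.normal(scale=1.0, size=n)   # verifying observations

# Linear regression (here with one predictor) fitted by least squares.
X = np.column_stack([np.ones(n), nwp_temp])
coef, *_ = np.linalg.lstsq(X, observed, rcond=None)

new_forecast = 20.0
mos_forecast = coef[0] + coef[1] * new_forecast
print(f"raw forecast {new_forecast:.1f} °C -> MOS-adjusted {mos_forecast:.1f} °C")
```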

Youden's J statistic is a single statistic that captures the performance of a dichotomous diagnostic test. (Bookmaker) Informedness is its generalization to the multiclass case and estimates the probability of an informed decision.
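Youden's J equals sensitivity + specificity − 1. A small sketch computing it from a 2×2 confusion table (Python; the counts are made up):

```python
# Counts for a dichotomous diagnostic test (illustrative numbers).
tp, fn = 80, 20    # diseased subjects: test positive / negative
tn, fp = 90, 10    # healthy subjects:  test negative / positive

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
youden_j = sensitivity + specificity - 1

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}, J = {youden_j:.2f}")
```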

Data: units of information

In common usage, data is a collection of discrete or continuous values that convey information, describing quantity, quality, facts, statistics, or other basic units of meaning, or simply sequences of symbols that may be further interpreted formally. A datum is an individual value in a collection of data. Data is usually organized into structures such as tables that provide additional context and meaning, and which may themselves be used as data in larger structures. Data may be used as variables in a computational process. Data may represent abstract ideas or concrete measurements. Data is commonly used in scientific research, economics, and in virtually every other form of human organizational activity. Examples of data sets include price indices, unemployment rates, literacy rates, and census data. In this context, data represents the raw facts and figures from which useful information can be extracted.

In the theory of stochastic processes in probability theory and statistics, a nuisance variable is a random variable that is fundamental to the probabilistic model, but that is of no particular interest in itself or is no longer of any interest: one such usage arises for the Chapman–Kolmogorov equation. For example, a model for a stochastic process may be defined conceptually using intermediate variables that are not observed in practice. If the problem is to derive the theoretical properties, such as the mean, variance and covariances of quantities that would be observed, then the intermediate variables are nuisance variables.

The item–total correlation test arises in psychometrics in contexts where a number of tests or questions are given to an individual and where the problem is to construct a useful single quantity for each individual that can be used to compare that individual with others in a given population. The test is used to see whether any of the tests or questions ("items") do not have responses that vary in line with those for other tests across the population. The summary measure would be an average of some form, weighted where necessary, and the item-correlation test is used to decide whether or not responses to a given test should be included in the set being averaged. In some fields of application such a summary measure is called a scale.
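A hedged sketch of such a check (Python/NumPy; the responses are simulated, and correlating each item with the total that excludes it is one common convention): four items track the underlying trait while the fifth is unrelated, so its low item–total correlation flags it for possible exclusion.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300

trait = rng.normal(size=n)
# Four items follow the trait; the fifth is pure noise.
items = np.column_stack([trait + rng.normal(scale=0.6, size=n) for _ in range(4)]
                        + [rng.normal(size=n)])

for j in range(items.shape[1]):
    rest = items.sum(axis=1) - items[:, j]            # total score excluding item j
    r = np.corrcoef(items[:, j], rest)[0, 1]
    print(f"item {j}: corrected item-total correlation = {r:.2f}")
```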

In statistics, efficiency is a measure of quality of an estimator, of an experimental design, or of a hypothesis testing procedure. Essentially, a more efficient estimator needs fewer input data or observations than a less efficient one to achieve the Cramér–Rao bound. An efficient estimator is characterized by having the smallest possible variance, indicating that there is a small deviation between the estimated value and the "true" value in the L2 norm sense.
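Relative efficiency can be illustrated by comparing the sample mean and the sample median as estimators of the centre of a normal distribution (Python/NumPy simulation; the sample and replication sizes are arbitrary). For normal data the large-sample efficiency of the median relative to the mean is 2/π ≈ 0.64, i.e. the median needs roughly 1.57 times as many observations to match the variance of the mean.

```python
import numpy as np

rng = np.random.default_rng(6)
reps, n = 20_000, 50

samples = rng.normal(size=(reps, n))           # standard normal, true centre 0
var_mean = samples.mean(axis=1).var()
var_median = np.median(samples, axis=1).var()

print(f"var(mean)   = {var_mean:.5f}")
print(f"var(median) = {var_median:.5f}")
print(f"efficiency of median relative to mean ≈ {var_mean / var_median:.2f}")   # ≈ 0.64
```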

In statistics, a regression diagnostic is one of a set of procedures available for regression analysis that seek to assess the validity of a model in any of a number of different ways. This assessment may be an exploration of the model's underlying statistical assumptions, an examination of the structure of the model by considering formulations that have fewer, more or different explanatory variables, or a study of subgroups of observations, looking for those that are either poorly represented by the model (outliers) or that have a relatively large effect on the regression model's predictions.
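A minimal sketch of two such diagnostics, residuals and leverage, computed from the hat matrix of a fitted least-squares model (Python/NumPy; the data are simulated and one outlying response is planted deliberately):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 30
x = rng.uniform(0, 10, size=n)
y = 3.0 + 1.5 * x + rng.normal(scale=1.0, size=n)
y[-1] += 8.0                                    # plant an outlying response

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

hat = X @ np.linalg.inv(X.T @ X) @ X.T          # hat (projection) matrix
leverage = np.diag(hat)

print("largest |residual| at observation:", int(np.argmax(np.abs(residuals))))
print("largest leverage at observation:  ", int(np.argmax(leverage)))
```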

Sophia Rabe-Hesketh is a statistician who works as a professor of educational statistics and biostatistics at the University of California, Berkeley. Her research involves the development of generalized linear mixed models that incorporate latent variables to handle hidden data.

References