Focused information criterion

Last updated April 01, 2019

In statistics, the focused information criterion (FIC) is a method for selecting the most appropriate model among a set of competitors for a given data set. Unlike most other model selection strategies, like the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the deviance information criterion (DIC), the FIC does not attempt to assess the overall fit of candidate models but focuses attention directly on the parameter of primary interest with the statistical analysis, say $\mu$ , for which competing models lead to different estimates, say ${\hat {\mu }}_{j}$ for model $j$ . The FIC method consists in first developing an exact or approximate expression for the precision or quality of each estimator, say $r_{j}$ for ${\hat {\mu }}_{j}$ , and then use data to estimate these precision measures, say ${\hat {r}}_{j}$ . In the end the model with best estimated precision is selected. The FIC methodology was developed by Gerda Claeskens and Nils Lid Hjort, first in two 2003 discussion articles in Journal of the American Statistical Association and later on in other papers and in their 2008 book.

Statistics is a branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation. In applying statistics to, for example, a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model process to be studied. Populations can be diverse topics such as "all people living in a country" or "every atom composing a crystal". Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments. See glossary of probability and statistics.

Model selection is the task of selecting a statistical model from a set of candidate models, given data. In the simplest cases, a pre-existing set of data is considered. However, the task can also involve the design of experiments such that the data collected is well-suited to the problem of model selection. Given candidate models of similar predictive or explanatory power, the simplest model is most likely to be the best choice.

The Akaike information criterion (AIC) is an estimator of the relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model, relative to each of the other models. Thus, AIC provides a means for model selection.

The concrete formulae and implementation for FIC depend firstly on the particular parameter of interest, the choice of which does not depend on mathematics but on the scientific and statistical context. Thus the FIC apparatus may be selecting one model as most appropriate for estimating a quantile of a distribution but preferring another model as best for estimating the mean value. Secondly, the FIC formulae depend on the specifics of the models used for the observed data and also on how precision is to be measured. The clearest case is where precision is taken to be mean squared error, say $r_{j}=b_{j}^{2}+\tau _{j}^{2}$ in terms of squared bias and variance for the estimator associated with model $j$ . FIC formulae are then available in a variety of situations, both for handling parametric, semiparametric and nonparametric situations, involving separate estimation of squared bias and variance, leading to estimated precision ${\hat {r}}_{j}$ . In the end the FIC selects the model with smallest estimated mean squared error.

In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator measures the average of the squares of the errors—that is, the average squared difference between the estimated values and what is estimated. MSE is a risk function, corresponding to the expected value of the squared error loss. The fact that MSE is almost always strictly positive is because of randomness or because the estimator does not account for information that could produce a more accurate estimate.

In statistics, the bias of an estimator is the difference between this estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased. Otherwise the estimator is said to be biased. In statistics, "bias" is an objective property of an estimator, and while not a desired property, it is not pejorative, unlike the ordinary English use of the term "bias".

In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its mean. Informally, it measures how far a set of (random) numbers are spread out from their average value. Variance has a central role in statistics, where some ideas that use it include descriptive statistics, statistical inference, hypothesis testing, goodness of fit, and Monte Carlo sampling. Variance is an important tool in the sciences, where statistical analysis of data is common. The variance is the square of the standard deviation, the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by $,, or .$

Associated with the use of the FIC for selecting a good model is the FIC plot, designed to give a clear and informative picture of all estimates, across all candidate models, and their merit. It displays estimates on the $y$ axis along with FIC scores on the $x$ axis; thus estimates found to the left in the plot are associated with the better models and those found in the middle and to the right stem from models less or not adequate for the purpose of estimating the focus parameter in question.

Generally speaking, complex models (with many parameters relative to sample size) tend to lead to estimators with small bias but high variance; more parsimonious models (with fewer parameters) typically yield estimators with larger bias but smaller variance. The FIC method balances the two desired data of having small bias and small variance in an optimal fashion. The main difficulty lies with the bias $b_{j}$ , as it involves the distance from the expected value of the estimator to the true underlying quantity to be estimated, and the true data generating mechanism may lie outside each of the candidate models.

In situations where there is not a unique focus parameter, but rather a family of such, there are versions of average FIC (AFIC or wFIC) that find the best model in terms of suitably weighted performance measures, e.g. when searching for a regression model to perform particularly well in a portion of the covariate space.

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed.

It is also possible to keep several of the best models on board, ending the statistical analysis with a data-dicated weighted average of the estimators of the best FIC scores, typically giving highest weight to estimators associated with the best FIC scores. Such schemes of model averaging extend the direct FIC selection method.

The FIC methodology applies in particular to selection of variables in different forms of regression analysis, including the framework of generalised linear models and the semiparametric proportional hazards models (i.e. Cox regression).

In statistics, the generalized linear model (GLM) is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.

Related Research Articles

In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule, the quantity of interest and its result are distinguished.

The method of least squares is a standard approach in regression analysis to approximate the solution of overdetermined systems, i.e., sets of equations in which there are more equations than unknowns. "Least squares" means that the overall solution minimizes the sum of the squares of the residuals made in the results of every single equation.

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a statistical model, given observations. The method obtains the parameter estimates by finding the parameter values that maximize the likelihood function. The estimates are called maximum likelihood estimates, which is also abbreviated as MLE.

Overfitting production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably

In statistics, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably". An overfitted model is a statistical model that contains more parameters than can be justified by the data. The essence of overfitting is to have unknowingly extracted some of the residual variation as if that variation represented underlying model structure.

In statistics, ordinary least squares (OLS) is a type of linear least squares method for estimating the unknown parameters in a linear regression model. OLS chooses the parameters of a linear function of a set of explanatory variables by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable in the given dataset and those predicted by the linear function.

Most of the terms listed in Wikipedia glossaries are already defined and explained within Wikipedia itself. However, glossaries like this one are useful for looking up, comparing and reviewing large numbers of terms together. You can help enhance this page by adding new terms or writing definitions for existing ones.

In statistics, the Bayesian information criterion (BIC) or Schwarz information criterion is a criterion for model selection among a finite set of models; the model with the lowest BIC is preferred. It is based, in part, on the likelihood function and it is closely related to the Akaike information criterion (AIC).

The deviance information criterion (DIC) is a hierarchical modeling generalization of the Akaike information criterion (AIC). It is particularly useful in Bayesian model selection problems where the posterior distributions of the models have been obtained by Markov chain Monte Carlo (MCMC) simulation. DIC is an asymptotic approximation as the sample size becomes large, like AIC. It is only valid when the posterior distribution is approximately multivariate normal.

In statistics, a generalized additive model (GAM) is a generalized linear model in which the linear predictor depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions. GAMs were originally developed by Trevor Hastie and Robert Tibshirani to blend properties of generalized linear models with additive models.

In statistics, resampling is any of a variety of methods for doing one of the following:

Estimating the precision of sample statistics by using subsets of available data (jackknifing) or drawing randomly with replacement from a set of data points (bootstrapping)
Exchanging labels on data points when performing significance tests
Validating models by using random subsets

In statistics, semiparametric regression includes regression models that combine parametric and nonparametric models. They are often used in situations where the fully nonparametric model may not perform well or when the researcher wants to use a parametric model but the functional form with respect to a subset of the regressors or the density of the errors is not known. Semiparametric regression models are a particular type of semiparametric modelling and, since semiparametric models contain a parametric component, they rely on parametric assumptions and so may be misspecified, just like a fully parametric model.

Bootstrapping (statistics) statistical method

In statistics, bootstrapping is any test or metric that relies on random sampling with replacement. Bootstrapping allows assigning measures of accuracy to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods. Generally, it falls in the broader class of resampling methods.

In statistics, Mallows’s C_p, named for Colin Lingwood Mallows, is used to assess the fit of a regression model that has been estimated using ordinary least squares. It is applied in the context of model selection, where a number of predictor variables are available for predicting some outcome, and the goal is to find the best model involving a subset of these predictors. A small value of C_p means that the model is relatively precise.

In statistics, the Hannan–Quinn information criterion (HQC) is a criterion for model selection. It is an alternative to Akaike information criterion (AIC) and Bayesian information criterion (BIC). It is given as

In statistics, a generalized estimating equation (GEE) is used to estimate the parameters of a generalized linear model with a possible unknown correlation between outcomes.

In statistics, the t-statistic is the ratio of the departure of the estimated value of a parameter from its hypothesized value to its standard error. It is used in hypothesis testing via Student's t-test. For example, it is used in estimating the population mean from a sampling distribution of sample means if the population standard deviation is unknown.

In statistics, an adaptive estimator is an estimator in a parametric or semiparametric model with nuisance parameters such that the presence of these nuisance parameters does not affect efficiency of estimation.

References

Claeskens, G. and Hjort, N.L. (2003). "The focused information criterion" (with discussion). Journal of the American Statistical Association , volume 98, pp. 879–899. doi : 10.1198/016214503000000819
Hjort, N.L. and Claeskens, G. (2003). "Frequentist model average estimators" (with discussion). Journal of the American Statistical Association , volume 98, pp. 900–916. doi : 10.1198/016214503000000828
Hjort, N.L. and Claeskens, G. (2006). "Focused information criteria and model averaging for the Cox hazard regression model." Journal of the American Statistical Association , volume 101, pp. 1449–1464. doi : 10.1198/016214506000000069
Claeskens, G. and Hjort, N.L. (2008). Model Selection and Model Averaging. Cambridge University Press.

Gerda Claeskens is a Belgian statistician. She is a professor of statistics in the Faculty of Economics and Business at KU Leuven, associated with the KU Research Centre for Operations Research and Business Statistics (ORSTAT).

The Journal of the American Statistical Association (JASA) is the primary journal published by the American Statistical Association, the main professional body for statisticians in the United States. It is published four times a year.

In computing, a Digital Object Identifier or DOI is a persistent identifier or handle used to uniquely identify objects, standardized by the International Organization for Standardization (ISO). An implementation of the Handle System, DOIs are in wide use mainly to identify academic, professional, and government information, such as journal articles, research reports and data sets, and official publications though they also have been used to identify other types of information resources, such as commercial videos.

External links

Interview on frequentist model averaging with Essential Science Indicators
Webpage for Model Selection and Model Averaging the Claeskens and Hjort book

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

Focused information criterion

Contents

See also

Related Research Articles

References

External links