Commonality analysis

Commonality analysis is a statistical technique within multiple linear regression that decomposes the model's R² statistic (i.e., the variance in the dependent variable explained by all of the independent variables together) into commonality coefficients. [1] [2] These coefficients are variance components that are uniquely explained by each independent variable (i.e., unique effects) [note 1] and variance components that are shared by each possible combination of the independent variables (i.e., common effects). The commonality coefficients sum to the total variance explained by the model (the model R²). Commonality analysis produces 2^k − 1 commonality coefficients, where k is the number of independent variables.

Example

As an illustrative example, in the case of three independent variables (A, B, and C), commonality analysis returns 7 (2^3 − 1) coefficients: the variance unique to each of A, B, and C (three unique effects), and the variance shared between A and B, between A and C, between B and C, and among A, B, and C together (four common effects).

The unique coefficient indicates the degree to which a variable is independently associated with the dependent variable. Positive commonality coefficients indicate that part of the explained variance of the dependent variable is shared between independent variables; negative commonality coefficients indicate a suppressor effect between independent variables.
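For instance, with two independent variables A and B, the decomposition can be written in terms of the R² values of the full and the two reduced models (a standard formulation; subscripts indicate which predictors are in the model):

$$ U(A) = R^2_{AB} - R^2_{B}, \qquad U(B) = R^2_{AB} - R^2_{A}, \qquad C(A,B) = R^2_{A} + R^2_{B} - R^2_{AB}, $$

so that U(A) + U(B) + C(A,B) = R²_AB, the R² of the full model.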

Calculation

The calculation of commonality coefficients can in principle be done with any software that calculates R² (e.g., SPSS; see [3]); however, this quickly becomes burdensome as the number of independent variables increases. For example, with 10 independent variables, there are 2^10 − 1 = 1023 commonality coefficients to be calculated. The yhat package [4] in R can be used to calculate commonality coefficients and to produce bootstrapped confidence intervals for them.
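Because the computation only requires the R² of every possible subset model, it can also be sketched from scratch. The following Python sketch (not the yhat package; the Möbius-inversion formulation and all names are illustrative) computes all 2^k − 1 coefficients with ordinary least squares:

```python
# A minimal sketch of commonality analysis via all-possible-subsets
# regression, assuming plain NumPy; function and variable names are
# illustrative, not the yhat package's API.
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on the columns of X."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def commonality(X, y, labels):
    """Return all 2^k - 1 commonality coefficients for the k columns of X."""
    k = X.shape[1]
    full = frozenset(range(k))
    # R^2 of every non-empty subset model; the empty model explains nothing.
    r2 = {frozenset(s): r_squared(X[:, list(s)], y)
          for n in range(1, k + 1)
          for s in combinations(range(k), n)}
    r2[frozenset()] = 0.0
    coeffs = {}
    for n in range(1, k + 1):
        for s in combinations(range(k), n):
            # Moebius inversion of g(T) = R^2(full) - R^2(full minus T),
            # which equals the sum of the coefficients of all non-empty
            # subsets of T.
            c = sum((-1) ** (n - len(t)) * (r2[full] - r2[full - frozenset(t)])
                    for m in range(n + 1)
                    for t in combinations(s, m))
            coeffs[tuple(labels[i] for i in s)] = c
    return coeffs


# Example: three correlated predictors; the coefficients sum to the full R^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] += 0.8 * X[:, 0]   # induce shared explained variance between A and B
y = X @ np.array([1.0, 0.5, 0.25]) + rng.normal(size=200)
cc = commonality(X, y, labels=["A", "B", "C"])
print(cc)
print(sum(cc.values()), r_squared(X, y))   # these two values agree
```

The printed sum of the coefficients matches the R² of the full three-predictor model, which is the defining property of the decomposition.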

Notes

  1. Commonality coefficients for the unique effects of the predictors are also known as uniqueness coefficients. [1] The uniqueness coefficient of a given independent variable is equal to the square of the semipartial correlation of that independent variable with the dependent variable. [1]
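In symbols, writing R²_all for the full-model R² and R²_all∖i for the R² of the model omitting the i-th predictor, this relationship is (notation illustrative):

$$ U(X_i) = sr_i^2 = R^2_{\text{all}} - R^2_{\text{all} \setminus i}. $$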

Related Research Articles

Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable, i.e., multivariate random variables. Multivariate statistics concerns understanding the different aims and background of each of the different forms of multivariate analysis, and how they relate to each other. The practical application of multivariate statistics to a particular problem may involve several types of univariate and multivariate analyses in order to understand the relationships between variables and their relevance to the problem being studied.

Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in six observed variables mainly reflect the variations in two unobserved (underlying) variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modelled as linear combinations of the potential factors plus "error" terms, hence factor analysis can be thought of as a special case of errors-in-variables models.
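A minimal sketch of this model for a vector of observed variables x, assuming a mean vector μ, a loading matrix Λ, common factors f, and error terms ε, is:

$$ x = \mu + \Lambda f + \varepsilon. $$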

Analysis of covariance (ANCOVA) is a general linear model which blends ANOVA and regression. ANCOVA evaluates whether the means of a dependent variable (DV) are equal across levels of a categorical independent variable (IV) often called a treatment, while statistically controlling for the effects of other continuous variables that are not of primary interest, known as covariates (CV) or nuisance variables. Mathematically, ANCOVA decomposes the variance in the DV into variance explained by the CV(s), variance explained by the categorical IV, and residual variance. Intuitively, ANCOVA can be thought of as 'adjusting' the DV by the group means of the CV(s).
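With one categorical IV and one covariate, a standard one-way ANCOVA model can be sketched as (notation illustrative):

$$ y_{ij} = \mu + \tau_i + \beta\,(x_{ij} - \bar{x}) + \varepsilon_{ij}, $$

where τ_i is the effect of the i-th treatment level and β is the common slope of the covariate.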

In statistics, path analysis is used to describe the directed dependencies among a set of variables. This includes models equivalent to any form of multiple regression analysis, factor analysis, canonical correlation analysis, discriminant analysis, as well as more general families of models in the multivariate analysis of variance and covariance analyses.

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables. The most common form of regression analysis is linear regression, in which one finds the line that most closely fits the data according to a specific mathematical criterion. For example, the method of ordinary least squares computes the unique line that minimizes the sum of squared differences between the true data and that line. For specific mathematical reasons, this allows the researcher to estimate the conditional expectation of the dependent variable when the independent variables take on a given set of values. Less common forms of regression use slightly different procedures to estimate alternative location parameters or estimate the conditional expectation across a broader collection of non-linear models.
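For example, the ordinary least squares criterion described above selects the coefficients that minimize the residual sum of squares (standard formulation):

$$ \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \bigl( y_i - \mathbf{x}_i^{\top} \beta \bigr)^2. $$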

In statistics, the coefficient of multiple correlation is a measure of how well a given variable can be predicted using a linear function of a set of other variables. It is the correlation between the variable's values and the best predictions that can be computed linearly from the predictive variables.

Linear discriminant analysis (LDA), normal discriminant analysis (NDA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics and other fields, to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification.

In statistics, multicollinearity is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. In this situation, the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data set; it only affects calculations regarding individual predictors. That is, a multivariable regression model with collinear predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with respect to others.

In statistics, the coefficient of determination, denoted R² or r² and pronounced "R squared", is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
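In the regression setting it is commonly computed from the residual and total sums of squares (standard definition):

$$ R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}. $$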

Multilevel models are statistical models of parameters that vary at more than one level. An example could be a model of student performance that contains measures for individual students as well as measures for classrooms within which the students are grouped. These models can be seen as generalizations of linear models, although they can also extend to non-linear models. These models became much more popular after sufficient computing power and software became available.

In statistics, the Durbin–Watson statistic is a test statistic used to detect the presence of autocorrelation at lag 1 in the residuals from a regression analysis. It is named after James Durbin and Geoffrey Watson. The small sample distribution of this ratio was derived by John von Neumann. Durbin and Watson applied this statistic to the residuals from least squares regressions, and developed bounds tests for the null hypothesis that the errors are serially uncorrelated against the alternative that they follow a first order autoregressive process. Note that the distribution of this test statistic does not depend on the estimated regression coefficients and the variance of the errors.
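For regression residuals e_1, …, e_T, the statistic is (standard definition):

$$ d = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}, $$

with values near 2 indicating no first-order autocorrelation.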

Omnibus tests are a kind of statistical test. They test whether the explained variance in a set of data is significantly greater than the unexplained variance, overall. One example is the F-test in the analysis of variance. There can be legitimate significant effects within a model even if the omnibus test is not significant. For instance, in a model with two independent variables, if only one variable exerts a significant effect on the dependent variable and the other does not, then the omnibus test may be non-significant. This fact does not affect the conclusions that may be drawn from the one significant variable. In order to test effects within an omnibus test, researchers often use contrasts.

In statistics, a mediation model seeks to identify and explain the mechanism or process that underlies an observed relationship between an independent variable and a dependent variable via the inclusion of a third hypothetical variable, known as a mediator variable. Rather than a direct causal relationship between the independent variable and the dependent variable, a mediation model proposes that the independent variable influences the mediator variable, which in turn influences the dependent variable. Thus, the mediator variable serves to clarify the nature of the relationship between the independent and dependent variables.
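A simple single-mediator model is often written as two linear equations (intercepts omitted; symbols illustrative):

$$ M = aX + e_1, \qquad Y = c'X + bM + e_2, $$

where ab is the indirect (mediated) effect and c' is the direct effect.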

In statistics, standardized (regression) coefficients, also called beta coefficients or beta weights, are the estimates resulting from a regression analysis where the underlying data have been standardized so that the variances of dependent and independent variables are equal to 1. Therefore, standardized coefficients are unitless and refer to how many standard deviations a dependent variable will change, per standard deviation increase in the predictor variable.
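Equivalently, an unstandardized coefficient b_i can be converted after the fact using the sample standard deviations of the predictor and the outcome (standard relation):

$$ \beta_i^{*} = b_i \,\frac{s_{x_i}}{s_y}. $$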

In statistics, the variance inflation factor (VIF) is the ratio (quotient) of the variance of a parameter estimate in a model that includes multiple other terms (parameters) to the variance of that estimate in a model containing only that term. It quantifies the severity of multicollinearity in an ordinary least squares regression analysis. It provides an index that measures how much the variance of an estimated regression coefficient is increased because of collinearity. Cuthbert Daniel claims to have invented the concept behind the variance inflation factor, but did not come up with the name.
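For the i-th predictor, with R_i² denoting the R² obtained by regressing that predictor on all of the others (standard definition):

$$ \mathrm{VIF}_i = \frac{1}{1 - R_i^{2}}. $$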

In statistics, a generalized estimating equation (GEE) is used to estimate the parameters of a generalized linear model with a possible unmeasured correlation between observations from different timepoints. Although it is sometimes believed that generalized estimating equations are robust in every respect even with the wrong choice of working correlation matrix, they are only robust in the sense that parameter estimates remain consistent under such a misspecification.

In statistics and regression analysis, moderation occurs when the relationship between two variables depends on a third variable. The third variable is referred to as the moderator variable or simply the moderator. The effect of a moderating variable is characterized statistically as an interaction; that is, a categorical or continuous variable that is associated with the direction and/or magnitude of the relation between dependent and independent variables. Specifically within a correlational analysis framework, a moderator is a third variable that affects the zero-order correlation between two other variables, or the value of the slope of the dependent variable on the independent variable. In analysis of variance (ANOVA) terms, a basic moderator effect can be represented as an interaction between a focal independent variable and a factor that specifies the appropriate conditions for its operation.
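Moderation is typically modelled by adding a product term to the regression (a standard specification; symbols illustrative):

$$ Y = b_0 + b_1 X + b_2 Z + b_3 (X \cdot Z) + \varepsilon, $$

where a non-zero b_3 indicates that the effect of X on Y depends on the moderator Z.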

The following outline is provided as an overview of and topical guide to regression analysis.

In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.
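In the multiple linear regression case the model takes the form (standard formulation):

$$ y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon. $$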

Quasi-variance (qv) estimates are a statistical approach that is suitable for communicating the effects of a categorical explanatory variable within a statistical model. In standard statistical models the effects of a categorical explanatory variable are assessed by comparing one category that is set as a benchmark against which all other categories are compared. The benchmark category is usually referred to as the 'reference' or 'base' category. In order for comparisons to be made the reference category is arbitrarily fixed to zero. Statistical data analysis software usually undertakes formal comparisons of whether or not each level of the categorical variable differs from the reference category. These comparisons generate the well known ‘significance values’ of parameter estimates. Whilst it is straightforward to compare any one category with the reference category, it is more difficult to formally compare two other categories of an explanatory variable with each other when neither is the reference category. This is known as the reference category problem.

References

  1. Nimon, Kim F.; Oswald, Frederick L. (October 2013). "Understanding the Results of Multiple Linear Regression: Beyond Standardized Regression Coefficients". Organizational Research Methods. 16 (4): 650–674. doi:10.1177/1094428113493929. hdl:1911/71722. ISSN 1094-4281. S2CID 55244970.
  2. Nimon, Kim; Reio, Thomas G. (22 June 2011). "Regression Commonality Analysis: A Technique for Quantitative Theory Building". Human Resource Development Review. 10 (3): 329–340. doi:10.1177/1534484311411077. ISSN 1534-4843. S2CID 144437265.
  3. "Commonality analysis: Demonstration of an SPSS solution for regression analysis" (PDF).
  4. Nimon, Kim; Lewis, Mitzi; Kane, Richard; Haynes, R. Michael (May 2008). "An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example". Behavior Research Methods. 40 (2): 457–466. doi:10.3758/BRM.40.2.457. ISSN 1554-351X. PMID 18522056.