Interaction (statistics)

Last updated November 22, 2024

In statistics, an interaction may arise when considering the relationship among three or more variables, and describes a situation in which the effect of one causal variable on an outcome depends on the state of a second causal variable (that is, when effects of the two causes are not additive).^[1]^[2] Although commonly thought of in terms of causal relationships, the concept of an interaction can also describe non-causal associations (then also called moderation or effect modification). Interactions are often considered in the context of regression analyses or factorial experiments.


The presence of interactions can have important implications for the interpretation of statistical models. If two variables of interest interact, the relationship between each of the interacting variables and a third "dependent variable" depends on the value of the other interacting variable. In practice, this makes it more difficult to predict the consequences of changing the value of a variable, particularly if the variables it interacts with are hard to measure or difficult to control.

The notion of "interaction" is closely related to that of moderation that is common in social and health science research: the interaction between an explanatory variable and an environmental variable suggests that the effect of the explanatory variable has been moderated or modified by the environmental variable.^[1]

Introduction

An interaction variable or interaction feature is a variable constructed from an original set of variables to try to represent either all of the interaction present or some part of it. In exploratory statistical analyses, it is common to use products of the original variables as the basis for testing whether interaction is present, with the possibility of substituting other, more realistic interaction variables at a later stage. When there are more than two explanatory variables, several interaction variables are constructed, with pairwise products representing pairwise interactions and higher-order products representing higher-order interactions.

Thus, for a response Y and two variables x₁ and x₂ an additive model would be:

$$Y = c + a x_{1} + b x_{2} + \text{error}$$

In contrast to this,

$$Y = c + a x_{1} + b x_{2} + d(x_{1}\times x_{2}) + \text{error}$$

is an example of a model with an interaction between variables x₁ and x₂ ("error" refers to the random variable whose value is that by which Y differs from the expected value of Y; see errors and residuals in statistics). Often, models are presented without the interaction term $d(x_{1}\times x_{2})$, but this confounds the main effect and interaction effect (i.e., without specifying the interaction term, it is possible that any main effect found is actually due to an interaction).
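The difference between the two models can be made concrete with a minimal sketch. The coefficient values below are invented purely for illustration; the point is only that, with a non-zero d, the effect of a one-unit change in x₁ equals a + d·x₂ and therefore varies with x₂.

```python
def predict(x1, x2, c=1.0, a=2.0, b=0.5, d=-0.75):
    """Expected response under Y = c + a*x1 + b*x2 + d*(x1*x2).

    The default coefficients are hypothetical illustration values.
    """
    return c + a * x1 + b * x2 + d * (x1 * x2)


def effect_of_x1(x2, **coef):
    """Change in the expected response when x1 rises from 0 to 1 at a fixed x2."""
    return predict(1.0, x2, **coef) - predict(0.0, x2, **coef)


# With d != 0 the effect of x1 changes with x2 (it equals a + d*x2).
for x2 in (0.0, 1.0, 2.0, 4.0):
    print(f"x2 = {x2}: effect of x1 = {effect_of_x1(x2)}")

# With d = 0 (the additive model) the effect of x1 is the same at every x2.
additive_effects = [effect_of_x1(x2, d=0.0) for x2 in (0.0, 1.0, 2.0, 4.0)]
print(additive_effects)
```

With these particular values the effect of x₁ even changes sign as x₂ grows, which is the kind of pattern a purely additive model cannot represent.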

Moreover, the hierarchical principle states that if a model includes an interaction between variables, it should also include the corresponding main effects, regardless of their own statistical significance.^[3]
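As a rough sketch of how product-based interaction variables might be constructed while keeping every lower-order term in the model, the following uses only the Python standard library. The function names, the `x1:x2` labelling convention, and the sample values are assumptions made for this illustration.

```python
from itertools import combinations
from math import prod


def model_terms(names, max_order):
    """List main effects and all interactions up to max_order, lower orders first."""
    terms = []
    for order in range(1, max_order + 1):
        terms.extend(combinations(names, order))
    return terms


def design_row(values, max_order):
    """Build one design-matrix row: intercept, main effects, then product terms."""
    names = list(values)
    row = {"(intercept)": 1.0}
    for term in model_terms(names, max_order):
        # An interaction variable is the product of its component variables.
        row[":".join(term)] = prod(values[name] for name in term)
    return row


# Hypothetical observation with three explanatory variables.
row = design_row({"x1": 2.0, "x2": 3.0, "x3": 5.0}, max_order=3)
for label, value in row.items():
    print(label, value)
```

Because terms are generated order by order, any interaction in the row is always accompanied by the main effects and lower-order interactions it is built from.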

In modeling

In ANOVA

A simple setting in which interactions can arise is a two-factor experiment analyzed using analysis of variance (ANOVA). Suppose we have two binary factors A and B. For example, these factors might indicate whether each of two treatments was administered to a patient, with the treatments applied either singly or in combination. We can then consider the average treatment response (e.g. the symptom levels following treatment) among patients, as a function of the treatment combination that was administered. The following table shows one possible situation:

	B = 0	B = 1
A = 0	6	7
A = 1	4	5

In this example, there is no interaction between the two treatments: their effects are additive. The reason for this is that the difference in mean response between those subjects receiving treatment A and those not receiving treatment A is −2 regardless of whether treatment B is administered (−2 = 5 − 7) or not (−2 = 4 − 6). Note that it automatically follows that the difference in mean response between those subjects receiving treatment B and those not receiving treatment B is the same regardless of whether treatment A is administered (7 − 6 = 5 − 4).

In contrast, if the following average responses are observed

	B = 0	B = 1
A = 0	1	4
A = 1	7	6

then there is an interaction between the treatments — their effects are not additive. Supposing that greater numbers correspond to a better response, in this situation treatment B is helpful on average if the subject is not also receiving treatment A, but is detrimental on average if given in combination with treatment A. Treatment A is helpful on average regardless of whether treatment B is also administered, but it is more helpful in both absolute and relative terms if given alone, rather than in combination with treatment B. Similar observations are made for this particular example in the next section.
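The additivity check described for the two tables above can be sketched as a "difference of differences". The function name is an assumption for this illustration; the numbers are the cell means from the two tables.

```python
def interaction_contrast(table):
    """Difference of differences for a 2x2 table of means, indexed table[a][b]."""
    effect_of_a_without_b = table[1][0] - table[0][0]
    effect_of_a_with_b = table[1][1] - table[0][1]
    return effect_of_a_with_b - effect_of_a_without_b


additive_table = [[6, 7], [4, 5]]      # first table above
interacting_table = [[1, 4], [7, 6]]   # second table above

print(interaction_contrast(additive_table))      # 0: the effects are additive
print(interaction_contrast(interacting_table))   # -4: A helps less when B is given
```

A contrast of zero means the effect of A is the same at both levels of B; by symmetry the same holds for the effect of B across levels of A.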

Qualitative and quantitative interactions

In many applications it is useful to distinguish between qualitative and quantitative interactions.^[4] A quantitative interaction between A and B is a situation where the magnitude of the effect of B depends on the value of A, but the direction of the effect of B is constant for all A. A qualitative interaction between A and B refers to a situation where both the magnitude and direction of each variable's effect can depend on the value of the other variable.

The table of means on the left, below, shows a quantitative interaction — treatment A is beneficial both when B is given, and when B is not given, but the benefit is greater when B is not given (i.e. when A is given alone). The table of means on the right shows a qualitative interaction. A is harmful when B is given, but it is beneficial when B is not given. Note that the same interpretation would hold if we consider the benefit of B based on whether A is given.

	B = 0	B = 1							B = 0	B = 1
A = 0	2	1						A = 0	2	6
A = 1	5	3						A = 1	5	3

The distinction between qualitative and quantitative interactions depends on the order in which the variables are considered (in contrast, the property of additivity is invariant to the order of the variables). In the following table, if we focus on the effect of treatment A, there is a quantitative interaction — giving treatment A will improve the outcome on average regardless of whether treatment B is or is not already being given (although the benefit is greater if treatment A is given alone). However, if we focus on the effect of treatment B, there is a qualitative interaction — giving treatment B to a subject who is already receiving treatment A will (on average) make things worse, whereas giving treatment B to a subject who is not receiving treatment A will improve the outcome on average.

	B = 0	B = 1
A = 0	1	4
A = 1	7	6
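The dependence on which factor is in focus can be checked mechanically. The sketch below is one possible way to classify a 2x2 table of means; the function names are hypothetical, and treating a zero effect at one level as "quantitative" is a simplifying assumption of this sketch.

```python
def simple_effects(table, factor):
    """Effects of one factor at each level of the other; table is indexed [a][b]."""
    if factor == "A":
        return [table[1][b] - table[0][b] for b in (0, 1)]
    if factor == "B":
        return [table[a][1] - table[a][0] for a in (0, 1)]
    raise ValueError("factor must be 'A' or 'B'")


def classify(table, factor):
    """Label the interaction as seen from the chosen factor's point of view."""
    first, second = simple_effects(table, factor)
    if first == second:
        return "no interaction"
    if first * second < 0:
        return "qualitative"   # the effect changes direction
    return "quantitative"      # same direction, different magnitude


table = [[1, 4], [7, 6]]  # the table directly above
print(classify(table, "A"))  # quantitative
print(classify(table, "B"))  # qualitative
```

For this table the effects of A are 6 and 2 (same sign), while the effects of B are 3 and −1 (opposite signs), matching the description in the text.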

Unit treatment additivity

In its simplest form, the assumption of unit treatment additivity states that the observed response y_ij from experimental unit i when receiving treatment j can be written as the sum y_ij = y_i + t_j.^[5]^[6]^[7] The assumption of unit treatment additivity implies that every treatment has exactly the same additive effect on each experimental unit. Since any given experimental unit can only undergo one of the treatments, the assumption of unit treatment additivity is a hypothesis that is not directly falsifiable, according to Cox^[citation needed] and Kempthorne.^[citation needed]

However, many consequences of unit treatment additivity can be falsified.^[citation needed] For a randomized experiment, the assumption of unit treatment additivity implies that the variance is constant for all treatments. Therefore, by contraposition, a necessary condition for unit treatment additivity is that the variance is constant.^[citation needed]

The property of unit treatment additivity is not invariant under a change of scale,^[citation needed] so statisticians often use transformations to achieve unit treatment additivity. If the response variable is expected to follow a parametric family of probability distributions, then the statistician may specify (in the protocol for the experiment or observational study) that the responses be transformed to stabilize the variance.^[8] In many cases, a statistician may specify that logarithmic transforms be applied to the responses, which are believed to follow a multiplicative model.^[6]^[9]
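To illustrate why a change of scale matters, the sketch below builds a hypothetical multiplicative 2x2 table (the baseline and the two multipliers are invented values) and compares the difference-of-differences contrast before and after a logarithmic transform.

```python
import math


def interaction_contrast(table):
    """Difference of differences for a 2x2 table indexed table[a][b]."""
    return (table[1][1] - table[0][1]) - (table[1][0] - table[0][0])


# Hypothetical multiplicative model: treatment A triples the response,
# treatment B doubles it, starting from a baseline of 2.
baseline, factor_a, factor_b = 2.0, 3.0, 2.0
raw = [[baseline * factor_a**a * factor_b**b for b in (0, 1)] for a in (0, 1)]
logged = [[math.log(value) for value in row] for row in raw]

print(raw)                           # [[2.0, 4.0], [6.0, 12.0]]
print(interaction_contrast(raw))     # 4.0: not additive on the raw scale
print(math.isclose(interaction_contrast(logged), 0.0, abs_tol=1e-12))  # True
```

On the raw scale the effects do not add, but on the log scale they do (up to floating-point rounding), because the logarithm turns products into sums.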

The assumption of unit treatment additivity was enunciated in experimental design by Kempthorne^[citation needed] and Cox.^[citation needed] Kempthorne's use of unit treatment additivity and randomization is similar to the design-based analysis of finite population survey sampling.

In recent years, it has become common^[citation needed] to use the terminology of Donald Rubin, which uses counterfactuals. Suppose we are comparing two groups of people with respect to some attribute y. For example, the first group might consist of people who are given a standard treatment for a medical condition, with the second group consisting of people who receive a new treatment with unknown effect. Taking a "counterfactual" perspective, we can consider an individual whose attribute has value y if that individual belongs to the first group, and whose attribute has value τ(y) if the individual belongs to the second group. The assumption of "unit treatment additivity" is that τ(y) = τ, that is, the "treatment effect" does not depend on y. Since we cannot observe both y and τ(y) for a given individual, this is not testable at the individual level. However, unit treatment additivity implies that the cumulative distribution functions F₁ and F₂ for the two groups satisfy F₂(y) = F₁(y − τ), as long as the assignment of individuals to groups 1 and 2 is independent of all other factors influencing y (i.e. there are no confounders). Lack of unit treatment additivity can be viewed as a form of interaction between the treatment assignment (e.g. to groups 1 or 2), and the baseline, or untreated value of y.
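The distributional consequence can be checked on a toy example. The sketch below follows the simplified reading in which a treated value is the baseline value plus a constant τ; the baseline values and τ are invented. It compares empirical distribution functions and variances, and contrasts them with a non-additive (multiplicative) treatment.

```python
import math
from statistics import pvariance


def ecdf(sample, y):
    """Empirical cumulative distribution function: share of the sample <= y."""
    return sum(value <= y for value in sample) / len(sample)


tau = 5
untreated = [12, 15, 15, 18, 21, 24, 30]      # hypothetical baseline values
treated = [y + tau for y in untreated]        # constant additive effect
doubled = [2 * y for y in untreated]          # effect that depends on baseline

# Additivity shifts the whole distribution: F2(y) = F1(y - tau) at every y.
shift_holds = all(ecdf(treated, y) == ecdf(untreated, y - tau) for y in range(0, 50))
same_variance = math.isclose(pvariance(treated), pvariance(untreated))
doubled_same_variance = math.isclose(pvariance(doubled), pvariance(untreated))

print(shift_holds, same_variance, doubled_same_variance)  # True True False
```

The unequal variance in the last case is the kind of falsifiable consequence mentioned above: a treatment whose effect depends on the baseline value changes the spread as well as the location.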

Categorical variables

Sometimes the interacting variables are categorical variables rather than real numbers and the study might then be dealt with as an analysis of variance problem. For example, members of a population may be classified by religion and by occupation. If one wishes to predict a person's height based only on the person's religion and occupation, a simple additive model, i.e., a model without interaction, would add to an overall average height an adjustment for a particular religion and another for a particular occupation. A model with interaction, unlike an additive model, could add a further adjustment for the "interaction" between that religion and that occupation. This example may cause one to suspect that the word interaction is something of a misnomer.

Statistically, the presence of an interaction between categorical variables is generally tested using a form of analysis of variance (ANOVA). If one or more of the variables is continuous in nature, however, it would typically be tested using moderated multiple regression.^[10] The method is so called because a moderator is a variable that affects the strength of a relationship between two other variables.
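For categorical factors, the additive model and its interaction adjustments can be sketched as a decomposition of a two-way table of cell means. The function below is an illustrative assumption (equal cell weights, no replication), applied to the two 2x2 tables from the ANOVA section.

```python
def decompose(cell_means):
    """Split a two-way table of cell means into grand mean, main effects, interaction."""
    n_rows, n_cols = len(cell_means), len(cell_means[0])
    grand = sum(map(sum, cell_means)) / (n_rows * n_cols)
    row_effects = [sum(row) / n_cols - grand for row in cell_means]
    col_effects = [
        sum(row[j] for row in cell_means) / n_rows - grand for j in range(n_cols)
    ]
    # The interaction adjustment is whatever the additive fit leaves unexplained.
    interaction = [
        [cell_means[i][j] - (grand + row_effects[i] + col_effects[j])
         for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return grand, row_effects, col_effects, interaction


print(decompose([[6, 7], [4, 5]])[3])   # [[0.0, 0.0], [0.0, 0.0]]
print(decompose([[1, 4], [7, 6]])[3])   # [[-1.0, 1.0], [1.0, -1.0]]
```

When every interaction adjustment is zero, the overall mean plus one adjustment per row level and one per column level reproduces each cell exactly, which is the additive model described above.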

Designed experiments

Genichi Taguchi contended^[11] that interactions could be eliminated from a system by appropriate choice of response variable and transformation. However, George Box and others have argued that this is not the case in general.^[12]

Model size

Given n predictors, the number of terms in a linear model that includes a constant, every predictor, and every possible interaction is ${\tbinom {n}{0}}+{\tbinom {n}{1}}+{\tbinom {n}{2}}+\cdots +{\tbinom {n}{n}}=2^{n}$. Since this quantity grows exponentially, it readily becomes impractically large. One method to limit the size of the model is to limit the order of interactions. For example, if only two-way interactions are allowed, the number of terms becomes ${\tbinom {n}{0}}+{\tbinom {n}{1}}+{\tbinom {n}{2}}=1+{\tfrac {1}{2}}n+{\tfrac {1}{2}}n^{2}$. The table below shows the number of terms for each number of predictors and maximum order of interaction; the largest entries are given only as orders of magnitude.

Number of terms, by number of predictors (rows) and maximum interaction order m (columns)
Predictors	m = 2	m = 3	m = 4	m = 5	m = ∞
1	2	2	2	2	2
2	4	4	4	4	4
3	7	8	8	8	8
4	11	15	16	16	16
5	16	26	31	32	32
6	22	42	57	63	64
7	29	64	99	120	128
8	37	93	163	219	256
9	46	130	256	382	512
10	56	176	386	638	1,024
11	67	232	562	1,024	2,048
12	79	299	794	1,586	4,096
13	92	378	1,093	2,380	8,192
14	106	470	1,471	3,473	16,384
15	121	576	1,941	4,944	32,768
20	211	1,351	6,196	21,700	1,048,576
25	326	2,626	15,276	68,406	33,554,432
50	1,276	20,876	251,176	2,369,936	10¹⁵
100	5,051	166,751	4,087,976	79,375,496	10³⁰
1,000	500,501	166,667,501	10¹⁰	10¹²	10³⁰⁰
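The counts in the table follow directly from the binomial sums given above. A minimal sketch using the standard library's `math.comb` (the function name `n_terms` is an assumption for this illustration):

```python
from math import comb


def n_terms(n_predictors, max_order=None):
    """Number of terms: constant + main effects + interactions up to max_order.

    max_order=None means interactions of every order are included.
    """
    if max_order is None:
        max_order = n_predictors
    max_order = min(max_order, n_predictors)
    return sum(comb(n_predictors, k) for k in range(max_order + 1))


print(n_terms(10, 2))   # 56
print(n_terms(10, 3))   # 176
print(n_terms(25, 5))   # 68406
print(n_terms(20))      # 1048576, i.e. 2**20
```

Capping the interaction order keeps the count polynomial in the number of predictors, whereas allowing every order doubles it with each added predictor.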

In regression

The most general approach to modeling interaction effects involves regression, starting from the elementary version given above:

$$Y = c + a x_{1} + b x_{2} + d(x_{1}\times x_{2}) + \text{error}$$

where the interaction term $(x_{1}\times x_{2})$ could be formed explicitly by multiplying two (or more) variables, or implicitly using factorial notation in modern statistical packages such as Stata. The components x₁ and x₂ might be measurements or {0,1} dummy variables in any combination. Interactions involving a dummy variable multiplied by a measurement variable are termed slope dummy variables,^[13] because they estimate and test the difference in slopes between groups 0 and 1.
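The role of a slope dummy variable can be sketched with noise-free, invented data. Here g stands for the {0,1} dummy and x for the measurement variable, so the model is Y = c + a·x + b·g + d·(x·g). Fitting a separate least-squares line in each group shows that d is the difference in slopes and b the difference in intercepts; the coefficient values and helper function are assumptions for this illustration.

```python
from statistics import mean


def ols_line(xs, ys):
    """Intercept and slope of a simple least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical coefficients; data generated without error for clarity.
c, a, b, d = 1.0, 2.0, 3.0, -1.5
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
y_group0 = [c + a * x for x in xs]              # g = 0
y_group1 = [c + a * x + b + d * x for x in xs]  # g = 1

intercept0, slope0 = ols_line(xs, y_group0)
intercept1, slope1 = ols_line(xs, y_group1)

print(slope1 - slope0)          # difference in slopes, equal to d
print(intercept1 - intercept0)  # difference in intercepts, equal to b
```

With real, noisy data the same quantities would be estimated rather than recovered exactly, and a test of d = 0 would be a test of whether the two groups share a common slope.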

When measurement variables are employed in interactions, it is often desirable to work with centered versions, where the variable's mean (or some other reasonably central value) is set as zero. Centering can make the main effects in interaction models more interpretable, and it reduces the multicollinearity between the interaction term and the main effects.^[14] The coefficient a in the equation above, for example, represents the effect of x₁ when x₂ equals zero; after centering, zero corresponds to the mean of x₂, so a describes the effect of x₁ at a typical value of x₂ rather than at a value that may lie outside the data.
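How centering changes the meaning of the coefficients, without changing the fitted model, can be sketched algebraically. Substituting x₁ = u₁ + m₁ and x₂ = u₂ + m₂ into the equation above gives new coefficients for the centered variables u₁ and u₂. The function names and numeric values below are assumptions for this illustration.

```python
def center_coefficients(c, a, b, d, m1, m2):
    """Coefficients of the same model rewritten in u1 = x1 - m1 and u2 = x2 - m2."""
    return (
        c + a * m1 + b * m2 + d * m1 * m2,  # new constant: prediction at the centers
        a + d * m2,                         # effect of x1 when x2 equals m2
        b + d * m1,                         # effect of x2 when x1 equals m1
        d,                                  # interaction coefficient is unchanged
    )


def predict(coefs, x1, x2):
    c, a, b, d = coefs
    return c + a * x1 + b * x2 + d * x1 * x2


raw = (1.0, 2.0, 0.5, -0.75)   # hypothetical c, a, b, d
m1, m2 = 10.0, 4.0             # hypothetical means of x1 and x2
centered = center_coefficients(*raw, m1, m2)

print(centered)                             # (-7.0, -1.0, -7.0, -0.75)
print(predict(raw, 12.0, 5.0))              # -17.5
print(predict(centered, 12.0 - m1, 5.0 - m2))  # -17.5, the same prediction
```

Only the interaction coefficient d is unaffected by centering; the main-effect coefficients change because they answer a different question, namely the effect of one variable when the other is at its mean instead of at zero.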

Regression approaches to interaction modeling are very general because they can accommodate additional predictors, and many alternative specifications or estimation strategies beyond ordinary least squares. Robust, quantile, and mixed-effects (multilevel) models are among the possibilities, as is generalized linear modeling encompassing a wide range of categorical, ordered, counted or otherwise limited dependent variables. The graph depicts an education*politics interaction, from a probability-weighted logit regression analysis of survey data.^[15]

Interaction plots

Interaction plots, also called simple-slope plots, show possible interactions among variables.

Example: Interaction of species and air temperature and their effect on body temperature

Consider a study of the body temperature of different species at different air temperatures, in degrees Fahrenheit. The data are shown in the table below.

The interaction plot may use either the air temperature or the species as the x axis. The second factor is represented by lines on the interaction plot.

There is an interaction between the two factors (air temperature and species) in their effect on the response (body temperature), because the effect of the air temperature depends on the species. The interaction is indicated on the plot because the lines are not parallel.

Example: effect of stroke severity and treatment on recovery

As a second example, consider a clinical trial on the interaction between stroke severity and the efficacy of a drug on patient survival. The data are shown in the table below.

In the interaction plot, the lines for the mild and moderate stroke groups are parallel, indicating that the drug has the same effect in both groups, so there is no interaction. The line for the severe stroke group is not parallel to the other lines, indicating that there is an interaction between stroke severity and drug effect on survival. The line for the severe stroke group is flat, indicating that, among these patients, there is no difference in survival between the drug and placebo treatments. In contrast, the lines for the mild and moderate stroke groups slope down to the right, indicating that, among these patients, the placebo group has lower survival than the drug-treated group.
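Reading an interaction plot amounts to comparing the slopes of its lines. Since the trial's data table is not reproduced here, the sketch below uses invented survival percentages chosen only to mimic the pattern described (two parallel lines and one flat line); they are not the study's results.

```python
# Hypothetical survival percentages, NOT the data from the trial discussed above.
survival = {
    "mild":     {"drug": 90, "placebo": 80},
    "moderate": {"drug": 70, "placebo": 60},
    "severe":   {"drug": 30, "placebo": 30},
}


def line_slopes(cell_means, x_levels):
    """Rise of each group's line between the two x-axis levels of the plot."""
    first, second = x_levels
    return {group: means[second] - means[first] for group, means in cell_means.items()}


slopes = line_slopes(survival, ("drug", "placebo"))
all_parallel = len(set(slopes.values())) == 1

print(slopes)         # {'mild': -10, 'moderate': -10, 'severe': 0}
print(all_parallel)   # False: at least one line has a different slope
```

Equal slopes for two groups correspond to parallel lines (no interaction between those groups and treatment), while a differing slope signals an interaction; with sampled data, whether the difference is more than noise would still need a formal test.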

Hypothesis tests for interactions

Analysis of variance and regression analysis are used to test for significant interactions.

Example: Interaction of temperature and time in cookie baking

Is the yield of good cookies affected by the baking temperature and time in the oven? The table shows data for 8 batches of cookies.

The data show that the yield of good cookies is best when either (i) temperature is high and time in the oven is short, or (ii) temperature is low and time in the oven is long. If the cookies are left in the oven for a long time at a high temperature, there are burnt cookies and the yield is low.

From the graph and the data, it is clear that the lines are not parallel, indicating that there is an interaction. This can be tested using analysis of variance (ANOVA). The first ANOVA model omits the interaction term and so ignores any possible interaction. The second ANOVA model includes the interaction term and so explicitly performs a hypothesis test for interaction.

ANOVA model 1: no interaction term; yield ~ temperature + time

In the ANOVA model that ignores interaction, neither temperature nor time has a significant effect on yield (p=0.91), which is clearly the incorrect conclusion. The more appropriate ANOVA model should test for possible interaction.

ANOVA model 2: include interaction term; yield ~ temperature * time

The temperature:time interaction term is significant (p=0.000180). Based on the interaction test and the interaction plot, it appears that the effect of time on yield depends on temperature and vice versa.
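The contrast between the two ANOVA models can be sketched with a small balanced example. The yields below are invented to mimic the pattern described (they are not the article's data), and the code computes only sums of squares and F ratios, because the standard library has no F distribution for p-values.

```python
from statistics import mean

# Hypothetical yields for 8 batches, keyed by (temperature, time).
yields = {
    ("low", "short"): [3, 5],
    ("low", "long"): [9, 11],
    ("high", "short"): [10, 12],
    ("high", "long"): [2, 4],
}


def sums_of_squares(cells):
    """Sums of squares for a balanced two-factor design with replication."""
    grand = mean(y for ys in cells.values() for y in ys)
    mean_a = {a: mean(y for (i, _), ys in cells.items() if i == a for y in ys)
              for a in {a for a, _ in cells}}
    mean_b = {b: mean(y for (_, j), ys in cells.items() if j == b for y in ys)
              for b in {b for _, b in cells}}
    ss = {"A": 0.0, "B": 0.0, "A:B": 0.0, "error": 0.0}
    for (a, b), ys in cells.items():
        cell = mean(ys)
        for y in ys:
            ss["A"] += (mean_a[a] - grand) ** 2
            ss["B"] += (mean_b[b] - grand) ** 2
            ss["A:B"] += (cell - mean_a[a] - mean_b[b] + grand) ** 2
            ss["error"] += (y - cell) ** 2
    return ss


ss = sums_of_squares(yields)   # A = temperature, B = time
n, n_a, n_b = 8, 2, 2

# Model 2 (with interaction): test the interaction against within-cell error.
f_interaction = (ss["A:B"] / ((n_a - 1) * (n_b - 1))) / (ss["error"] / (n - n_a * n_b))

# Model 1 (additive): the interaction variation is pooled into the error term.
ms_error_additive = (ss["A:B"] + ss["error"]) / (n - n_a - n_b + 1)
f_temperature = (ss["A"] / (n_a - 1)) / ms_error_additive
f_time = (ss["B"] / (n_b - 1)) / ms_error_additive

print(ss)
print(f_interaction, f_temperature, f_time)
```

In this made-up example almost all of the variation sits in the interaction term, so the additive model sees essentially no main effects while the interaction model yields a large F ratio, which is the same qualitative pattern as in the cookie example.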

Examples

Real-world examples of interaction include:

Interaction between adding sugar to coffee and stirring the coffee. Neither of the two individual variables has much effect on sweetness but a combination of the two does.
Interaction between adding carbon to steel and quenching. Neither of the two individually has much effect on strength but a combination of the two has a dramatic effect.
Interaction between smoking and inhaling asbestos fibres: Both raise lung carcinoma risk, but exposure to asbestos multiplies the cancer risk in smokers and non-smokers. Here, the joint effect of inhaling asbestos and smoking is higher than the sum of both effects.^[16]
Interaction between genetic risk factors for type 2 diabetes and diet (specifically, a "western" dietary pattern). The western dietary pattern was shown to increase diabetes risk for subjects with a high "genetic risk score", but not for other subjects.^[17]
Interaction between education and political orientation, affecting general-public perceptions about climate change. For example, US surveys often find that acceptance of the reality of anthropogenic climate change rises with education among moderate or liberal survey respondents, but declines with education among the most conservative.^[18]^[19] Similar interactions have been observed to affect some non-climate science or environmental perceptions,^[20] and to operate with science literacy or other knowledge indicators in place of education.^[21]^[22]

References

1. Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms. Oxford University Press. ISBN 978-0-19-920613-1.
2. Cox, D.R. (1984). "Interaction". International Statistical Review. 52 (1): 1–25. doi:10.2307/1403235. JSTOR 1403235.
3. James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert (2021). An introduction to statistical learning: with applications in R (Second ed.). New York, NY: Springer. p. 103. ISBN 978-1-0716-1418-1. Retrieved 29 October 2024.
4. Peto, D. P. (1982). "Statistical aspects of cancer trials". Treatment of Cancer (First ed.). London: Chapman and Hall. ISBN 0-412-21850-X.
5. Kempthorne, Oscar (1979). The Design and Analysis of Experiments (Corrected reprint of (1952) Wiley ed.). Robert E. Krieger. ISBN 978-0-88275-105-4.
6. Cox, David R. (1958). Planning of experiments. Wiley. Chapter 2. ISBN 0-471-57429-5.
7. Hinkelmann, Klaus and Kempthorne, Oscar (2008). Design and Analysis of Experiments, Volume I: Introduction to Experimental Design (Second ed.). Wiley. Chapters 5-6. ISBN 978-0-471-72756-9.
8. Hinkelmann, Klaus and Kempthorne, Oscar (2008). Design and Analysis of Experiments, Volume I: Introduction to Experimental Design (Second ed.). Wiley. Chapters 7-8. ISBN 978-0-471-72756-9.
9. Bailey, R. A. (2008). Design of Comparative Experiments. Cambridge University Press. ISBN 978-0-521-68357-9. Pre-publication chapters are available on-line.
10. Overton, R. C. (2001). "Moderated multiple regression for interactions involving categorical variables: a statistical control for heterogeneous variance across two groups". Psychol Methods. 6 (3): 218–33. doi:10.1037/1082-989X.6.3.218. PMID 11570229.
11. "Design of Experiments - Taguchi Experiments". www.qualitytrainingportal.com. Retrieved 2015-11-27.
12. George E. P. Box (1990). "Do interactions matter?" (PDF). Quality Engineering. 2: 365–369. doi:10.1080/08982119008962728. Archived from the original (PDF) on 2010-06-10. Retrieved 2009-07-28.
13. Hamilton, L.C. (1992). Regression with Graphics: A Second Course in Applied Statistics. Pacific Grove, CA: Brooks/Cole. ISBN 978-0534159009.
14. Iacobucci, Dawn; Schneider, Matthew J.; Popovich, Deidre L.; Bakamitsos, Georgios A. (2016). "Mean centering helps alleviate "micro" but not "macro" multicollinearity". Behavior Research Methods. 48 (4): 1308–1317. doi:10.3758/s13428-015-0624-x. ISSN 1554-3528. PMID 26148824.
15. Hamilton, L.C.; Saito, K. (2015). "A four-party view of U.S. environmental concern". Environmental Politics. 24 (2): 212–227. Bibcode:2015EnvPo..24..212H. doi:10.1080/09644016.2014.976485. S2CID 154762226.
16. Lee, P. N. (2001). "Relation between exposure to asbestos and smoking jointly and the risk of lung cancer". Occupational and Environmental Medicine. 58 (3): 145–53. doi:10.1136/oem.58.3.145. PMC 1740104. PMID 11171926.
17. Lu, Q.; et al. (2009). "Genetic predisposition, Western dietary pattern, and the risk of type 2 diabetes in men". Am J Clin Nutr. 89 (5): 1453–1458. doi:10.3945/ajcn.2008.27249. PMC 2676999. PMID 19279076.
18. Hamilton, L.C. (2011). "Education, politics and opinions about climate change: Evidence for interaction effects". Climatic Change. 104 (2): 231–242. Bibcode:2011ClCh..104..231H. doi:10.1007/s10584-010-9957-8. S2CID 16481640.
19. McCright, A. M. (2011). "Political orientation moderates Americans' beliefs and concern about climate change". Climatic Change. 104 (2): 243–253. Bibcode:2011ClCh..104..243M. doi:10.1007/s10584-010-9946-y. S2CID 152795205.
20. Hamilton, Lawrence C.; Saito, Kei (2015). "A four-party view of US environmental concern". Environmental Politics. 24 (2): 212–227. Bibcode:2015EnvPo..24..212H. doi:10.1080/09644016.2014.976485. S2CID 154762226.
21. Kahan, D.M.; Jenkins-Smith, H.; Braman, D. (2011). "Cultural cognition of scientific consensus". Journal of Risk Research. 14 (2): 147–174. doi:10.1080/13669877.2010.511246. hdl:10.1080/13669877.2010.511246. S2CID 216092368.
22. Hamilton, L.C.; Cutler, M.J.; Schaefer, A. (2012). "Public knowledge and concern about polar-region warming". Polar Geography. 35 (2): 155–168. Bibcode:2012PolGe..35..155H. doi:10.1080/1088937X.2012.684155. S2CID 12437794.

External links

"Using Indicator and Interaction Variables" (PDF). Archived from the original (PDF) on 2016-03-03. Retrieved 2010-02-03. (158 KiB)
Credibility and the Statistical Interaction Variable: Speaking Up for Multiplication as a Source of Understanding
Fundamentals of Statistical Interactions: What is the difference between "main effects" and "interaction effects"?

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[Dodge-1] 1 2 Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms . Oxford University Press. ISBN 978-0-19-920613-1.

[2] Cox, D.R. (1984). "Interaction". International Statistical Review. 52 (1): 1–25. doi:10.2307/1403235. JSTOR 1403235.

[3] James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert (2021). An introduction to statistical learning: with applications in R (Second ed.). New York, NY: Springer. p. 103. ISBN 978-1-0716-1418-1 . Retrieved 29 October 2024.

[4] Peto, D. P. (1982). "Statistical aspects of cancer trials". Treatment of Cancer (First ed.). London: Chapman and Hall. ISBN 0-412-21850-X.

[Kempthorne_(1979)-5] Kempthorne, Oscar (1979). The Design and Analysis of Experiments (Corrected reprint of (1952) Wiley ed.). Robert E. Krieger. ISBN 978-0-88275-105-4.

[Cox1958_2-6] 1 2 Cox, David R. (1958). Planning of experiments. Wiley. Chapter 2. ISBN 0-471-57429-5.

[7] Hinkelmann, Klaus and Kempthorne, Oscar (2008). Design and Analysis of Experiments, Volume I: Introduction to Experimental Design (Second ed.). Wiley. Chapters 5-6. ISBN 978-0-471-72756-9.{{cite book}}: CS1 maint: multiple names: authors list (link)

[8] Hinkelmann, Klaus and Kempthorne, Oscar (2008). Design and Analysis of Experiments, Volume I: Introduction to Experimental Design (Second ed.). Wiley. Chapters 7-8. ISBN 978-0-471-72756-9.{{cite book}}: CS1 maint: multiple names: authors list (link)

[Bailey_on_eelworms-9] Bailey, R. A. (2008). Design of Comparative Experiments. Cambridge University Press. ISBN 978-0-521-68357-9. Pre-publication chapters are available on-line.

[Overton2001-10] Overton, R. C. (2001). "Moderated multiple regression for interactions involving categorical variables: a statistical control for heterogeneous variance across two groups". Psychol Methods. 6 (3): 218–33. doi:10.1037/1082-989X.6.3.218. PMID 11570229.

[11] "Design of Experiments - Taguchi Experiments". www.qualitytrainingportal.com. Retrieved 2015-11-27.

[12] George E. P. Box (1990). "Do interactions matter?" (PDF). Quality Engineering. 2: 365–369. doi:10.1080/08982119008962728. Archived from the original (PDF) on 2010-06-10. Retrieved 2009-07-28.

[13] Hamilton, L.C. 1992. Regression with Graphics: A Second Course in Applied Statistics. Pacific Grove, CA: Brooks/Cole. ISBN 978-0534159009

[14] Iacobucci, Dawn; Schneider, Matthew J.; Popovich, Deidre L.; Bakamitsos, Georgios A. (2016). "Mean centering helps alleviate "micro" but not "macro" multicollinearity". Behavior Research Methods. 48 (4): 1308–1317. doi: 10.3758/s13428-015-0624-x . ISSN 1554-3528. PMID 26148824.

[15] Hamilton, L.C.; Saito, K. (2015). "A four-party view of U.S. environmental concern". Environmental Politics. 24 (2): 212–227. Bibcode:2015EnvPo..24..212H. doi:10.1080/09644016.2014.976485. S2CID 154762226.

[16] Lee, P. N. (2001). "Relation between exposure to asbestos and smoking jointly and the risk of lung cancer". Occupational and Environmental Medicine. 58 (3): 145–53. doi:10.1136/oem.58.3.145. PMC 1740104 . PMID 11171926.

[17] Lu, Q.; et al. (2009). "Genetic predisposition, Western dietary pattern, and the risk of type 2 diabetes in men". Am J Clin Nutr. 89 (5): 1453–1458. doi:10.3945/ajcn.2008.27249. PMC 2676999 . PMID 19279076.

[18] Hamilton, L.C. (2011). "Education, politics and opinions about climate change: Evidence for interaction effects". Climatic Change . 104 (2): 231–242. Bibcode:2011ClCh..104..231H. doi:10.1007/s10584-010-9957-8. S2CID 16481640.

[19] McCright, A. M. (2011). "Political orientation moderates Americans' beliefs and concern about climate change". Climatic Change . 104 (2): 243–253. Bibcode:2011ClCh..104..243M. doi:10.1007/s10584-010-9946-y. S2CID 152795205.

[20] Hamilton, Lawrence C.; Saito, Kei (2015). "A four-party view of US environmental concern". Environmental Politics. 24 (2): 212–227. Bibcode:2015EnvPo..24..212H. doi:10.1080/09644016.2014.976485. S2CID 154762226.

[21] Kahan, D.M.; Jenkins-Smith, H.; Braman, D. (2011). "Cultural cognition of scientific consensus". Journal of Risk Research. 14 (2): 147–174. doi:10.1080/13669877.2010.511246. hdl: 10.1080/13669877.2010.511246 . S2CID 216092368.

[22] Hamilton, L.C.; Cutler, M.J.; Schaefer, A. (2012). "Public knowledge and concern about polar-region warming". Polar Geography . 35 (2): 155–168. Bibcode:2012PolGe..35..155H. doi:10.1080/1088937X.2012.684155. S2CID 12437794.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]
