Main effect

In the design of experiments and analysis of variance, a main effect is the effect of an independent variable on a dependent variable averaged across the levels of any other independent variables. The term is frequently used in the context of factorial designs and regression models to distinguish main effects from interaction effects.

In an analysis of variance for a factorial design, a main effect test evaluates a null hypothesis H0 that all level means of a factor are equal. Running a hypothesis test for a main effect asks whether there is evidence of an effect of the different treatments. However, a main effect test is nonspecific: it does not localize specific pairwise comparisons of means (simple effects). A main effect test merely asks whether, overall, there is something about a particular factor that makes a difference. In other words, it examines differences among the levels of a single factor, averaging over the other factor or factors. Main effects are essentially the overall effect of a factor.

Definition

The effect of a factor averaged over all levels of the other factors is termed its main effect (also known as a marginal effect). Equivalently, the main effect is the contrast between the levels of one factor averaged over all levels of the other factors: the difference between the marginal means of the response at the levels of that factor.[1] Main effects are the primary independent variables or factors tested in the experiment.[2] A main effect is the effect of a factor or independent variable regardless of the other parameters in the experiment.[3] In design of experiments such a variable is referred to as a factor, while in regression analysis it is referred to as an independent variable.
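As an illustrative sketch (the numbers are hypothetical, not from the source), the main effect of a factor can be computed as the difference between its marginal means:

```python
# Hypothetical cell means for a 2x2 design:
# rows = levels of factor A, columns = levels of factor B
cell_means = [[4.0, 6.0],   # A low:  (B low, B high)
              [7.0, 11.0]]  # A high: (B low, B high)

# Marginal mean of each level of A, averaging over the levels of B
marginal_A_low = sum(cell_means[0]) / len(cell_means[0])   # 5.0
marginal_A_high = sum(cell_means[1]) / len(cell_means[1])  # 9.0

# Main effect of A = difference between its marginal means
main_effect_A = marginal_A_high - marginal_A_low
print(main_effect_A)  # 4.0
```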

Estimating Main Effects

In a 2×2 factorial design, with two levels each of factors A and B, the main effects of the two factors can be calculated from the treatment totals. The main effect of A is given by

A = [ab + a − b − (1)] / (2n)

The main effect of B is given by

B = [ab + b − a − (1)] / (2n)

where n is the total number of replicates. We use factor level 1 to denote the low level and level 2 to denote the high level. The letter "a" represents the treatment combination with A at level 2 and B at level 1, "b" represents A at level 1 and B at level 2, "ab" represents both factors at level 2, and "(1)" represents both factors at level 1.[2]
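A minimal numerical check of these formulas (my sketch, not from the source; the treatment totals below are hypothetical):

```python
def main_effects_2x2(t1, a, b, ab, n):
    """Main effects of A and B in a 2x2 factorial design.

    t1, a, b, ab are treatment-combination totals over n replicates:
    t1 = both factors at level 1, a = A at level 2 / B at level 1,
    b = A at level 1 / B at level 2, ab = both factors at level 2.
    """
    effect_a = (ab + a - b - t1) / (2 * n)
    effect_b = (ab + b - a - t1) / (2 * n)
    return effect_a, effect_b

# Hypothetical totals with n = 5 replicates
print(main_effects_2x2(t1=21, a=25, b=23, ab=41, n=5))  # (2.2, 1.8)
```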

Hypothesis Testing for a Two-Way Factorial Design

Consider a two-way factorial design in which factor A has 3 levels and factor B has 2 levels, with only one replicate. There are 6 treatments with 5 degrees of freedom. In this example there are two null hypotheses: for factor A, H0: α1 = α2 = α3 = 0, and for factor B, H0: β1 = β2 = 0.[4] The main effect of factor A can be computed with 2 degrees of freedom; this variation is summarized by the sum of squares denoted SSA. Likewise, the variation from factor B can be computed as SSB with 1 degree of freedom. The expected value of the mean of the responses in column j is μ + αj, while the expected value of the mean of the responses in row i is μ + βi, where j corresponds to the level of factor A and i to the level of factor B. The αj and βi are the main effects, and SSA and SSB are the main-effect sums of squares. The two remaining degrees of freedom describe the variation that comes from the interaction between the two factors, denoted SSAB.[4] A table can show the layout of this design (where yij is the observation at the ith level of factor B and the jth level of factor A):

3×2 Factorial Experiment

                         Factor A
Factor B         Level 1 (j=1)   Level 2 (j=2)   Level 3 (j=3)
Level 1 (i=1)    y11             y12             y13
Level 2 (i=2)    y21             y22             y23
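The sum-of-squares decomposition described above can be sketched in plain Python (my sketch; the response values are hypothetical):

```python
# Hypothetical single-replicate responses for a 3x2 design:
# rows = levels of factor B (i = 1, 2), columns = levels of factor A (j = 1, 2, 3)
y = [[10.0, 14.0, 12.0],
     [13.0, 18.0, 17.0]]
nB, nA = len(y), len(y[0])

grand = sum(sum(row) for row in y) / (nA * nB)  # grand mean

# Marginal means: columns average over B (factor A), rows average over A (factor B)
col_means = [sum(y[i][j] for i in range(nB)) / nB for j in range(nA)]
row_means = [sum(y[i]) / nA for i in range(nB)]

SSA = nB * sum((m - grand) ** 2 for m in col_means)  # main effect of A, 2 df
SSB = nA * sum((m - grand) ** 2 for m in row_means)  # main effect of B, 1 df
SST = sum((y[i][j] - grand) ** 2
          for i in range(nB) for j in range(nA))     # total, 5 df
SSAB = SST - SSA - SSB                               # interaction, remaining 2 df
print(SSA, SSB, SSAB)  # 21.0 24.0 1.0
```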

Example

Take a 2×2 factorial design (two levels of two factors) testing the taste ranking of fried chicken at two fast food restaurants. Let taste testers rank the chicken from 1 to 10 (best tasting), for factor X: "spiciness" and factor Y: "crispiness." Level X1 is "not spicy" chicken and X2 is "spicy" chicken; level Y1 is "not crispy" and level Y2 is "crispy" chicken. Suppose that five people (5 replicates) tasted all four kinds of chicken and gave each a ranking from 1 to 10. The hypotheses of interest are, for factor X, H0: μX1 = μX2 against H1: μX1 ≠ μX2, and for factor Y, H0: μY1 = μY2 against H1: μY1 ≠ μY2. The table of hypothetical results is given here:

(Replicates)
Factor Combination                I    II   III  IV   V    Total
Not Spicy, Not Crispy (X1, Y1)    3    2    6    1    9    21
Not Spicy, Crispy (X1, Y2)        7    2    4    2    8    23
Spicy, Not Crispy (X2, Y1)        5    5    6    1    8    25
Spicy, Crispy (X2, Y2)            9    10   8    6    8    41

The "main effect" of X (spiciness) when we are at Y1 (not crispy) is given as

[X2Y1 − X1Y1] / n = (25 − 21) / 5 = 0.8

where n is the number of replicates. Likewise, the "main effect" of X at Y2 (crispy) is given as

[X2Y2 − X1Y2] / n = (41 − 23) / 5 = 3.6

Taking the simple average of these two gives the overall main effect of factor X, which agrees with the formula above, written here as:

X = [(X2Y2 + X2Y1) − (X1Y2 + X1Y1)] / 2n = [(41 + 25) − (23 + 21)] / (2 × 5) = 2.2

Likewise, for Y, the overall main effect is:[5]

Y = [(X2Y2 + X1Y2) − (X2Y1 + X1Y1)] / 2n = [(41 + 23) − (25 + 21)] / (2 × 5) = 1.8

For the chicken tasting experiment, the resulting main effects are therefore 2.2 for spiciness (X) and 1.8 for crispiness (Y): on average, spicy chicken is ranked 2.2 points higher than non-spicy chicken, and crispy chicken 1.8 points higher than non-crispy chicken.
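A minimal sketch (mine, not from the source) computing these main effects from the replicate data in the table above:

```python
# Taste rankings per replicate for each treatment combination (from the table above)
ratings = {
    ("X1", "Y1"): [3, 2, 6, 1, 9],   # not spicy, not crispy
    ("X1", "Y2"): [7, 2, 4, 2, 8],   # not spicy, crispy
    ("X2", "Y1"): [5, 5, 6, 1, 8],   # spicy, not crispy
    ("X2", "Y2"): [9, 10, 8, 6, 8],  # spicy, crispy
}
n = 5  # replicates per treatment combination
t = {k: sum(v) for k, v in ratings.items()}  # cell totals: 21, 23, 25, 41

# Overall main effect of X: (high-spiciness totals - low-spiciness totals) / 2n
effect_X = (t[("X2", "Y2")] + t[("X2", "Y1")]
            - t[("X1", "Y2")] - t[("X1", "Y1")]) / (2 * n)

# Overall main effect of Y: (high-crispiness totals - low-crispiness totals) / 2n
effect_Y = (t[("X2", "Y2")] + t[("X1", "Y2")]
            - t[("X2", "Y1")] - t[("X1", "Y1")]) / (2 * n)

print(effect_X, effect_Y)  # 2.2 1.8
```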

References

  1. Kuehl, Robert (1999). Design of Experiments: Statistical Principles of Research Design and Analysis. Cengage Learning. p. 178. ISBN 9780534368340.
  2. Montgomery, Douglas C. (1976). Design and Analysis of Experiments. Wiley. p. 180. ISBN 9780471614210.
  3. Kotz; Johnson (2005). Encyclopedia of Statistical Sciences. p. 181. ISBN 978-0-471-15044-2.
  4. Oehlert, Gary (2010). A First Course in Design and Analysis of Experiments. p. 181. ISBN 0-7167-3510-5.
  5. Montgomery, Douglas (2005). Design and Analysis of Experiments. 6th ed. Wiley. pp. 205–206.