# Factorial experiment

Last updated

In statistics, a full factorial experiment is an experiment whose design consists of two or more factors, each with discrete possible values or "levels", and whose experimental units take on all possible combinations of these levels across all such factors. A full factorial design may also be called a fully crossed design. Such an experiment allows the investigator to study the effect of each factor on the response variable, as well as the effects of interactions between factors on the response variable.

## Contents

For the vast majority of factorial experiments, each factor has only two levels. For example, with two factors each taking two levels, a factorial experiment would have four treatment combinations in total, and is usually called a 2×2 factorial design. In such a design, the interaction between the variables is often the most important. This applies even to scenarios where a main effect and an interaction is present.

If the number of combinations in a full factorial design is too high to be logistically feasible, a fractional factorial design may be done, in which some of the possible combinations (usually at least half) are omitted.

## History

Factorial designs were used in the 19th century by John Bennet Lawes and Joseph Henry Gilbert of the Rothamsted Experimental Station. [1]

Ronald Fisher argued in 1926 that "complex" designs (such as factorial designs) were more efficient than studying one factor at a time. [2] Fisher wrote,

"No aphorism is more frequently repeated in connection with field trials, than that we must ask Nature few questions, or, ideally, one question, at a time. The writer is convinced that this view is wholly mistaken."

Nature, he suggests, will best respond to "a logical and carefully thought out questionnaire". A factorial design allows the effect of several factors and even interactions between them to be determined with the same number of trials as are necessary to determine any one of the effects by itself with the same degree of accuracy.

Frank Yates made significant contributions, particularly in the analysis of designs, by the Yates analysis.

The term "factorial" may not have been used in print before 1935, when Fisher used it in his book The Design of Experiments . [3]

Many people examine the effect of only a single factor or variable. Compared to such one-factor-at-a-time (OFAT) experiments, factorial experiments offer several advantages [4] [5]

• Factorial designs are more efficient than OFAT experiments. They provide more information at similar or lower cost. They can find optimal conditions faster than OFAT experiments.
• Factorial designs allow additional factors to be examined at no additional cost.
• When the effect of one factor is different for different levels of another factor, it cannot be detected by an OFAT experiment design. Factorial designs are required to detect such interactions. Use of OFAT when interactions are present can lead to serious misunderstanding of how the response changes with the factors.
• Factorial designs allow the effects of a factor to be estimated at several levels of the other factors, yielding conclusions that are valid over a range of experimental conditions.

### Example of advantages of factorial experiments

In his book, Improving Almost Anything: Ideas and Essays, statistician George Box gives many examples of the benefits of factorial experiments. Here is one. [6] Engineers at the bearing manufacturer SKF wanted to know if changing to a less expensive "cage" design would affect bearing life. The engineers asked Christer Hellstrand, a statistician, for help in designing the experiment. [7]

Box reports the following. "The results were assessed by an accelerated life test. … The runs were expensive because they needed to be made on an actual production line and the experimenters were planning to make four runs with the standard cage and four with the modified cage. Christer asked if there were other factors they would like to test. They said there were, but that making added runs would exceed their budget. Christer showed them how they could test two additional factors "for free" – without increasing the number of runs and without reducing the accuracy of their estimate of the cage effect. In this arrangement, called a 2×2×2 factorial design, each of the three factors would be run at two levels and all the eight possible combinations included. The various combinations can conveniently be shown as the vertices of a cube ... " "In each case, the standard condition is indicated by a minus sign and the modified condition by a plus sign. The factors changed were heat treatment, outer ring osculation, and cage design. The numbers show the relative lengths of lives of the bearings. If you look at [the cube plot], you can see that the choice of cage design did not make a lot of difference. … But, if you average the pairs of numbers for cage design, you get the [table below], which shows what the two other factors did. … It led to the extraordinary discovery that, in this particular application, the life of a bearing can be increased fivefold if the two factor(s) outer ring osculation and inner ring heat treatments are increased together."

Bearing life vs. heat and osculation
Osculation −Osculation +
Heat −1823
Heat +21106

"Remembering that bearings like this one have been made for decades, it is at first surprising that it could take so long to discover so important an improvement. A likely explanation is that, because most engineers have, until recently, employed only one factor at a time experimentation, interaction effects have been missed."

## Example

The simplest factorial experiment contains two levels for each of two factors. Suppose an engineer wishes to study the total power used by each of two different motors, A and B, running at each of two different speeds, 2000 or 3000 RPM. The factorial experiment would consist of four experimental units: motor A at 2000 RPM, motor B at 2000 RPM, motor A at 3000 RPM, and motor B at 3000 RPM. Each combination of a single level selected from every factor is present once.

This experiment is an example of a 22 (or 2×2) factorial experiment, so named because it considers two levels (the base) for each of two factors (the power or superscript), or #levels#factors, producing 22=4 factorial points.

Designs can involve many independent variables. As a further example, the effects of three input variables can be evaluated in eight experimental conditions shown as the corners of a cube.

This can be conducted with or without replication, depending on its intended purpose and available resources. It will provide the effects of the three independent variables on the dependent variable and possible interactions.

## Notation

2×2 factorial experiment
AB
(1)
a+
b+
ab++

The notation used to denote factorial experiments conveys a lot of information. When a design is denoted a 23 factorial, this identifies the number of factors (3); how many levels each factor has (2); and how many experimental conditions there are in the design (23 = 8). Similarly, a 25 design has five factors, each with two levels, and 25 = 32 experimental conditions. Factorial experiments can involve factors with different numbers of levels. A 243 design has five factors, four with two levels and one with three levels, and has 16 × 3 = 48 experimental conditions. [8]

To save space, the points in a two-level factorial experiment are often abbreviated with strings of plus and minus signs. The strings have as many symbols as factors, and their values dictate the level of each factor: conventionally, ${\displaystyle -}$ for the first (or low) level, and ${\displaystyle +}$ for the second (or high) level. The points in this experiment can thus be represented as ${\displaystyle --}$, ${\displaystyle +-}$, ${\displaystyle -+}$, and ${\displaystyle ++}$.

The factorial points can also be abbreviated by (1), a, b, and ab, where the presence of a letter indicates that the specified factor is at its high (or second) level and the absence of a letter indicates that the specified factor is at its low (or first) level (for example, "a" indicates that factor A is on its high setting, while all other factors are at their low (or first) setting). (1) is used to indicate that all factors are at their lowest (or first) values.

## Implementation

For more than two factors, a 2k factorial experiment can usually be recursively designed from a 2k−1 factorial experiment by replicating the 2k−1 experiment, assigning the first replicate to the first (or low) level of the new factor, and the second replicate to the second (or high) level. This framework can be generalized to, e.g., designing three replicates for three level factors, etc.

A factorial experiment allows for estimation of experimental error in two ways. The experiment can be replicated, or the sparsity-of-effects principlecan often be exploited. Replication is more common for small experiments and is a very reliable way of assessing experimental error. When the number of factors is large (typically more than about 5 factors, but this does vary by application), replication of the design can become operationally difficult. In these cases, it is common to only run a single replicate of the design, and to assume that factor interactions of more than a certain order (say, between three or more factors) are negligible. Under this assumption, estimates of such high order interactions are estimates of an exact zero, thus really an estimate of experimental error.

When there are many factors, many experimental runs will be necessary, even without replication. For example, experimenting with 10 factors at two levels each produces 210=1024 combinations. At some point this becomes infeasible due to high cost or insufficient resources. In this case, fractional factorial designs may be used.

As with any statistical experiment, the experimental runs in a factorial experiment should be randomized to reduce the impact that bias could have on the experimental results. In practice, this can be a large operational challenge.

Factorial experiments can be used when there are more than two levels of each factor. However, the number of experimental runs required for three-level (or more) factorial designs will be considerably greater than for their two-level counterparts. Factorial designs are therefore less attractive if a researcher wishes to consider more than two levels.

## Analysis

A factorial experiment can be analyzed using ANOVA or regression analysis. [9] To compute the main effect of a factor "A", subtract the average response of all experimental runs for which A was at its low (or first) level from the average response of all experimental runs for which A was at its high (or second) level.

Other useful exploratory analysis tools for factorial experiments include main effects plots, interaction plots, Pareto plots, and a normal probability plot of the estimated effects.

When the factors are continuous, two-level factorial designs assume that the effects are linear. If a quadratic effect is expected for a factor, a more complicated experiment should be used, such as a central composite design. Optimization of factors that could have quadratic effects is the primary goal of response surface methodology.

### Analysis example

Montgomery [4] gives the following example of analysis of a factorial experiment:.

An engineer would like to increase the filtration rate (output) of a process to produce a chemical, and to reduce the amount of formaldehyde used in the process. Previous attempts to reduce the formaldehyde have lowered the filtration rate. The current filtration rate is 75 gallons per hour. Four factors are considered: temperature (A), pressure (B), formaldehyde concentration (C), and stirring rate (D). Each of the four factors will be tested at two levels.

Onwards, the minus (−) and plus (+) signs will indicate whether the factor is run at a low or high level, respectively.

Design matrix and resulting filtration rate
ABCDFiltration rate
45
+71
+48
++65
+68
++60
++80
+++65
+43
++100
++45
+++104
++75
+++86
+++70
++++96

The non-parallel lines in the A:C interaction plot indicate that the effect of factor A depends on the level of factor C. A similar results holds for the A:D interaction. The graphs indicate that factor B has little effect on filtration rate. The analysis of variance (ANOVA) including all 4 factors and all possible interaction terms between them yields the coefficient estimates shown in the table below.

ANOVA results
CoefficientsEstimate
Intercept70.063
A10.813
B1.563
C4.938
D7.313
A:B0.063
A:C−9.063
B:C1.188
A:D8.313
B:D−0.188
C:D−0.563
A:B:C0.938
A:B:D2.063
A:C:D−0.813
B:C:D−1.313
A:B:C:D0.688

Because there are 16 observations and 16 coefficients (intercept, main effects, and interactions), p-values cannot be calculated for this model. The coefficient values and the graphs suggest that the important factors are A, C, and D, and the interaction terms A:C and A:D.

The coefficients for A, C, and D are all positive in the ANOVA, which would suggest running the process with all three variables set to the high value. However, the main effect of each variable is the average over the levels of the other variables. The A:C interaction plot above shows that the effect of factor A depends on the level of factor C, and vice versa. Factor A (temperature) has very little effect on filtration rate when factor C is at the + level. But Factor A has a large effect on filtration rate when factor C (formaldehyde) is at the − level. The combination of A at the + level and C at the − level gives the highest filtration rate. This observation indicates how one-factor-at-a-time analyses can miss important interactions. Only by varying both factors A and C at the same time could the engineer discover that the effect of factor A depends on the level of factor C.

The best filtration rate is seen when A and D are at the high level, and C is at the low level. This result also satisfies the objective of reducing formaldehyde (factor C). Because B does not appear to be important, it can be dropped from the model. Performing the ANOVA using factors A, C, and D, and the interaction terms A:C and A:D, gives the result shown in the following table, in which all the terms are significant (p-value < 0.05).

ANOVA results
CoefficientEstimateStandard errort valuep-value
Intercept70.0621.10463.4442.3 × 10−14
A10.8121.1049.7911.9 × 10−6
C4.9381.1044.4711.2 × 10−3
D7.3131.1046.6225.9 × 10−5
A:C−9.0631.104−8.2069.4 × 10−6
A:D8.3121.1047.5272 × 10−5

## Notes

1. Yates, Frank; Mather, Kenneth (1963). "Ronald Aylmer Fisher". Biographical Memoirs of Fellows of the Royal Society . London, England: Royal Society. 9: 91–120. doi:. Archived from the original (PDF) on February 18, 2009.
2. Fisher, Ronald (1926). "The Arrangement of Field Experiments" (PDF). Journal of the Ministry of Agriculture of Great Britain . London, England: Ministry of Agriculture and Fisheries. 33: 503–513.
3. "Earliest Known Uses of Some of the Words of Mathematics (F)". jeff560.tripod.com.
4. Montgomery, Douglas C. (2013). Design and Analysis of Experiments (8th ed.). Hoboken, New Jersey: Wiley. ISBN   978-1119320937.
5. Oehlert, Gary (2000). A First Course in Design and Analysis of Experiments (Revised ed.). New York City: W. H. Freeman and Company. ISBN   978-0716735106.
6. George E.P., Box (2006). Improving Almost Anything: Ideas and Essays (Revised ed.). Hoboken, New Jersey: Wiley. ASIN   B01FKSM9VY.
7. Hellstrand, C.; Oosterhoorn, A. D.; Sherwin, D. J.; Gerson, M. (24 February 1989). "The Necessity of Modern Quality Improvement and Some Experience with its Implementation in the Manufacture of Rolling Bearings [and Discussion]". Philosophical Transactions of the Royal Society. 327 (1596): 529–537. doi:10.1098/rsta.1989.0008.
8. Penn State University College of Health and Human Development (2011-12-22). "Introduction to Factorial Experimental Designs".
9. Cohen, J (1968). "Multiple regression as a general data-analytic system". Psychological Bulletin. 70 (6): 426–443. CiteSeerX  . doi:10.1037/h0026714.

## Related Research Articles

Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures used to analyze the differences among means. ANOVA was developed by the statistician Ronald Fisher. ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means. In other words, the ANOVA is used to test the difference between two or more means.

The design of experiments is the design of any task that aims to describe and explain the variation of information under conditions that are hypothesized to reflect the variation. The term is generally associated with experiments in which the design introduces conditions that directly affect the variation, but may also refer to the design of quasi-experiments, in which natural conditions that influence the variation are selected for observation.

In statistics, an interaction may arise when considering the relationship among three or more variables, and describes a situation in which the effect of one causal variable on an outcome depends on the state of a second causal variable. Although commonly thought of in terms of causal relationships, the concept of an interaction can also describe non-causal associations. Interactions are often considered in the context of regression analyses or factorial experiments.

The one-factor-at-a-time method, also known as one-variable-at-a-time, OFAT, OF@T, OFaaT, OVAT, OV@T, OVaaT, or monothetic analysis is a method of designing experiments involving the testing of factors, or causes, one at a time instead of multiple factors simultaneously.

In the statistical theory of the design of experiments, blocking is the arranging of experimental units in groups (blocks) that are similar to one another. Blocking can be used to tackle the problem of pseudoreplication.

In statistics, a central composite design is an experimental design, useful in response surface methodology, for building a second order (quadratic) model for the response variable without needing to use a complete three-level factorial experiment.

In statistics, fractional factorial designs are experimental designs consisting of a carefully chosen subset (fraction) of the experimental runs of a full factorial design. The subset is chosen so as to exploit the sparsity-of-effects principle to expose information about the most important features of the problem studied, while using a fraction of the effort of a full factorial design in terms of experimental runs and resources. In other words, it makes use of the fact that many experiments in full factorial design are often redundant, giving little or no new information about the system.

In the design of experiments and analysis of variance, a main effect is the effect of an independent variable on a dependent variable averaged across the levels of any other independent variables. The term is frequently used in the context of factorial designs and regression models to distinguish main effects from interaction effects.

Plackett–Burman designs are experimental designs presented in 1946 by Robin L. Plackett and J. P. Burman while working in the British Ministry of Supply. Their goal was to find experimental designs for investigating the dependence of some measured quantity on a number of independent variables (factors), each taking L levels, in such a way as to minimize the variance of the estimates of these dependencies using a limited number of experiments. Interactions between the factors were considered negligible. The solution to this problem is to find an experimental design where each combination of levels for any pair of factors appears the same number of times, throughout all the experimental runs. A complete factorial design would satisfy this criterion, but the idea was to find smaller designs.

In computational biology and bioinformatics, analysis of variance – simultaneous component analysis is a method that partitions variation and enables interpretation of these partitions by SCA, a method that is similar to principal components analysis (PCA). Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures used to analyze differences. Statistical coupling analysis (SCA) is a technique used in bioinformatics to measure covariation between pairs of amino acids in a protein multiple sequence alignment (MSA).

In statistics, a Yates analysis is an approach to analyzing data obtained from a designed experiment, where a factorial design has been used. Full- and fractional-factorial designs are common in designed experiments for engineering and scientific applications. In these designs, each factor is assigned two levels. These are typically called the low and high levels. For computational purposes, the factors are scaled so that the low level is assigned a value of -1 and the high level is assigned a value of +1. These are also commonly referred to as "-" and "+".

Repeated measures design is a research design that involves multiple measures of the same variable taken on the same or matched subjects either under different conditions or over two or more time periods. For instance, repeated measurements are collected in a longitudinal study in which change over time is assessed.

In statistics, restricted randomization occurs in the design of experiments and in particular in the context of randomized experiments and randomized controlled trials. Restricted randomization allows intuitively poor allocations of treatments to experimental units to be avoided, while retaining the theoretical benefits of randomization. For example, in a clinical trial of a new proposed treatment of obesity compared to a control, an experimenter would want to avoid outcomes of the randomization in which the new treatment was allocated only to the heaviest patients.

In the design of experiments, completely randomized designs are for studying the effects of one primary factor without the need to take other nuisance variables into account. This article describes completely randomized designs that have one primary factor. The experiment compares the values of a response variable based on the different levels of that primary factor. For completely randomized designs, the levels of the primary factor are randomly assigned to the experimental units.

The following is a glossary of terms. It is not intended to be all-inclusive.

In statistics, a mixed-design analysis of variance model, also known as a split-plot ANOVA, is used to test for differences between two or more independent groups whilst subjecting participants to repeated measures. Thus, in a mixed-design ANOVA model, one factor is a between-subjects variable and the other is a within-subjects variable. Thus, overall, the model is a type of mixed-effects model.

Software that is used for designing factorial experiments plays an important role in scientific experiments and represents a route to the implementation of design of experiments procedures that derive from statistical and combinatorial theory. In principle, easy-to-use design of experiments (DOE) software should be available to all experimenters to foster use of DOE.

In the design of experiments, a between-group design is an experiment that has two or more groups of subjects each being tested by a different testing factor simultaneously. This design is usually used in place of, or in some cases in conjunction with, the within-subject design, which applies the same variations of conditions to each subject to observe the reactions. The simplest between-group design occurs with two groups; one is generally regarded as the treatment group, which receives the ‘special’ treatment, and the control group, which receives no variable treatment and is used as a reference The between-group design is widely used in psychological, economic, and sociological experiments, as well as in several other fields in the natural or social sciences.

A robust parameter design, introduced by Genichi Taguchi, is an experimental design used to exploit the interaction between control and uncontrollable noise variables by robustification—finding the settings of the control factors that minimize response variation from uncontrollable factors. Control variables are variables of which the experimenter has full control. Noise variables lie on the other side of the spectrum. While these variables may be easily controlled in an experimental setting, outside of the experimental world they are very hard, if not impossible, to control. Robust parameter designs use a naming convention similar to that of FFDs. A 2(m1+m2)-(p1-p2) is a 2-level design where m1 is the number of control factors, m2 is the number of noise factors, p1 is the level of fractionation for control factors, and p2 is the level of fractionation for noise factors.

Design–Expert is a statistical software package from Stat-Ease Inc. that is specifically dedicated to performing design of experiments (DOE). Design–Expert offers comparative tests, screening, characterization, optimization, robust parameter design, mixture designs and combined designs. Design–Expert provides test matrices for screening up to 50 factors. Statistical significance of these factors is established with analysis of variance (ANOVA). Graphical tools help identify the impact of each factor on the desired outcomes and reveal abnormalities in the data.

## References

• Box, G. E.; Hunter, W. G.; Hunter, J. S. (2005). Statistics for Experimenters: Design, Innovation, and Discovery (2nd ed.). Wiley. ISBN   978-0-471-71813-0.