Optimal experimental design

Gustav Elfving developed the optimal design of experiments, and so minimized surveyors' need for theodolite measurements (pictured), while trapped in his tent in storm-ridden Greenland.

In the design of experiments, optimal experimental designs (or optimum designs [2]) are a class of experimental designs that are optimal with respect to some statistical criterion. The creation of this field of statistics has been credited to Danish statistician Kirstine Smith. [3] [4]

In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated without bias and with minimum variance. A non-optimal design requires a greater number of experimental runs to estimate the parameters with the same precision as an optimal design. In practical terms, optimal experiments can reduce the costs of experimentation.

The optimality of a design depends on the statistical model and is assessed with respect to a statistical criterion, which is related to the variance-matrix of the estimator. Specifying an appropriate model and specifying a suitable criterion function both require an understanding of statistical theory and practical knowledge of designing experiments.

Advantages

Optimal designs offer three advantages over sub-optimal experimental designs: [5]

  1. Optimal designs reduce the costs of experimentation by allowing statistical models to be estimated with fewer experimental runs.
  2. Optimal designs can accommodate multiple types of factors, such as process, mixture, and discrete factors.
  3. Designs can be optimized when the design-space is constrained, for example, when the mathematical process-space contains factor-settings that are practically infeasible (e.g. due to safety concerns).

Minimizing the variance of estimators

Experimental designs are evaluated using statistical criteria. [6]

It is known that the least squares estimator minimizes the variance of mean-unbiased estimators (under the conditions of the Gauss–Markov theorem). In the estimation theory for statistical models with one real parameter, the reciprocal of the variance of an ("efficient") estimator is called the "Fisher information" for that estimator. [7] Because of this reciprocity, minimizing the variance corresponds to maximizing the information.

When the statistical model has several parameters, however, the mean of the parameter-estimator is a vector and its variance is a matrix. The inverse matrix of the variance-matrix is called the "information matrix". Because the variance of the estimator of a parameter vector is a matrix, the problem of "minimizing the variance" is complicated. Using statistical theory, statisticians compress the information-matrix using real-valued summary statistics; being real-valued functions, these "information criteria" can be maximized. [8] The traditional optimality-criteria are invariants of the information matrix; algebraically, the traditional optimality-criteria are functionals of the eigenvalues of the information matrix.
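As a minimal numerical sketch of this compression, the following Python code forms the information matrix of a linear model and evaluates three real-valued summaries through its eigenvalues. The A-, D-, and E-criteria named here are standard textbook examples rather than criteria listed above, and the straight-line model, the unit error variance, and the function names are illustrative assumptions.

```python
import numpy as np

def model_matrix(x):
    """Model matrix of the assumed straight-line model y = b0 + b1*x."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x])

def information_matrix(X):
    """Information matrix X^T X of a linear model (unit error variance assumed)."""
    X = np.asarray(X, dtype=float)
    return X.T @ X

def criteria_from_eigenvalues(M):
    """A-, D-, and E-criteria written as functions of the eigenvalues of M."""
    eig = np.linalg.eigvalsh(M)      # M is symmetric, so eigvalsh applies
    return {
        "A": np.sum(1.0 / eig),      # trace of the inverse; smaller is better
        "D": np.prod(eig),           # determinant; larger is better
        "E": np.min(eig),            # smallest eigenvalue; larger is better
    }

# Two hypothetical four-run designs on the interval [-1, 1].
endpoints = criteria_from_eigenvalues(information_matrix(model_matrix([-1, -1, 1, 1])))
spread = criteria_from_eigenvalues(information_matrix(model_matrix([-1, -1/3, 1/3, 1])))

print("end points:", endpoints)
print("equispaced:", spread)
```

Under these assumptions the end-point design has the larger determinant, the larger smallest eigenvalue, and the smaller trace of the inverse, so all three summaries prefer it for the straight-line model.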

Other optimality-criteria are concerned with the variance of predictions.

Contrasts

In many applications, the statistician is most concerned with a "parameter of interest" rather than with "nuisance parameters". More generally, statisticians consider linear combinations of parameters, which are estimated via linear combinations of treatment-means in the design of experiments and in the analysis of variance; such linear combinations are called contrasts. Statisticians can use appropriate optimality-criteria for such parameters of interest and for contrasts. [12]
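The effect of a design on a single contrast can be illustrated with a short sketch. It assumes a one-way layout with three treatments, a cell-means model with unit error variance, and the standard result that the least-squares estimate of a contrast c has variance c^T M^{-1} c, where M is the information matrix. The allocations and the function name are hypothetical.

```python
import numpy as np

def contrast_variance(M, c, sigma2=1.0):
    """Variance of the least-squares estimate of c^T beta, given information matrix M."""
    M = np.asarray(M, dtype=float)
    c = np.asarray(c, dtype=float)
    return sigma2 * float(c @ np.linalg.solve(M, c))

# Contrast of interest: treatment 1 minus treatment 2 (treatment 3 is a nuisance).
c = [1.0, -1.0, 0.0]

# In the cell-means model the information matrix is diagonal,
# with the number of replications of each treatment on the diagonal.
balanced = contrast_variance(np.diag([4.0, 4.0, 4.0]), c)   # 12 runs, equal allocation
focused = contrast_variance(np.diag([5.0, 5.0, 2.0]), c)    # 12 runs, weighted to 1 and 2

print(balanced, focused)
```

With the same twelve runs, the allocation that favours the two treatments in the contrast gives the smaller variance for that contrast, at the price of a less precise estimate of the third treatment mean.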

Implementation

Catalogs of optimal designs occur in books and in software libraries.

In addition, major statistical systems like SAS and R have procedures for optimizing a design according to a user's specification. The experimenter must specify a model for the design and an optimality-criterion before the method can compute an optimal design. [13]

Practical considerations

Some advanced topics in optimal design require more statistical theory and practical knowledge in designing experiments.

Model dependence and robustness

Since the optimality criterion of most optimal designs is based on some function of the information matrix, the 'optimality' of a given design is model dependent: while an optimal design is best for that model, its performance may deteriorate on other models. On other models, an optimal design can be either better or worse than a non-optimal design. [14] Therefore, it is important to benchmark the performance of designs under alternative models. [15]
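A small benchmark of this kind can be sketched as follows. The example assumes polynomial regression on [-1, 1], uses the determinant (D-) criterion, and relies on two well-known results: equal weights at -1 and 1 are D-optimal for the straight-line model, and equal weights at -1, 0, and 1 are D-optimal for the quadratic model. The function names are illustrative.

```python
import numpy as np

def moment_matrix(points, weights, degree):
    """Normalized information matrix of a weighted design for a polynomial model."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    F = np.vander(points, degree + 1, increasing=True)   # columns 1, x, ..., x^degree
    return F.T @ (weights[:, None] * F)

def d_efficiency(M, M_best):
    """D-efficiency of a design relative to the D-optimal design for the same model."""
    p = M.shape[0]
    ratio = max(np.linalg.det(M), 0.0) / np.linalg.det(M_best)
    return ratio ** (1.0 / p)

two_point = ([-1.0, 1.0], [0.5, 0.5])                 # optimal for the straight line
three_point = ([-1.0, 0.0, 1.0], [1/3, 1/3, 1/3])     # optimal for the quadratic

# Each design is evaluated under the model for which the other one is optimal.
eff_three_linear = d_efficiency(moment_matrix(*three_point, 1), moment_matrix(*two_point, 1))
eff_two_quadratic = d_efficiency(moment_matrix(*two_point, 2), moment_matrix(*three_point, 2))

print(eff_three_linear, eff_two_quadratic)
```

In this sketch the three-point design loses only part of its efficiency under the straight-line model, whereas the two-point design cannot estimate the quadratic model at all, because its information matrix is singular.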

Choosing an optimality criterion and robustness

The choice of an appropriate optimality criterion requires some thought, and it is useful to benchmark the performance of designs with respect to several optimality criteria. Cornell writes that

since the [traditional optimality] criteria . . . are variance-minimizing criteria, . . . a design that is optimal for a given model using one of the . . . criteria is usually near-optimal for the same model with respect to the other criteria.

[16]

Indeed, there are several classes of designs for which all the traditional optimality-criteria agree, according to the theory of "universal optimality" of Kiefer. [17] The experience of practitioners like Cornell and the "universal optimality" theory of Kiefer suggest that robustness with respect to changes in the optimality-criterion is much greater than is robustness with respect to changes in the model.

Flexible optimality criteria and convex analysis

High-quality statistical software provides libraries of optimal designs, iterative methods for constructing approximately optimal designs, or both, depending on the model specified and the optimality criterion. Users may use a standard optimality-criterion or may program a custom-made criterion.

All of the traditional optimality-criteria are convex (or concave) functions, and therefore optimal designs are amenable to the mathematical theory of convex analysis and their computation can use specialized methods of convex minimization. [18] The practitioner need not select exactly one traditional optimality-criterion, but can specify a custom criterion. In particular, the practitioner can specify a convex criterion using the maxima of convex optimality-criteria and nonnegative combinations of optimality criteria (since these operations preserve convex functions). For convex optimality criteria, the Kiefer-Wolfowitz equivalence theorem allows the practitioner to verify that a given design is globally optimal. [19] The Kiefer-Wolfowitz equivalence theorem is related to the Legendre-Fenchel conjugacy for convex functions. [20]
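For the determinant (D-) criterion, the equivalence theorem takes a form that is easy to check numerically: a design is D-optimal exactly when the largest standardized prediction variance d(x) = f(x)^T M^{-1} f(x) over the design region equals the number of parameters p. The following sketch applies this check to quadratic regression on [-1, 1]; the model, the grid, and the function names are assumptions made for illustration.

```python
import numpy as np

def regressors(x):
    """Rows f(x) = (1, x, x^2) of the assumed quadratic model."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, x**2])

def max_standardized_variance(points, weights, grid):
    """Largest value of d(x) = f(x)^T M^{-1} f(x) over a grid of the design region."""
    F = regressors(points)
    M = F.T @ (np.asarray(weights, dtype=float)[:, None] * F)
    G = regressors(grid)
    d = np.einsum("ij,ij->i", G @ np.linalg.inv(M), G)   # d(x) for every grid point
    return d.max()

grid = np.linspace(-1.0, 1.0, 201)
p = 3  # number of parameters in the quadratic model

candidate = max_standardized_variance([-1, 0, 1], [1/3, 1/3, 1/3], grid)
equispaced = max_standardized_variance([-1, -1/3, 1/3, 1], [0.25] * 4, grid)

print(candidate, equispaced, p)
```

The three-point design attains the bound p, which certifies its global D-optimality for this model, while the equispaced four-point design exceeds the bound and is therefore not D-optimal.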

If an optimality-criterion lacks convexity, then finding a global optimum and verifying its optimality often are difficult.

Model uncertainty and Bayesian approaches

Model selection

When scientists wish to test several theories, then a statistician can design an experiment that allows optimal tests between specified models. Such "discrimination experiments" are especially important in the biostatistics supporting pharmacokinetics and pharmacodynamics, following the work of Cox and Atkinson. [21]

Bayesian experimental design

When practitioners need to consider multiple models, they can specify a probability-measure on the models and then select any design maximizing the expected value of such an experiment. Such probability-based optimal-designs are called optimal Bayesian designs. Such Bayesian designs are used especially for generalized linear models (where the response follows an exponential-family distribution). [22]
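As a hedged illustration, the sketch below chooses a single observation time for an assumed exponential-decay response exp(-θt) with unit error variance. Each value of θ is treated as one candidate model, a three-point prior is placed on these values, and the design maximizes the prior expectation of the log Fisher information, which is one common choice of expected utility. The prior, the grid, and all names are hypothetical.

```python
import numpy as np

def information(t, theta):
    """Fisher information for theta from one observation of exp(-theta * t)."""
    return (t * np.exp(-theta * t)) ** 2

prior_theta = np.array([0.5, 1.0, 2.0])       # candidate parameter values (assumed)
prior_prob = np.array([1/3, 1/3, 1/3])        # prior probabilities (assumed)
times = np.linspace(0.01, 5.0, 500)           # candidate observation times

# Expected log-information of each candidate time under the prior.
expected_log_info = np.array([
    np.sum(prior_prob * np.log(information(t, prior_theta))) for t in times
])
best_t = times[np.argmax(expected_log_info)]

print(best_t)
```

For this model the expected log-information is 2 log t - 2t E[θ], so the search lands near t = 1/E[θ] = 6/7, whereas a locally optimal design for a single guessed value θ0 would observe at t = 1/θ0.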

The use of a Bayesian design does not force statisticians to use Bayesian methods to analyze the data, however. Indeed, the "Bayesian" label for probability-based experimental-designs is disliked by some researchers. [23] Alternative terminology for "Bayesian" optimality includes "on-average" optimality or "population" optimality.

Iterative experimentation

Scientific experimentation is an iterative process, and statisticians have developed several approaches to the optimal design of sequential experiments.

Sequential analysis

Sequential analysis was pioneered by Abraham Wald. [24] In 1972, Herman Chernoff wrote an overview of optimal sequential designs, [25] while adaptive designs were surveyed later by S. Zacks. [26] Of course, much work on the optimal design of experiments is related to the theory of optimal decisions, especially the statistical decision theory of Abraham Wald. [27]

Response-surface methodology

Optimal designs for response-surface models are discussed in the textbook by Atkinson, Donev and Tobias, and in the survey of Gaffke and Heiligers and in the mathematical text of Pukelsheim. The blocking of optimal designs is discussed in the textbook of Atkinson, Donev and Tobias and also in the monograph by Goos.

The earliest optimal designs were developed to estimate the parameters of regression models with continuous variables, for example, by J. D. Gergonne in 1815 (Stigler). In English, two early contributions were made by Charles S. Peirce and Kirstine Smith.

Pioneering designs for multivariate response-surfaces were proposed by George E. P. Box. However, Box's designs have few optimality properties. Indeed, the Box–Behnken design requires excessive experimental runs when the number of variables exceeds three. [28] Box's "central-composite" designs require more experimental runs than do the optimal designs of Kôno. [29]

System identification and stochastic approximation

The optimization of sequential experimentation is studied also in stochastic programming and in systems and control. Popular methods include stochastic approximation and other methods of stochastic optimization. Much of this research has been associated with the subdiscipline of system identification. [30] In computational optimal control, D. Judin & A. Nemirovskii and Boris Polyak have described methods that are more efficient than the (Armijo-style) step-size rules introduced by G. E. P. Box in response-surface methodology. [31]

Adaptive designs are used in clinical trials, and optimal adaptive designs are surveyed in the Handbook of Experimental Designs chapter by Shelemyahu Zacks.

Specifying the number of experimental runs

Using a computer to find a good design

There are several methods of finding an optimal design, given an a priori restriction on the number of experimental runs or replications. Some of these methods are discussed by Atkinson, Donev and Tobias and in the paper by Hardin and Sloane. Fixing the number of experimental runs a priori can be impractical, however, so prudent statisticians also examine optimal designs with other numbers of experimental runs.
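One simple family of such methods exchanges design points for candidate points whenever the criterion improves. The sketch below is a basic point-exchange search for the determinant (D-) criterion with a fixed number of runs; it is not the specific algorithm of any of the authors cited above, and the quadratic model, the candidate grid, and the starting design are illustrative assumptions.

```python
import numpy as np

def model_matrix(x):
    """Model matrix of the assumed quadratic model with columns 1, x, x^2."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, x**2])

def d_criterion(x):
    """Determinant of the information matrix X^T X for the runs in x."""
    X = model_matrix(x)
    return np.linalg.det(X.T @ X)

def point_exchange(candidates, start, max_sweeps=50):
    """Swap one run at a time for a candidate point whenever the determinant grows."""
    design = np.array(start, dtype=float)
    best = d_criterion(design)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(design)):
            for c in candidates:
                trial = design.copy()
                trial[i] = c
                value = d_criterion(trial)
                if value > best + 1e-12:      # accept only strict improvements
                    design, best, improved = trial, value, True
        if not improved:                      # a full sweep without improvement
            break
    return np.sort(design), best

candidates = np.linspace(-1.0, 1.0, 21)       # candidate settings on [-1, 1]
start = candidates[:6]                        # a deliberately poor six-run start
start_value = d_criterion(start)
design, value = point_exchange(candidates, start)

print(design, value)
```

The search never decreases the criterion, but like other exchange heuristics it may stop at a local optimum, so in practice it is usually restarted from several initial designs.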

Discretizing probability-measure designs

In the mathematical theory on optimal experiments, an optimal design can be a probability measure that is supported on an infinite set of observation-locations. Such optimal probability-measure designs solve a mathematical problem that neglects to specify the cost of observations and experimental runs. Nonetheless, such optimal probability-measure designs can be discretized to furnish approximately optimal designs. [32]
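When the probability measure has finite support, the discretization amounts to rounding the design weights to whole numbers of runs. The following sketch uses a simple largest-remainder rounding, which is only one possible rule and is not claimed to be the procedure described in the cited references. The quadratic model, the seven-run budget, and the function names are assumptions.

```python
import numpy as np

def round_design(weights, n):
    """Round design weights to integer run counts that sum to n (largest remainder)."""
    weights = np.asarray(weights, dtype=float)
    quota = n * weights
    counts = np.floor(quota + 1e-9).astype(int)       # guard against round-off
    leftover = int(n - counts.sum())
    order = np.argsort(-(quota - counts), kind="stable")
    counts[order[:leftover]] += 1                     # largest remainders get a run
    return counts

def normalized_det(points, counts):
    """Determinant of the per-run information matrix for the quadratic model."""
    x = np.asarray(points, dtype=float)
    F = np.column_stack([np.ones_like(x), x, x**2])
    w = np.asarray(counts, dtype=float) / np.sum(counts)
    return np.linalg.det(F.T @ (w[:, None] * F))

points = [-1.0, 0.0, 1.0]
weights = [1/3, 1/3, 1/3]          # D-optimal measure for quadratic regression on [-1, 1]
counts = round_design(weights, n=7)

# D-efficiency of the rounded seven-run design relative to the optimal measure.
efficiency = (normalized_det(points, counts) / normalized_det(points, weights)) ** (1/3)

print(counts, efficiency)
```

Seven runs cannot be split equally over three points, so the rounded design is slightly less efficient than the optimal measure; under these assumptions the loss is small.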

In some cases, a finite set of observation-locations suffices to support an optimal design. Such a result was proved by Kôno and Kiefer in their works on response-surface designs for quadratic models. The Kôno–Kiefer analysis explains why optimal designs for response-surfaces can have discrete supports, which are very similar to those of the less efficient designs that have been traditional in response surface methodology. [33]

History

In 1815, an article on optimal designs for polynomial regression was published by Joseph Diez Gergonne, according to Stigler.

Charles S. Peirce proposed an economic theory of scientific experimentation in 1876, which sought to maximize the precision of the estimates. Peirce's optimal allocation immediately improved the accuracy of gravitational experiments and was used for decades by Peirce and his colleagues. In his 1882 published lecture at Johns Hopkins University, Peirce introduced experimental design with these words:

Logic will not undertake to inform you what kind of experiments you ought to make in order best to determine the acceleration of gravity, or the value of the Ohm; but it will tell you how to proceed to form a plan of experimentation.

[....] Unfortunately practice generally precedes theory, and it is the usual fate of mankind to get things done in some boggling way first, and find out afterward how they could have been done much more easily and perfectly. [34]

Kirstine Smith proposed optimal designs for polynomial models in 1918. (Kirstine Smith had been a student of the Danish statistician Thorvald N. Thiele and was working with Karl Pearson in London.)

Notes

  1. Nordström (1999, p. 176)
  2. The adjective "optimum" (and not "optimal") "is the slightly older form in English and avoids the construction 'optim(um) + al´—there is no 'optimalis' in Latin" (page x in Optimum Experimental Designs, with SAS, by Atkinson, Donev, and Tobias).
  3. Guttorp, P.; Lindgren, G. (2009). "Karl Pearson and the Scandinavian school of statistics". International Statistical Review. 77: 64. CiteSeerX 10.1.1.368.8328. doi:10.1111/j.1751-5823.2009.00069.x. S2CID 121294724.
  4. Smith, Kirstine (1918). "On the standard deviations of adjusted and interpolated values of an observed polynomial function and its constants and the guidance they give towards a proper choice of the distribution of observations". Biometrika. 12 (1/2): 1–85. doi:10.2307/2331929. JSTOR 2331929.
  5. These three advantages (of optimal designs) are documented in the textbook by Atkinson, Donev, and Tobias.
  6. Such criteria are called objective functions in optimization theory.
  7. The Fisher information and other "information" functionals are fundamental concepts in statistical theory.
  8. Traditionally, statisticians have evaluated estimators and designs by considering some summary statistic of the covariance matrix (of a mean-unbiased estimator), usually with positive real values (like the determinant or matrix trace). Working with positive real-numbers brings several advantages: If the estimator of a single parameter has a positive variance, then the variance and the Fisher information are both positive real numbers; hence they are members of the convex cone of nonnegative real numbers (whose nonzero members have reciprocals in this same cone).
    For several parameters, the covariance-matrices and information-matrices are elements of the convex cone of nonnegative-definite symmetric matrices in a partially ordered vector space, under the Loewner (Löwner) order. This cone is closed under matrix-matrix addition, under matrix-inversion, and under the multiplication of positive real-numbers and matrices. An exposition of matrix theory and the Loewner-order appears in Pukelsheim.
  9. Atkinson, A. C.; Fedorov, V. V. (1975). "The design of experiments for discriminating between two rival models". Biometrika. 62 (1): 57–70. doi:10.1093/biomet/62.1.57. ISSN 0006-3444.
  10. The above optimality-criteria are convex functions on domains of symmetric positive-semidefinite matrices: See an on-line textbook for practitioners, which has many illustrations and statistical applications: Boyd and Vandenberghe discuss optimal experimental designs on pages 384–396.
  11. Optimality criteria for "parameters of interest" and for contrasts are discussed by Atkinson, Donev and Tobias.
  12. Iterative methods and approximation algorithms are surveyed in the textbook by Atkinson, Donev and Tobias and in the monographs of Fedorov (historical) and Pukelsheim, and in the survey article by Gaffke and Heiligers.
  13. See Kiefer ("Optimum Designs for Fitting Biased Multiresponse Surfaces" pages 289–299).
  14. Such benchmarking is discussed in the textbook by Atkinson et al. and in the papers of Kiefer. Model-robust designs (including "Bayesian" designs) are surveyed by Chang and Notz.
  15. Cornell, John (2002). Experiments with Mixtures: Designs, Models, and the Analysis of Mixture Data (third ed.). Wiley. ISBN 978-0-471-07916-3. (Pages 400–401)
  16. An introduction to "universal optimality" appears in the textbook of Atkinson, Donev, and Tobias. More detailed expositions occur in the advanced textbook of Pukelsheim and the papers of Kiefer.
  17. Computational methods are discussed by Pukelsheim and by Gaffke and Heiligers.
  18. The Kiefer-Wolfowitz equivalence theorem is discussed in Chapter 9 of Atkinson, Donev, and Tobias.
  19. Pukelsheim uses convex analysis to study the Kiefer-Wolfowitz equivalence theorem in relation to the Legendre-Fenchel conjugacy for convex functions. The minimization of convex functions on domains of symmetric positive-semidefinite matrices is explained in an on-line textbook for practitioners, which has many illustrations and statistical applications: Boyd and Vandenberghe discuss optimal experimental designs on pages 384–396.
  20. See Chapter 20 in Atkinson, Donev, and Tobias.
  21. Bayesian designs are discussed in Chapter 18 of the textbook by Atkinson, Donev, and Tobias. More advanced discussions occur in the monograph by Fedorov and Hackl, and the articles by Chaloner and Verdinelli and by DasGupta. Bayesian designs and other aspects of "model-robust" designs are discussed by Chang and Notz.
  22. As an alternative to "Bayesian optimality", "on-average optimality" is advocated in Fedorov and Hackl.
  23. Wald, Abraham (June 1945). "Sequential Tests of Statistical Hypotheses". The Annals of Mathematical Statistics. 16 (2): 117–186. doi:10.1214/aoms/1177731118. JSTOR 2235829.
  24. Chernoff, H. (1972) Sequential Analysis and Optimal Design, SIAM Monograph.
  25. Zacks, S. (1996) "Adaptive Designs for Parametric Models". In: Ghosh, S. and Rao, C. R., (Eds) (1996). Design and Analysis of Experiments, Handbook of Statistics, Volume 13. North-Holland. ISBN 0-444-82061-2. (pages 151–180)
  26. Henry P. Wynn wrote, "the modern theory of optimum design has its roots in the decision theory school of U.S. statistics founded by Abraham Wald" in his introduction "Jack Kiefer's Contributions to Experimental Design", which is pages xvii–xxiv in the following volume: Kiefer acknowledges Wald's influence and results on many pages – 273 (page 55 in the reprinted volume), 280 (62), 289–291 (71–73), 294 (76), 297 (79), 315 (97), 319 (101) – in this article:
    • Kiefer, J. (1959). "Optimum Experimental Designs". Journal of the Royal Statistical Society, Series B. 21: 272–319.
  27. In the field of response surface methodology, the inefficiency of the Box–Behnken design is noted by Wu and Hamada (page 422).
    • Wu, C. F. Jeff & Hamada, Michael (2002). Experiments: Planning, Analysis, and Parameter Design Optimization. Wiley. ISBN 978-0-471-25511-6.
    Optimal designs for "follow-up" experiments are discussed by Wu and Hamada.
  28. The inefficiency of Box's "central-composite" designs is discussed by Atkinson, Donev, and Tobias (page 165). These authors also discuss the blocking of Kôno-type designs for quadratic response-surfaces.
  29. In system identification, the following books have chapters on optimal experimental design:
  30. Some step-size rules of Judin & Nemirovskii and of Polyak are explained in the textbook by Kushner and Yin:
  31. The discretization of optimal probability-measure designs to provide approximately optimal designs is discussed by Atkinson, Donev, and Tobias and by Pukelsheim (especially Chapter 12).
  32. Regarding designs for quadratic response-surfaces, the results of Kôno and Kiefer are discussed in Atkinson, Donev, and Tobias. Mathematically, such results are associated with Chebyshev polynomials, "Markov systems", and "moment spaces": See
  33. Peirce, C. S. (1882), "Introductory Lecture on the Study of Logic" delivered September 1882, published in Johns Hopkins University Circulars, v. 2, n. 19, pp. 11–12, November 1882, see p. 11, Google Books Eprint. Reprinted in Collected Papers v. 7, paragraphs 59–76, see 59, 63, Writings of Charles S. Peirce v. 4, pp. 378–82, see 378, 379, and The Essential Peirce v. 1, pp. 210–14, see 210–1, also lower down on 211.

Further reading

Textbooks for practitioners and students

Textbooks emphasizing regression and response-surface methodology

The textbook by Atkinson, Donev and Tobias has been used for short courses for industrial practitioners as well as university courses.

Textbooks emphasizing block designs

Optimal block designs are discussed by Bailey and by Bapat. The first chapter of Bapat's book reviews the linear algebra used by Bailey (or the advanced books below). Bailey's exercises and discussion of randomization both emphasize statistical concepts (rather than algebraic computations).

Optimal block designs are discussed in the advanced monograph by Shah and Sinha and in the survey-articles by Cheng and by Majumdar.
