Roderick J. Little | |
---|---|
Nationality | British |
Education | University of Cambridge; Imperial College London |
Scientific career | |
Fields | Statistics |
Institutions | U.S. Environmental Protection Agency; U.S. Census Bureau; George Washington University; University of California, Los Angeles; University of Michigan |
Thesis | Missing Values in Multivariate Statistical Analysis (1974) |
Doctoral advisors | Martin Beale; David R. Cox |
Doctoral students | |
Roderick Joseph Alexander Little is an academic statistician, whose main research contributions lie in the statistical analysis of data with missing values and the analysis of complex sample survey data. Little is Richard D. Remington Distinguished University Professor of Biostatistics in the Department of Biostatistics at the University of Michigan, where he also holds academic appointments in the Department of Statistics and the Institute for Social Research.
Little was born near London, England, and attended secondary school at Glasgow Academy in Scotland. He received a BA in Mathematics from Gonville and Caius College, Cambridge University, and an M.Sc. in Statistics and Operational Research and a Ph.D. in Statistics from Imperial College of Science and Technology, University of London. His doctoral dissertation was on the analysis of data with missing values,[1] and was supervised by Professors Martin Beale and Sir David R. Cox.
After a two-year postdoctoral position in the Department of Statistics at the University of Chicago in 1974–76, Little worked at the World Fertility Survey[2] from 1976 to 1980, under the leadership of Sir Maurice Kendall. From 1980 to 1982 he worked in a group formed by Donald Rubin at the U.S. Environmental Protection Agency in Washington, DC, and in 1982–83 he was an ASA/Census/NSF Fellow at the U.S. Census Bureau and an Adjunct Associate Professor at George Washington University. From 1983 to 1993 he was Associate Professor and later Professor in the Department of Biomathematics at UCLA. He was appointed Professor and Chair of the Biostatistics Department at the University of Michigan in 1993 and chaired the department for 11 years between 1993 and 2009, a period of intensive departmental growth.
Little’s primary research interest is the analysis of data sets with missing values. Many statistical techniques are designed for complete, rectangular data sets, but in practice many data sets contain missing values, either by design or by accident. In 1987, Little co-authored a book[3][4] with Donald Rubin that was one of the earliest systematic treatments of the topic; the second edition was published in 2002 and the third in 2019. As detailed in that book, initial statistical approaches to missing values were relatively ad hoc, such as discarding incomplete cases or substituting means. The main focus of the book is on likelihood-based inferential techniques, such as maximum likelihood and Bayesian inference, based on statistical models for the data and the missing-data mechanism. The first edition focused mainly on maximum likelihood via the expectation-maximization (EM) algorithm, but later editions emphasize Bayesian methods and the related technique of multiple imputation. Little and Rubin were awarded the Karl Pearson Prize in 2017 by the International Statistical Institute (ISI) for a research contribution that has had “profound influence on statistical theory, methodology or applications.” The citation for the award was as follows: “The work of Roderick J. Little and Donald B. Rubin, laid out in their seminal 1978 Biometrika papers and 1987 book, updated in 2002, has been no less than defining and transforming. Earlier missing data work was ad hoc at best. Little and Rubin defined the field and provided the methodological and applied communities with a useful and usable taxonomy and a set of key results. Today, their terminology and methodology is used more than ever. Their work has been transforming for the deep impact it had and has on both statistical practice and theory. It is one of the rare topics that has continued for the past thirty years to be studied and developed in academia, government and industry. For example, it plays a key role in the current work on sensitivity analysis with incomplete data.”
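As a toy illustration of the EM idea for missing data, the following sketch estimates the mean and covariance of a bivariate normal sample in which the second variable is partly missing. This is a minimal sketch, not an example from the book: the simulated data, starting values, and iteration count are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate bivariate normal data, then delete some y2 values at random
n = 500
mean_true = np.array([1.0, 2.0])
cov_true = np.array([[1.0, 0.6], [0.6, 1.5]])
y = rng.multivariate_normal(mean_true, cov_true, size=n)
miss = rng.random(n) < 0.3                 # 30% of y2 missing completely at random
y1 = y[:, 0]
y2_obs = np.where(miss, np.nan, y[:, 1])

# EM for (mu, Sigma): start from the complete cases
mu = np.array([y1.mean(), np.nanmean(y2_obs)])
sigma = np.cov(y1[~miss], y2_obs[~miss])

for _ in range(100):
    # E-step: fill in E[y2 | y1] and E[y2^2 | y1] under the current parameters
    beta = sigma[0, 1] / sigma[0, 0]              # slope of the regression of y2 on y1
    cond_mean = mu[1] + beta * (y1 - mu[0])       # conditional mean of y2 given y1
    cond_var = sigma[1, 1] - beta * sigma[0, 1]   # residual variance
    y2_fill = np.where(miss, cond_mean, y2_obs)
    y2_sq = np.where(miss, cond_mean**2 + cond_var, y2_obs**2)

    # M-step: update the moments from the filled-in sufficient statistics
    mu = np.array([y1.mean(), y2_fill.mean()])
    s11 = np.mean(y1**2) - mu[0]**2
    s12 = np.mean(y1 * y2_fill) - mu[0] * mu[1]
    s22 = np.mean(y2_sq) - mu[1]**2
    sigma = np.array([[s11, s12], [s12, s22]])

print("ML estimates:", mu, sigma)
```

The key point of the E-step is that it fills in conditional expectations of the sufficient statistics, including the conditional variance term, rather than simply plugging in predicted values.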
Little’s main methodological contributions to missing-data methods, in collaboration with his students and colleagues, include methods for missing data for mixtures of continuous and categorical data using the general location model,[5] pattern-mixture models[6] for data that are missing not at random, penalized spline of propensity models for missing data[7] and causal inference,[8] subsample ignorable likelihood methods[9] in regression, proxy pattern-mixture models[10] for survey nonresponse, models for longitudinal data,[11][12][13] partially missing at random models,[14] and review papers on missing data in regression,[15] hot-deck imputation,[16] and masking data for confidentiality protection.[17]
Another research area is the analysis of data collected by complex sampling designs involving stratification and clustering of units. Since working as a statistician for the World Fertility Survey, Little has worked on the development of model-based methods for survey analysis that are robust to misspecification, reasonably efficient, and capable of implementation in applied settings. Contributions with students and colleagues in this area include articles on survey nonresponse,[18][19][20][21][22] Bayesian methods for survey inference,[23][24] poststratification,[25] assessing selection bias,[26] and survey weighting from a Bayesian perspective.[27][28]
Little advocates the calibrated Bayesian approach to statistical analysis,[29][30] as proposed by George Box and Donald Rubin, among others. The idea is to develop Bayesian models that yield inferences with good frequentist properties, such as posterior credible intervals that have close to nominal coverage when viewed as confidence intervals in repeated sampling. In the survey sampling arena, this leads to models that incorporate features of the sample design in the Bayesian model. Little argues that this Bayesian framework yields a more unified approach to survey sample inference than the design-based approach, which relies on the randomization distribution underlying sample selection as the basis for inference. Little’s applied interests in statistics are broad, including mental health, demography, environmental statistics, biology, economics, medicine, public health and the social sciences, as well as biostatistics.
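The calibration idea can be checked by simulation. The following minimal sketch, under assumed toy settings (normal data with known variance and a flat prior, a textbook case where the 95% posterior credible interval coincides with the classical confidence interval), verifies that the credible interval has close to nominal repeated-sampling coverage:

```python
import numpy as np

rng = np.random.default_rng(1)

# Repeated-sampling coverage of a 95% credible interval for a normal mean
# with known variance and a flat prior (all settings here are illustrative)
mu_true, sigma, n, reps = 5.0, 2.0, 25, 10_000
z = 1.96                                   # 97.5% standard normal quantile
covered = 0
for _ in range(reps):
    x = rng.normal(mu_true, sigma, n)
    # Posterior: mu | x ~ N(xbar, sigma^2 / n) under the flat prior
    xbar, se = x.mean(), sigma / np.sqrt(n)
    lo, hi = xbar - z * se, xbar + z * se  # central 95% credible interval
    covered += lo <= mu_true <= hi

print(f"Repeated-sampling coverage: {covered / reps:.3f}")  # close to 0.95
```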
Little is a strong advocate of the importance of independent government statistical agencies for democracy. He served two terms on the Committee on National Statistics of the National Academy of Sciences, and in 2010–12 was the inaugural Associate Director for Survey Research and Methodology and Chief Scientist at the U.S. Bureau of the Census, a position that elevated the scientific aspects of Census Bureau operations. He has participated in many National Academy of Sciences panels, in particular chairing studies on multiple sclerosis and other neurologic disorders in veterans of the Persian Gulf and post-9/11 wars, and on the treatment of missing data in clinical trials. He has been active in advising the U.S. Food and Drug Administration and pharmaceutical companies on methods for handling missing data in clinical studies.[31][32][33][34][35]
Little served two terms on the Board of Directors of the American Statistical Association (ASA), first as Editorial Representative and then as a Vice President. Editorially, he was Coordinating and Applications Editor of the Journal of the American Statistical Association in 1992–94, and later, as Chair of the Survey Research Methods Section of the ASA, helped to start a new academic journal on survey statistics, the Journal of Survey Statistics and Methodology. He served as the Statistics Co-Editor in Chief of that journal in 2016–18. In 2016, Little received a Founders Award[36] from the ASA for his contributions to the statistics profession.
Little is a Fellow of the American Statistical Association and the American Academy of Arts and Sciences, and a Member of the International Statistical Institute and the U.S. National Academy of Medicine. In 2005 he received the ASA’s Samuel S. Wilks Memorial Award for contributions to statistics. Plenary talks include the 2005 President’s Invited Address and the 2012 COPSS Fisher Lecture at the Joint Statistical Meetings, and the President’s Invited Address at the 2018 Eastern North American Region Meeting of the International Biometric Society. In 2020 he received the Marvin Zelen Leadership Award in Statistical Science from Harvard University.
Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.
Statistical inference is the process of using data analysis to infer properties of an underlying distribution of probability. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population.
The theory of statistics provides a basis for the whole range of techniques, in both study design and data analysis, that are used within applications of statistics. The theory covers approaches to statistical-decision problems and to statistical inference, and the actions and deductions that satisfy the basic principles stated for these different approaches. Within a given approach, statistical theory gives ways of comparing statistical procedures; it can find a best possible procedure within a given context for given statistical problems, or can provide guidance on the choice between alternative procedures.
A statistical hypothesis test is a method of statistical inference used to decide whether the data sufficiently support a particular hypothesis. A statistical hypothesis test typically involves a calculation of a test statistic. Then a decision is made, either by comparing the test statistic to a critical value or equivalently by evaluating a p-value computed from the test statistic. Roughly 100 specialized statistical tests have been defined.
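The following minimal sketch shows both decision rules for a one-sample z-test and their equivalence; the sample, the null value, and the 5% level are made-up assumptions for illustration:

```python
import math
import numpy as np

rng = np.random.default_rng(3)

# One-sample z-test of H0: mu = mu0 with known standard deviation
x = rng.normal(0.4, 1.0, 50)             # sample of n = 50 observations
mu0, sigma = 0.0, 1.0                    # null value and (assumed known) sd
z = (x.mean() - mu0) / (sigma / math.sqrt(len(x)))   # test statistic

# Decision rule 1: compare |z| to the two-sided 5% critical value 1.96
reject_by_critical_value = abs(z) > 1.96

# Decision rule 2 (equivalent): compute the two-sided p-value from z
# using the standard normal CDF, Phi(t) = 0.5 * (1 + erf(t / sqrt(2)))
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
reject_by_p_value = p_value < 0.05

print(z, p_value, reject_by_critical_value, reject_by_p_value)
```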
In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it – that is, the Markov chain's equilibrium distribution matches the target distribution. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution.
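One of the simplest MCMC algorithms is random-walk Metropolis. The sketch below targets a standard normal distribution; the target, the proposal scale, and the chain length are illustrative choices, not prescriptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def log_target(x):
    return -0.5 * x * x                  # log of a standard normal density, up to a constant

chain = np.empty(50_000)
x = 0.0
for i in range(chain.size):
    proposal = x + rng.normal(0, 1.0)    # symmetric random-walk proposal
    # Accept with probability min(1, target(proposal) / target(x))
    if np.log(rng.random()) < log_target(proposal) - log_target(x):
        x = proposal
    chain[i] = x

# After discarding a burn-in, the draws approximate the target: for the
# standard normal, mean near 0 and standard deviation near 1
print(chain[1000:].mean(), chain[1000:].std())
```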
Social statistics is the use of statistical measurement systems to study human behavior in a social environment. This can be accomplished through polling a group of people, evaluating a subset of data obtained about a group of people, or by observation and statistical analysis of a set of data that relates to people and their behaviors.
Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability, where probability expresses a degree of belief in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. This differs from a number of other interpretations of probability, such as the frequentist interpretation, which views probability as the limit of the relative frequency of an event after many trials. More concretely, analysis in Bayesian methods codifies prior knowledge in the form of a prior distribution.
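As a concrete sketch of how prior knowledge is codified and updated by data, the conjugate Beta-Binomial model gives a closed-form posterior; the prior parameters and counts below are made-up for illustration:

```python
# Conjugate Beta-Binomial updating: a Beta(a, b) prior on a success
# probability combined with binomial data yields a Beta posterior
a, b = 2.0, 2.0                 # prior: Beta(2, 2), mild belief near 0.5
successes, failures = 7, 3      # observed data: 7 successes in 10 trials

# Posterior is Beta(a + successes, b + failures) by conjugacy
post_a, post_b = a + successes, b + failures
posterior_mean = post_a / (post_a + post_b)
print(f"Posterior: Beta({post_a}, {post_b}), mean {posterior_mean:.3f}")
```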
In statistics, imputation is the process of replacing missing data with substituted values. When substituting for a data point, it is known as "unit imputation"; when substituting for a component of a data point, it is known as "item imputation". Missing data cause three main problems: they can introduce a substantial amount of bias, make the handling and analysis of the data more arduous, and reduce efficiency. Because missing data can create problems for analyzing data, imputation is seen as a way to avoid the pitfalls involved with listwise deletion of cases that have missing values. That is to say, when one or more values are missing for a case, most statistical packages default to discarding any case that has a missing value, which may introduce bias or affect the representativeness of the results. Imputation preserves all cases by replacing missing data with an estimated value based on other available information. Once all missing values have been imputed, the data set can then be analysed using standard techniques for complete data. Many approaches for handling missing data have been proposed, but most of them introduce bias. A few of the well-known attempts to deal with missing data include: hot deck and cold deck imputation; listwise and pairwise deletion; mean imputation; non-negative matrix factorization; regression imputation; last observation carried forward; stochastic imputation; and multiple imputation.
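The sketch below illustrates stochastic regression imputation combined across several completed data sets. The data-generating model is an assumption made for illustration, and a full multiple-imputation procedure would also draw the regression parameters from their posterior distribution, which is omitted here for brevity:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy data: y is sometimes missing, x is always observed
n, m = 200, 20
x = rng.normal(0, 1, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)
miss = rng.random(n) < 0.3 * (x > 0)     # missingness depends on observed x

obs = ~miss
b, a = np.polyfit(x[obs], y[obs], 1)     # regression fit on complete cases
resid_sd = np.std(y[obs] - (a + b * x[obs]))

estimates = np.empty(m)
for j in range(m):
    # Stochastic regression imputation: prediction plus random noise, so
    # that the imputed values reflect residual uncertainty
    y_imp = y.copy()
    y_imp[miss] = a + b * x[miss] + rng.normal(0, resid_sd, miss.sum())
    estimates[j] = y_imp.mean()          # analyse each completed data set

# Combine across imputations (the point-estimate part of Rubin's rules)
print("MI estimate of the mean of y:", estimates.mean())
```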
In statistics, ignorability is a feature of an experiment design whereby the method of data collection does not depend on the missing data. A missing data mechanism such as a treatment assignment or survey sampling strategy is "ignorable" if the missing data matrix, which indicates which variables are observed or missing, is independent of the missing data conditional on the observed data.
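A toy simulation can make the condition concrete. Below, the probability that y is missing depends only on a fully observed covariate x (a missing-at-random mechanism); the model and numbers are illustrative assumptions. The complete-case mean of y is biased, but an analysis that models the observed data recovers the truth without modelling the mechanism itself:

```python
import numpy as np

rng = np.random.default_rng(2)

n = 100_000
x = rng.normal(0, 1, n)
y = 2.0 + 1.0 * x + rng.normal(0, 1, n)
p_miss = 1 / (1 + np.exp(-x))        # missingness depends on observed x only
miss = rng.random(n) < p_miss

# Complete-case mean of y is biased because missingness is related to x ...
print("complete-case mean:", y[~miss].mean())        # below the true mean 2.0

# ... but regressing y on x among the complete cases and predicting for
# everyone recovers it: the mechanism is ignorable for this analysis
b, a = np.polyfit(x[~miss], y[~miss], 1)
print("model-based mean:", (a + b * x).mean())       # close to 2.0
```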
In science, randomized experiments are those that allow the greatest reliability and validity of statistical estimates of treatment effects. Randomization-based inference is especially important in experimental design and in survey sampling.
In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation. Missing data are a common occurrence and can have a significant effect on the conclusions that can be drawn from the data.
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics that can be used to estimate the posterior distributions of model parameters.
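The simplest ABC variant is rejection sampling on a summary statistic: draw parameters from the prior, simulate data, and keep draws whose simulated summary lies close to the observed one. In the sketch below, the model, prior, summary statistic, and tolerance are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)

observed = rng.normal(3.0, 1.0, 50)      # "observed" data with unknown mean
s_obs = observed.mean()                  # summary statistic

accepted = []
while len(accepted) < 500:
    theta = rng.uniform(-10, 10)         # draw a candidate from the prior
    simulated = rng.normal(theta, 1.0, observed.size)
    if abs(simulated.mean() - s_obs) < 0.1:   # tolerance on the summary
        accepted.append(theta)

# The accepted draws approximate the posterior of the mean
print(np.mean(accepted), np.std(accepted))
```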
The foundations of statistics consist of the mathematical and philosophical basis for arguments and inferences made using statistics. This includes the justification for methods of statistical inference, estimation, and hypothesis testing; the quantification of uncertainty in the conclusions of statistical arguments; and the interpretation of those conclusions in probabilistic terms. A valid foundation can be used to explain statistical paradoxes such as Simpson's paradox, provide a precise description of observed statistical laws, and guide the application of statistical conclusions in social and scientific applications.
Ziheng Yang FRS is a Chinese biologist. He holds the R.A. Fisher Chair of Statistical Genetics at University College London, and is the Director of the R.A. Fisher Centre for Computational Biology at UCL. He was elected a Fellow of the Royal Society in 2006.
Alan Enoch Gelfand is an American statistician, and is currently the James B. Duke Professor of Statistics and Decision Sciences at Duke University. Gelfand’s research includes substantial contributions to the fields of Bayesian statistics, spatial statistics and hierarchical modeling.
Dipak Kumar Dey is an Indian-American statistician best known for his work on Bayesian methodologies. He is currently the Board of Trustees Distinguished Professor in the Department of Statistics at the University of Connecticut. Dey has an international reputation as a statistician and data scientist. Since earning a Ph.D. degree in statistics from Purdue University in 1980, Dey has made major contributions to the development of modern statistics, especially in Bayesian analysis, decision science and model selection. Dey has published more than 10 books and edited volumes, and over 260 research articles in peer-refereed national and international journals. In addition, the statistical methodologies that he has developed have found wide application in a plethora of interdisciplinary and applied fields, such as biometry and bioinformatics, genetics, econometrics, environmental science, and social science. Dey has supervised 40 Ph.D. students, and presented more than 200 professional talks in colloquia, seminars and conferences all over the world. During his career, Dey has been a visiting professor or scholar at many institutions and research centers around the world, such as Macquarie University, Pontificia Universidad Católica de Chile, the University of São Paulo, the University of British Columbia, and the Statistical and Applied Mathematical Sciences Institute. Dey is an elected fellow of the American Association for the Advancement of Science, the American Statistical Association, the Institute of Mathematical Statistics, the International Society for Bayesian Analysis and the International Statistical Institute.
Likelihoodist statistics or likelihoodism is an approach to statistics that exclusively or primarily uses the likelihood function. Likelihoodist statistics is a more minor school than the main approaches of Bayesian statistics and frequentist statistics, but has some adherents and applications. The central idea of likelihoodism is the likelihood principle: data are interpreted as evidence, and the strength of the evidence is measured by the likelihood function. Beyond this, there are significant differences within likelihood approaches: "orthodox" likelihoodists consider data only as evidence, and do not use them as the basis of statistical inference, while others make inferences based on likelihood, but without using Bayesian inference or frequentist inference. Likelihoodism is thus criticized for either not providing a basis for belief or action, or not satisfying the requirements of these other schools.
Siddhartha Chib is an econometrician and statistician, the Harry C. Hartkopf Professor of Econometrics and Statistics at Washington University in St. Louis. His work is primarily in Bayesian statistics, econometrics, and Markov chain Monte Carlo methods.
In the field of epidemiology, source attribution refers to a category of methods with the objective of reconstructing the transmission of an infectious disease from a specific source, such as a population, individual, or location. For example, source attribution methods may be used to trace the origin of a new pathogen that recently crossed from another host species into humans, or from one geographic region to another. It may be used to determine the common source of an outbreak of a foodborne infectious disease, such as a contaminated water supply. Finally, source attribution may be used to estimate the probability that an infection was transmitted from one specific individual to another, i.e., "who infected whom".