Miriam Gasko Donoho

Miriam (Miki) Gasko Donoho (also published as Miriam Gasko-Green) is an American statistician whose research topics have included data visualization,[1] equivalences between binary regression and survival analysis,[2] and robust regression.[3]

Education and career

Gasko completed her Ph.D. in statistics at Harvard University in 1981.[4] Her dissertation was titled Testing Sequentially Selected Outliers from Linear Models.[5] She became a professor of marketing and quantitative studies in the College of Business at San Jose State University,[6] director of the Silicon Valley Consumer Confidence Survey,[7] and treasurer of the Institute of Mathematical Statistics.[6]

Recognition

She is a Fellow of the Institute of Mathematical Statistics.[8] MacSpin, a program for three-dimensional data visualization that she developed with her husband David Donoho and brother-in-law Andrew Donoho, was named the best scientific/engineering software of 1987 by MacUser magazine.[9]

References

  1. Marchak, Frank M.; Whitney, David A. (March 1990), "Dynamic graphics in the exploratory analysis of multivariate data", Behavior Research Methods, Instruments, & Computers, 22 (2): 176–178, doi:10.3758/bf03203141
  2. Jewell, Nicholas P. (March 2007), "Correspondences between regression models for complex binary outcomes and those for structured multivariate survival analyses", in Nair, Vijay (ed.), Advances in Statistical Modeling and Inference: Essays in Honor of Kjell A. Doksum, pp. 45–64, doi:10.1142/9789812708298_0003
  3. Mosler, Karl (2013), "Depth statistics", in Becker, Claudia; Fried, Roland; Kuhnt, Sonja (eds.), Robustness and Complex Data Structures: Festschrift in Honour of Ursula Gather, Springer, pp. 17–34, arXiv:1207.4988, doi:10.1007/978-3-642-35494-6_2, MR 3135871
  4. Alumni, Harvard Statistics, retrieved 2021-05-07
  5. WorldCat entry for Testing Sequentially Selected Outliers from Linear Models, retrieved 2021-05-07
  6. "Back matter", Statistical Science, 10 (4), November 1995, JSTOR 2246137
  7. Zuckerman, Sam (23 January 2003), "Technology sector not as gloomy", San Francisco Chronicle
  8. Honored IMS Fellows, Institute of Mathematical Statistics, retrieved 2021-05-07
  9. David Donoho named MacArthur Fellow, Stanford University, 18 June 1991, retrieved 2021-05-07