Mia Hubert

Mia Hubert is a Belgian mathematical statistician known for her research on topics in robust statistics including medoid-based clustering, [a] regression depth, [b] the medcouple for robustly measuring skewness, [c] box plots for skewed data, [f] and robust principal component analysis, [d] and for her implementations of robust statistical algorithms in the R statistical software system, MATLAB, [e] and S-PLUS. [a] She is a professor in the statistics and data science section of the department of mathematics at KU Leuven. [1]

Education and career

Hubert earned a diploma in mathematics in 1992 from the University of Antwerp, [2] and obtained her Ph.D. in 1997 at the same university. Her dissertation, Robust Regression for Data Analysis, was supervised by Peter Rousseeuw. [3] She joined the KU Leuven faculty in 2001. [2]

She was one of the original developers of the R package cluster, together with Peter Rousseeuw and Anja Struyf. [4]

Recognition

Hubert became an Elected Member of the International Statistical Institute in 2013. [2] [5]

Selected publications

References

  1. "Mia Hubert", KU Leuven Who's Who, KU Leuven, retrieved 2019-12-14
  2. Curriculum vitae (PDF), March 21, 2017, retrieved 2019-12-14
  3. Mia Hubert at the Mathematics Genealogy Project
  4. cluster: "Finding Groups in Data": Cluster Analysis Extended Rousseeuw et al., 2021-04-17, retrieved 2021-05-27
  5. Individual members, International Statistical Institute, retrieved 2019-12-14