Yingying Fan


Yingying Fan is a Chinese-American statistician who holds the Centennial Chair in Business Administration and is a Professor in the Data Sciences and Operations Department of the Marshall School of Business at the University of Southern California. [1] She is currently the Associate Dean for the PhD Program at USC Marshall and holds joint appointments at the USC Dana and David Dornsife College of Letters, Arts and Sciences and at Keck Medicine of USC. Her contributions to statistics and data science were recognized by the Royal Statistical Society Guy Medal in Bronze in 2017 [2] and the Institute of Mathematical Statistics Medallion Lecture in 2023. [3] She was elected a Fellow of the American Statistical Association in 2019 [4] and a Fellow of the Institute of Mathematical Statistics in 2020, cited "for seminal contributions to high-dimensional inference, variable selection, classification, networks, and nonparametric methodology, particularly in the field of financial econometrics, and for conscientious professional service". [5]

Fan and her collaborators have developed several widely used statistical and data science tools, including the generalized information criterion (GIC), model-X knockoffs (MXK), deep learning inference using knockoffs (DeepLINK), and statistical inference on membership profiles in large networks (SIMPLE), as well as fundamental asymptotic theory for the eigenvectors of large random matrices and for high-dimensional random forests.


Related Research Articles

Design of experiments

The design of experiments, also known as experiment design or experimental design, is the design of any task that aims to describe and explain the variation of information under conditions that are hypothesized to reflect the variation. The term is generally associated with experiments in which the design introduces conditions that directly affect the variation, but may also refer to the design of quasi-experiments, in which natural conditions that influence the variation are selected for observation.

Statistical inference

Statistical inference is the process of using data analysis to infer properties of an underlying distribution of probability. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population.

The likelihood function is the joint probability (or probability density) of the observed data, viewed as a function of the parameters of a statistical model. Intuitively, the likelihood function L(θ; x) is the probability of observing the data x assuming θ is the actual parameter.
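As an illustrative sketch (not from the article itself), consider a Bernoulli likelihood for n coin flips with k heads: L(θ) = θ^k (1 − θ)^(n − k), which is maximized at θ̂ = k/n. The data values below are made up for illustration.

```python
def bernoulli_likelihood(theta, heads, flips):
    """Joint probability of the observed flips, viewed as a function of theta."""
    return theta ** heads * (1 - theta) ** (flips - heads)

# Observed data: 7 heads in 10 flips.
heads, flips = 7, 10

# Evaluate the likelihood on a grid and take the maximizer (the MLE).
grid = [i / 1000 for i in range(1, 1000)]
mle = max(grid, key=lambda t: bernoulli_likelihood(t, heads, flips))

print(mle)  # → 0.7, matching the closed form k/n
```

The grid search stands in for the calculus argument: the likelihood peaks exactly at the sample proportion of heads.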

The Akaike information criterion (AIC) is an estimator of prediction error and thereby relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model, relative to each of the other models. Thus, AIC provides a means for model selection.
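A minimal sketch of the AIC formula, AIC = 2k − 2 ln(L̂), where k is the number of parameters and L̂ the maximized likelihood; the two log-likelihood values below are hypothetical numbers chosen for illustration.

```python
def aic(num_params, max_log_likelihood):
    """Akaike information criterion: 2k - 2 ln(L-hat)."""
    return 2 * num_params - 2 * max_log_likelihood

# Hypothetical maximized log-likelihoods for two candidate models.
simple_model = aic(num_params=2, max_log_likelihood=-120.5)   # AIC = 245.0
complex_model = aic(num_params=5, max_log_likelihood=-118.0)  # AIC = 246.0

# The model with the smaller AIC is preferred: the extra fit of the
# complex model does not justify its three additional parameters.
best = "simple" if simple_model < complex_model else "complex"
print(best)  # → simple
```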

Mathematical statistics

Mathematical statistics is the application of probability theory, a branch of mathematics, to statistics, as opposed to techniques for collecting statistical data. Specific mathematical techniques which are used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure theory.

Functional data analysis (FDA) is a branch of statistics that analyses data providing information about curves, surfaces or anything else varying over a continuum. In its most general form, under an FDA framework, each sample element of functional data is considered to be a random function. The physical continuum over which these functions are defined is often time, but may also be spatial location, wavelength, probability, etc. Intrinsically, functional data are infinite dimensional. The high intrinsic dimensionality of these data brings challenges for theory as well as computation, where these challenges vary with how the functional data were sampled. However, the high or infinite dimensional structure of the data is a rich source of information and there are many interesting challenges for research and data analysis.

Bootstrapping is any test or metric that uses random sampling with replacement, and falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods.
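A small sketch of the bootstrap idea using only the standard library; the sample values are illustrative. Each replicate resamples the data with replacement and recomputes the statistic, and the spread of the replicates estimates the sampling variability.

```python
import random
import statistics

random.seed(0)  # reproducible resampling

# A small observed sample (illustrative values).
sample = [3.1, 2.4, 5.8, 4.0, 3.5, 6.2, 2.9, 4.7]

# Resample with replacement many times, recomputing the statistic each time.
boot_medians = [
    statistics.median(random.choices(sample, k=len(sample)))
    for _ in range(2000)
]

# The spread of the bootstrap replicates estimates the sampling
# variability of the median without any distributional assumptions.
se_estimate = statistics.stdev(boot_medians)
print(0 < se_estimate < max(sample) - min(sample))  # → True
```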

In statistical theory, the field of high-dimensional statistics studies data whose dimension is larger than typically considered in classical multivariate analysis. The area arose owing to the emergence of many modern data sets in which the dimension of the data vectors may be comparable to, or even larger than, the sample size, so that justification for the use of traditional techniques, often based on asymptotic arguments with the dimension held fixed as the sample size increased, was lacking.

David A. Freedman

David Amiel Freedman was Professor of Statistics at the University of California, Berkeley. He was a distinguished mathematical statistician whose wide-ranging research included the analysis of martingale inequalities, Markov processes, de Finetti's theorem, consistency of Bayes estimators, sampling, the bootstrap, and procedures for testing and evaluating models. He published extensively on methods for causal inference and the behavior of standard statistical models under non-standard conditions – for example, how regression models behave when fitted to data from randomized experiments. Freedman also wrote widely on the application—and misapplication—of statistics in the social sciences, including epidemiology, public policy, and law.

Jayanta Kumar Ghosh

Jayanta Kumar Ghosh was an Indian statistician, an emeritus professor at Indian Statistical Institute and a professor of statistics at Purdue University.

Richard Samworth, British statistician

Richard John Samworth is the Professor of Statistical Science and the Director of the Statistical Laboratory, University of Cambridge, and a Teaching Fellow of St John's College, Cambridge. He was educated at St John's College, Cambridge. His main research interests are in nonparametric and high-dimensional statistics. Particular topics include shape-constrained density estimation and other nonparametric function estimation problems, nonparametric classification, clustering and regression, the bootstrap and high-dimensional variable selection problems.

In statistics, asymptotic theory, or large sample theory, is a framework for assessing properties of estimators and statistical tests. Within this framework, it is often assumed that the sample size n may grow indefinitely; the properties of estimators and tests are then evaluated under the limit of n → ∞. In practice, a limit evaluation is considered to be approximately valid for large finite sample sizes too.
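A toy simulation (illustrative, not from the article) of the large-sample idea: the sample mean of n Uniform(0, 1) draws concentrates around the true mean 0.5 as n grows, at roughly the 1/√n rate.

```python
import random
import statistics

random.seed(1)  # reproducible simulation

def mean_error(n, reps=200):
    """Average absolute error of the sample mean of n Uniform(0,1) draws."""
    errs = [abs(statistics.fmean(random.random() for _ in range(n)) - 0.5)
            for _ in range(reps)]
    return statistics.fmean(errs)

# The error shrinks as the sample size grows, so conclusions drawn
# in the n -> infinity limit are a good guide for large finite n.
shrinks = mean_error(10) > mean_error(1000)
print(shrinks)  # → True
```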

Julian Besag, British statistician (1945–2010)

Julian Ernst Besag FRS was a British statistician known chiefly for his work in spatial statistics, and Bayesian inference.

Nancy Reid

Nancy Margaret Reid is a Canadian theoretical statistician. She is a professor at the University of Toronto, where she holds a Canada Research Chair in Statistical Theory. In 2015, Reid became Director of the Canadian Statistical Sciences Institute.

Grace Yun Yi is a professor at the University of Western Ontario, where she currently holds a Tier I Canada Research Chair in Data Science. She was previously a professor at the University of Waterloo, Canada, where she held a University Research Chair in Statistical and Actuarial Science. Her research concerns event history analysis with missing data and its applications in medicine, engineering, and social science.

Kathryn M. Roeder is an American statistician known for her development of statistical methods to uncover the genetic basis of complex disease and her contributions to mixture models, semiparametric inference, and multiple testing. Roeder holds positions as professor of statistics and professor of computational biology at Carnegie Mellon University, where she leads a project focused on discovering genes associated with autism.

Sofia Olhede

Sofia Charlotta Olhede is a British-Swedish mathematical statistician known for her research on wavelets, graphons, and high-dimensional statistics and for her columns on algorithmic bias. She is a professor of statistical science at the EPFL.

Rina Foygel Barber is an American statistician whose research includes works on the Bayesian statistics of graphical models, false discovery rates, and regularization. She is the Louis Block Professor of statistics at the University of Chicago.

Peter Bühlmann

Peter Lukas Bühlmann is a Swiss mathematician and statistician.

In statistics, the knockoff filter, or simply knockoffs, is a framework for variable selection. It was originally introduced for linear regression by Rina Barber and Emmanuel Candès, and later generalized to other regression models in the random design setting. The knockoff filter has found applications in many practical areas, notably in genome-wide association studies.
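A hedged sketch of only the selection step of the knockoff filter, assuming the feature statistics W_j (large positive values suggesting a genuine signal, symmetric around zero for nulls) have already been computed from the data and their knockoff copies; the threshold below is the knockoff+ rule at a target false discovery rate q, and the W values are hypothetical.

```python
def knockoff_select(W, q=0.2):
    """Knockoff+ selection: pick the smallest threshold t with
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q,
    then select the features with W_j >= t."""
    candidates = sorted(abs(w) for w in W if w != 0)
    for t in candidates:
        neg = sum(1 for w in W if w <= -t)  # proxy count of false leads
        pos = sum(1 for w in W if w >= t)   # count of selected features
        if (1 + neg) / max(1, pos) <= q:
            return [j for j, w in enumerate(W) if w >= t]
    return []  # no threshold achieves the target level

# Hypothetical statistics: a few strong positives and some noise near zero.
W = [4.2, 3.9, 5.1, -0.3, 0.2, -0.1, 3.5, 0.4, -0.6, 4.8]
print(knockoff_select(W, q=0.2))  # → [0, 1, 2, 6, 9]
```

The noise coordinates near zero are screened out: the threshold settles at 3.5, keeping only the five clearly positive statistics.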

References

  1. "Yingying Fan | USC Marshall". www.marshall.usc.edu.
  2. "RSS announces honours for 2017 | StatsLife". www.statslife.org.uk.
  3. "Institute of Mathematical Statistics | Honored Special Awards & Lecturers Recipient List".
  4. "The 2019 ASA Fellows" (PDF). Retrieved 2023-08-31.
  5. "Institute of Mathematical Statistics | Congratulations to the 2020 IMS Fellows!".