John D. Storey

Nationality American
Alma mater Stanford University Ph.D. (2002)
Known for Q-value
Awards COPSS Presidents' Award (2015)
Mortimer Spiegelman Award (2015)
Scientific career
Fields Statistics
Statistical genetics
Genomics
Institutions Princeton University
Doctoral advisor Robert Tibshirani
Doctoral students Jeffrey T. Leek
Website storeylab.org

John D. Storey is the William R. Harman '63 and Mary-Love Harman Professor in Genomics at Princeton University. [1] His research is focused on statistical inference of high-dimensional data, particularly genomic data. Storey was the founding director of the Princeton University Center for Statistics and Machine Learning. [2]

Research

Storey's early research focused on the false discovery rate (FDR). At the time, the FDR had been studied only in the context of sequential p-value methods and was not yet in widespread use. Storey showed that false discovery rates can be approached through point estimation, [3] opening them up to the tools of this very active branch of statistics. He simultaneously proved that the positive false discovery rate (pFDR) is exactly equal to a Bayesian posterior probability, thereby providing the first direct connection between false discovery rates and Bayesian theory. [4] In these works he also introduced the q-value, a false discovery rate analogue of the p-value. Storey then introduced false discovery rates and q-values as widely applicable measures of statistical significance in genomics, shifting the focus from false positive control to false discovery rate control. [5]

With Jeff Leek, Storey found that "expression heterogeneity", meaning unmodeled sources of systematic variation in gene expression data, is highly prevalent and needs to be modeled and corrected for when analyzing genome-wide gene expression data. [6] Leek and Storey introduced "surrogate variable analysis", a high-dimensional regression model that includes both known and unknown covariates, and Storey has developed a number of methods for estimating this model.

More recently, Storey has shifted his focus to population genomics, where he has introduced genome-wide models of allele frequencies, Hardy–Weinberg equilibrium, and F-statistics that hold under arbitrary population structures.
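For reference, the quantities named in the false discovery rate work above can be written out as follows. This is a sketch in commonly used notation, none of which appears in the text itself: V is assumed to denote the number of true null hypotheses rejected, R the total number of rejections, T a test statistic, H an indicator that is 0 when the null hypothesis is true, π₀ the prior probability that H = 0, and Γ a rejection region. The Bayesian identity is stated under the assumption of identically distributed, independent tests drawn from a two-component mixture.

```latex
\begin{align*}
% Two error rates: FDR counts "no rejections" as zero error,
% pFDR conditions on at least one rejection.
\mathrm{FDR}  &= \mathbb{E}\!\left[\frac{V}{\max(R,\,1)}\right], &
\mathrm{pFDR} &= \mathbb{E}\!\left[\left.\frac{V}{R}\;\right|\; R > 0\right] \\[6pt]
% pFDR as a Bayesian posterior probability (mixture-model assumption).
\mathrm{pFDR}(\Gamma) &= \Pr(H = 0 \mid T \in \Gamma)
  = \frac{\pi_0 \,\Pr(T \in \Gamma \mid H = 0)}{\Pr(T \in \Gamma)} \\[6pt]
% q-value of an observed statistic t: the smallest pFDR over
% rejection regions that contain t.
q(t) &= \inf_{\Gamma \,:\, t \in \Gamma} \mathrm{pFDR}(\Gamma)
\end{align*}
```

Read this way, the q-value plays the same role for the pFDR that the p-value plays for the false positive rate.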

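Honors and awards

Storey was named a Fellow of the American Association for the Advancement of Science in 2011 [7] and is a Fellow of the Institute of Mathematical Statistics. [8] In 2015 he received the COPSS Presidents' Award [9] and the Mortimer Spiegelman Award. [10]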

Related Research Articles

Biostatistics is the development and application of statistical methods to a wide range of topics in biology. It encompasses the design of biological experiments, the collection and analysis of data from those experiments, and the interpretation of the results.

In statistics, the false discovery rate (FDR) is a method of conceptualizing the rate of type I errors in null hypothesis testing when conducting multiple comparisons. FDR-controlling procedures are designed to control the FDR, which is the expected proportion of "discoveries" that are false. Equivalently, the FDR is the expected ratio of the number of false positive classifications to the total number of positive classifications. The total number of rejections of the null includes both the number of false positives (FP) and true positives (TP). Simply put, FDR = FP / (FP + TP). FDR-controlling procedures provide less stringent control of Type I errors compared to family-wise error rate (FWER) controlling procedures, which control the probability of at least one Type I error. Thus, FDR-controlling procedures have greater power, at the cost of increased numbers of Type I errors.
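As a rough illustration of the contrast between FWER and FDR control described above, the following Python sketch compares two standard procedures on the same p-values. The choice of procedures is an assumption made for the example, since the text names neither: Bonferroni correction stands in for FWER control and the Benjamini–Hochberg step-up procedure stands in for FDR control. The function names and the p-values are hypothetical.

```python
def false_discovery_proportion(fp, tp):
    """Observed ratio FP / (FP + TP); defined as 0 when nothing is rejected."""
    total = fp + tp
    return fp / total if total > 0 else 0.0


def bonferroni_reject(pvalues, alpha=0.05):
    """FWER control: reject only p-values at or below alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """FDR control: find the largest rank k with p_(k) <= alpha * k / m,
    then reject the k smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            k = rank
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


# Hypothetical p-values from eight tests.
pvals = [0.001, 0.008, 0.012, 0.02, 0.04, 0.2, 0.5, 0.9]
n_bonferroni = sum(bonferroni_reject(pvals))
n_bh = sum(benjamini_hochberg_reject(pvals))
print(n_bonferroni, n_bh)
```

On these inputs the FDR-controlling procedure rejects more hypotheses than the FWER-controlling one, which is the gain in power the paragraph refers to.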

Microarray analysis techniques are used in interpreting the data generated from experiments on DNA, RNA, and protein microarrays, which allow researchers to investigate the expression state of a large number of genes - in many cases, an organism's entire genome - in a single experiment. Such experiments can generate very large amounts of data, allowing researchers to assess the overall state of a cell or organism. Data in such large quantities is difficult - if not impossible - to analyze without the help of computer programs.

In statistics, the multiple comparisons, multiplicity or multiple testing problem occurs when one considers a set of statistical inferences simultaneously or infers a subset of parameters selected based on the observed values.

The COPSS Presidents' Award is given annually by the Committee of Presidents of Statistical Societies to a young statistician in recognition of outstanding contributions to the profession of statistics.

Tianwen Tony Cai is a Chinese statistician. He is the Daniel H. Silberberg Professor of Statistics and Vice Dean at the Wharton School of the University of Pennsylvania. He is also a professor in the Applied Math & Computational Science Graduate Group and an associate scholar at the Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania. In 2008 Tony Cai was awarded the COPSS Presidents' Award.

Ziheng Yang FRS is a Chinese biologist. He holds the R.A. Fisher Chair of Statistical Genetics at University College London, and is the Director of R.A. Fisher Centre for Computational Biology at UCL. He was elected a Fellow of the Royal Society in 2006.

Olga G. Troyanskaya is a Professor in the Department of Computer Science and the Lewis-Sigler Institute for Integrative Genomics at Princeton University and the Deputy Director for Genomics at the Flatiron Institute's Center for Computational Biology in NYC. She studies protein function and interactions in biological pathways by analyzing genomic data using computational tools.

Jun S. Liu is a Chinese-American statistician focusing on Bayesian statistical inference and computational biology. He received the COPSS Presidents' Award in 2002. Liu is a professor in the Department of Statistics at Harvard University and has written many research papers and a book about Markov chain Monte Carlo algorithms, including their applications in biology. He is also co-author of the Tmod software for sequence motif discovery.

Michael Abbott Newton is a Canadian statistician. He is a Professor in the Department of Statistics and the Department of Biostatistics and Medical Informatics at the University of Wisconsin–Madison, and he received the COPSS Presidents' Award in 2004. He has written many research papers about the statistical analysis of cancer biology, including linkage analysis and signal identification.

Larry Alan Wasserman is a Canadian-American statistician and a professor in the Department of Statistics & Data Science and the Machine Learning Department at Carnegie Mellon University.

Xihong Lin is a Chinese-American statistician known for her contributions to mixed models, nonparametric and semiparametric regression, and statistical genetics and genomics. As of 2015, she is the Henry Pickering Walcott Professor and Chair of the Department of Biostatistics at Harvard T.H. Chan School of Public Health and Coordinating Director of the Program in Quantitative Genomics.

Raymond James Carroll is an American statistician and Distinguished Professor of statistics, nutrition and toxicology at Texas A&M University. He is a recipient of the 1988 COPSS Presidents' Award and the 2002 R. A. Fisher Lectureship. He has made fundamental contributions to measurement error models and to nonparametric and semiparametric modeling.

Mark Johannes van der Laan is the Jiann-Ping Hsu/Karl E. Peace Professor of Biostatistics and Statistics at the University of California, Berkeley. He has made contributions to survival analysis, semiparametric statistics, multiple testing, and causal inference. He also developed the targeted maximum likelihood estimation methodology. He is a founding editor of the Journal of Causal Inference.

David Brian Dunson is an American statistician who is Arts and Sciences Distinguished Professor of Statistical Science, Mathematics and Electrical & Computer Engineering at Duke University. His research focuses on developing statistical methods for complex and high-dimensional data. Particular themes of his work include the use of Bayesian hierarchical models, methods for learning latent structure in complex data, and the development of computationally efficient algorithms for uncertainty quantification. He is currently serving as joint Editor of the Journal of the Royal Statistical Society, Series B.

Nilanjan Chatterjee is a Bloomberg Distinguished Professor of Biostatistics and Genetic Epidemiology at Johns Hopkins University, with appointments in the Department of Biostatistics in the Bloomberg School of Public Health and in the Department of Oncology in the Sidney Kimmel Comprehensive Cancer Center in the Johns Hopkins School of Medicine. He was formerly the chief of the Biostatistics Branch of the National Cancer Institute's Division of Cancer Epidemiology and Genetics.

Sudipto Banerjee is an Indian-American statistician best known for his work on Bayesian hierarchical modeling and inference for spatial data analysis. He is Professor and Chair of the Department of Biostatistics in the School of Public Health at the University of California, Los Angeles. He served as the 2022 President of the International Society for Bayesian Analysis.

Jeffrey Tullis Leek is an American biostatistician and data scientist working as a Vice President, Chief Data Officer, and Professor at Fred Hutchinson Cancer Research Center. He is an author of the Simply Statistics blog and runs several online courses through Coursera as part of its Data Science Specialization. His most popular course is The Data Scientist's Toolbox, which he taught with Roger Peng and Brian Caffo. Leek is best known for his contributions to genomic data analysis and for his critical view of research and of the accuracy of popular statistical methods.

In statistical hypothesis testing, specifically multiple hypothesis testing, the q-value in the Storey-Tibshirani procedure provides a means to control the positive false discovery rate (pFDR). Just as the p-value gives the expected false positive rate obtained by rejecting the null hypothesis for any result with an equal or smaller p-value, the q-value gives the expected pFDR obtained by rejecting the null hypothesis for any result with an equal or smaller q-value.
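The way q-values are obtained from a list of p-values can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published procedure in full: the proportion of true nulls, π₀, is estimated at a single fixed tuning value `lam` (an assumption made here for brevity, where the Storey-Tibshirani paper smooths over a range of values), and the function names are hypothetical.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of true nulls from the p-values above lam.

    P-values from true nulls are roughly uniform, so the fraction above
    lam, rescaled by 1 / (1 - lam), approximates pi0. Capped at 1.
    """
    m = len(pvalues)
    pi0 = sum(p > lam for p in pvalues) / (m * (1.0 - lam))
    return min(pi0, 1.0)


def qvalues(pvalues, lam=0.5):
    """Return q-values in the same order as the input p-values.

    For the i-th smallest p-value, the estimated pFDR of rejecting
    everything at or below it is pi0 * m * p_(i) / i. The q-value is the
    minimum of that quantity over all thresholds at least as large.
    """
    m = len(pvalues)
    pi0 = estimate_pi0(pvalues, lam)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum
    # enforces monotonicity of the q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[idx] / rank)
        q[idx] = running_min
    return q


# Hypothetical p-values from eight tests.
pvals = [0.001, 0.008, 0.012, 0.02, 0.04, 0.2, 0.5, 0.9]
for p, q in zip(pvals, qvalues(pvals)):
    print(f"p = {p:<6} q = {q:.4f}")
```

Calling a result significant when its q-value is at most some level then targets that level of pFDR among everything called significant. With very few tests, as in this toy input, the π₀ estimate is unstable and can even be zero, so the sketch is meant only to show the mechanics.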

Hilary S. Parker is an American biostatistician and data scientist. She was formerly a senior data analyst at the fashion merchandising company Stitch Fix. Parker co-hosts the data analytics podcast Not So Standard Deviations with Roger Peng. She received her PhD in biostatistics from the Johns Hopkins Bloomberg School of Public Health and has formerly been employed by Etsy.

References

  1. "Faculty chosen for endowed professorships". News, Office of Communications, Princeton University. October 8, 2014.
  2. "Storey to head new Center for Statistics and Machine Learning".
  3. Storey, John D. (2002). "A direct approach to false discovery rates". Journal of the Royal Statistical Society, Series B (Statistical Methodology). 64 (3): 479–498. CiteSeerX 10.1.1.320.7131. doi:10.1111/1467-9868.00346. S2CID 122987911.
  4. Storey, John D. (2003). "The positive false discovery rate: a Bayesian interpretation and the q-value". The Annals of Statistics. 31 (6): 2013–2035. doi:10.1214/aos/1074290335.
  5. Storey, John D.; Tibshirani, Robert (2003). "Statistical significance for genomewide studies". PNAS. 100 (16): 9440–9445. Bibcode:2003PNAS..100.9440S. doi:10.1073/pnas.1530509100. PMC 170937. PMID 12883005.
  6. Leek, Jeff; Storey, John (2007-09-28). "Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis". PLOS Genetics. 3 (9): 1724–35. doi:10.1371/journal.pgen.0030161. PMC 1994707. PMID 17907809.
  7. "FACULTY AWARD: Six professors named 2011 AAAS fellows".
  8. "IMS Fellows announced". IMS Bulletin.
  9. "Storey receives COPSS Presidents' Award for outstanding statisticians 40 or younger".
  10. "FACULTY AWARD: Storey receives Mortimer Spiegelman Award for health statisticians under 40".