Jeffrey T. Leek

Education Utah State University (B.S.)
University of Washington (Ph.D., M.S.)
Known for Biostatistics and Data Science
Scientific career
Fields Biostatistics
Institutions Fred Hutchinson Cancer Research Center
Doctoral advisor John D. Storey
Doctoral students Hilary S. Parker

Jeffrey Tullis Leek is an American biostatistician and data scientist who serves as Vice President, Chief Data Officer, and Professor at Fred Hutchinson Cancer Research Center. [1] [2] He is an author of the Simply Statistics blog and runs several online courses through Coursera as part of its Data Science Specialization. [3] [4] [5] His most popular course is The Data Scientist's Toolbox, [6] which he taught with Roger Peng and Brian Caffo. Leek is best known for his contributions to genomic data analysis and for his critical view of research practices and the accuracy of popular statistical methods.

Education

Leek graduated from Utah State University in 2003 with a Bachelor of Science. He then studied at the University of Washington, earning a master's degree in 2005 and completing a PhD in biostatistics in 2007 with John Storey as his doctoral advisor. [2] [7]

Research and career

Leek joined Johns Hopkins University as an assistant professor in Biostatistics in 2009, working at the Bloomberg School of Public Health. In 2014 he became an associate professor in Biostatistics and Oncology. [8]

Leek works in the Center for Computational Biology [9] at Johns Hopkins University, creating statistical packages [10] [11] for the analysis of genomes.

He also co-edits the blog Simply Statistics [12] with Roger Peng and Rafa Irizarry; it contains a mix of articles on statistics and meta-research.

Leek has given talks at universities and research institutions, including a colloquium series at Harvard [13] and a lecture at the New York Genome Center titled “Building a Comprehensive Resource for the Study of Human Gene Expression with Machine Learning and Data Science” [14] as part of its lecture series.

He is considered an expert on reproducibility and replication, [18] and his work and opinions have been published in scientific and medical journals such as Nature [15] [16] and the Proceedings of the National Academy of Sciences. He also wrote a self-published book, The Elements of Data Analytic Style. [17]

He is currently Vice President and Chief Data Officer at Fred Hutchinson Cancer Center in Seattle, WA. [2]

Recognition

Leek was elected as a Fellow of the American Statistical Association in 2020. [19]

Selected publications

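Leek's highly cited works include "Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis" (2007, with John Storey) [20] and "Tackling the Widespread and Critical Impact of Batch Effects in High-Throughput Data" (2010). [21]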

References

  1. "Dr. Jeffrey Leek named VP and Chief Data Officer".
  2. "Jeff Leek, Ph.D." Fred Hutch. Retrieved 2023-02-19.
  3. "About". Simply Statistics.
  4. Diane Peters (2018-02-22). "MOOCs are not dead, but evolving". University Affairs.
  5. Steven Salzberg (2015-04-13). "How Disruptive Are MOOCs? Hopkins Genomics MOOC Launches In June". Forbes.
  6. "Coursera - Data Scientists Toolbox".
  7. "Simply Statistics: Interview with COPSS award Winner John Storey". Simply Statistics. Retrieved 2023-02-19.
  8. "Jeff Leek". LinkedIn.
  9. "Center for Computational Biology". Johns Hopkins University.
  10. "Software developed by Jeffrey Leek".
  11. "Software developed by The Center for Computation Biology".
  12. "Simply Statistics".
  13. "What Can 20,000+ RNA-seq Samples Tell Us About How Much Of The Genome Is Transcribed?". Harvard Colloquium Seminar. 16 February 2016.
  14. Jeff Leek. "Building a Comprehensive Resource for the Study of Human Gene Expression with Machine Learning and Data Science". New York Genome Center Lecture.
  15. Leek, Jeff; Peng, Roger (2015-04-28). "Statistics: P values are just the tip of the iceberg". Nature. 520 (7549): 612. Bibcode:2015Natur.520..612L. doi:10.1038/520612a. PMID 25925460. S2CID 4465756.
  16. Leek, Jeff; McShane, Blakeley; Gelman, Andrew; Colquhoun, David; Nuijten, Michele; Goodman, Steven (2017-11-28). "Five Ways to Fix Statistics". Nature. 551 (7682): 557–559. doi:10.1038/d41586-017-07522-z.
  17. The Elements of Data Analytic Style. Leanpub. 20 February 2014.
  18. Karen Nitkin (2017-11-07). "Could you repeat that? Fixing the 'replication crisis' in biomedical research has become top priority". Hub.
  19. "ASA Fellows list". American Statistical Association. Retrieved 2020-06-01.
  20. Leek, Jeff; Storey, John (2007-09-28). "Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis". PLOS Genetics. 3 (9): 1724–35. doi:10.1371/journal.pgen.0030161. PMC 1994707. PMID 17907809. S2CID 151500.
  21. Leek, Jeff; Scharpf, Robert; Corrado Bravo, Hector; Simcha, David; Langmead, Benjamin; Johnson, Evan; Geman, Donald; Baggerly, Keith; Irizarry, Rafael (2010-10-01). "Tackling the Widespread and Critical Impact of Batch Effects in High-Throughput Data". Nature Reviews Genetics. 11 (10): 733–9. doi:10.1038/nrg2825. PMC 3880143. PMID 20838408.