Robert Gentleman (statistician)

Robert Gentleman
Born: Robert Clifford Gentleman, 1959
Alma mater: University of Washington; University of British Columbia
Known for: R (programming language)
Awards: Benjamin Franklin Award (Bioinformatics)
Scientific career
Institutions: Genentech; University of Washington; Harvard Medical School; University of Waterloo; The University of Auckland
Thesis: Exploratory methods for censored data (1988)
Doctoral advisor: John James Crowley [1]

Robert Clifford Gentleman (born 1959) is a Canadian statistician and bioinformatician [2] who is the founding executive director of the Center for Computational Biomedicine at Harvard Medical School. He was previously the vice president of computational biology at 23andMe. [3] [4] Gentleman is recognized as one of the originators of the R programming language, together with Ross Ihaka, [5] [6] and of the Bioconductor project. [7] [8]

Education

Gentleman was awarded a Bachelor of Science degree in mathematics from the University of British Columbia. [3] He was awarded a Ph.D. degree in statistics from the University of Washington in 1988; his thesis title was Exploratory methods for censored data. [9]

Research and career

Gentleman worked as a statistics professor at the University of Auckland in the mid-1990s, where he developed the R programming language alongside Ross Ihaka. [5] [10] In 2001, he started work on the Bioconductor project to promote the development of open-source tools for bioinformatics and computational biology. In 2009, Gentleman joined the biotechnology company Genentech, where he worked as a senior director in bioinformatics and computational biology. [11] [12] He joined the personal genomics and biotechnology company 23andMe as vice president in April 2015, [3] with the goal of bringing expertise in bioinformatics and computational drug discovery to the company. [4] Gentleman has also served on the board of the statistical software company Revolution Analytics (formerly known as REvolution Computing). [10]

Awards and honors

Gentleman won the Benjamin Franklin Award in 2008, recognising his work on the R programming language, the Bioconductor project and his commitment to data and methods sharing. [13] He was made a Fellow of the International Society for Computational Biology in 2014 for his contribution to computational biology and bioinformatics. [14] He became a Fellow of the American Statistical Association in 2017. [15]

Related Research Articles

In bioinformatics and biochemistry, the FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes.
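
As a rough illustration of the format described above, the following sketch reads FASTA text into a dictionary. It assumes the common convention that each record starts with a header line beginning with ">" and is followed by one or more lines of single-letter codes; the function name `parse_fasta` and the sample records are hypothetical, and real-world files may need more careful handling (comments, duplicate identifiers, very large inputs).

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted text lines into {record_id: sequence}."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the id.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence line: append its single-letter codes to the current record.
            records[current_id] += line
    return records


# Hypothetical example input: one nucleotide record and one protein record.
example = """>seq1 hypothetical nucleotide record
ACGTAC
GTTA
>seq2 hypothetical protein record
MKV
""".splitlines()

parsed = parse_fasta(example)
print(parsed)
```

Sequence lines are concatenated because a single record is often wrapped across several lines of text.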

Computational science, also known as scientific computing, technical computing or scientific computation (SC), is a division of science that uses advanced computing capabilities to understand and solve complex physical problems.

Bioconductor is a free, open source and open development software project for the analysis and comprehension of genomic data generated by wet lab experiments in molecular biology.

Microarray analysis techniques

Microarray analysis techniques are used in interpreting the data generated from experiments on DNA, RNA, and protein microarrays, which allow researchers to investigate the expression state of a large number of genes – in many cases, an organism's entire genome – in a single experiment. Such experiments can generate very large amounts of data, allowing researchers to assess the overall state of a cell or organism. Data in such large quantities is difficult – if not impossible – to analyze without the help of computer programs.

Carole Goble: British computer scientist

Carole Anne Goble is a British academic who is Professor of Computer Science at the University of Manchester. She is principal investigator (PI) of the myGrid, BioCatalogue and myExperiment projects and co-leads the Information Management Group (IMG) with Norman Paton.

Mark Bender Gerstein is an American scientist working in bioinformatics and data science. As of 2009, he is co-director of the Yale Computational Biology and Bioinformatics program.

Within computational biology, an MA plot is an application of a Bland–Altman plot for visual representation of genomic data. The plot visualizes the differences between measurements taken in two samples, by transforming the data onto M and A scales, then plotting these values. Though originally applied in the context of two channel DNA microarray gene expression data, MA plots are also used to visualise high-throughput sequencing analysis.
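
The transformation onto the M and A scales can be sketched as follows. The sketch assumes the usual convention that M is the log2 ratio of the two measurements and A is the mean of their log2 values; the function name `ma_transform` and the intensity values are hypothetical, and the inputs are assumed to be strictly positive.

```python
import math
from typing import List, Sequence, Tuple


def ma_transform(sample1: Sequence[float],
                 sample2: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (M, A) pairs for paired positive measurements.

    Assumed convention: M = log2(x) - log2(y), A = (log2(x) + log2(y)) / 2.
    """
    pairs = []
    for x, y in zip(sample1, sample2):
        log_x, log_y = math.log2(x), math.log2(y)
        m = log_x - log_y          # difference between the two samples
        a = 0.5 * (log_x + log_y)  # average log-intensity
        pairs.append((m, a))
    return pairs


# Hypothetical intensities for three genes measured in two samples.
intensities_1 = [8.0, 16.0, 4.0]
intensities_2 = [2.0, 16.0, 16.0]

ma_pairs = ma_transform(intensities_1, intensities_2)
for m, a in ma_pairs:
    print(f"M = {m:+.2f}, A = {a:.2f}")
```

Plotting M on the vertical axis against A on the horizontal axis gives the MA plot; a gene with equal measurements in both samples sits on the line M = 0.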

Ross Ihaka: New Zealand statistician

George Ross Ihaka is a New Zealand statistician who was an associate professor of statistics at the University of Auckland until his retirement in 2017. Alongside Robert Gentleman, he is one of the creators of the R programming language. In 2008, Ihaka received the Pickering Medal, awarded by the Royal Society of New Zealand, for his work on R.

Lawrence Hunter

Lawrence E. Hunter is a Professor and Director of the Center for Computational Pharmacology and of the Computational Bioscience Program at the University of Colorado School of Medicine and Professor of Computer Science at the University of Colorado Boulder. He is an internationally known scholar whose work focuses on computational biology, knowledge-driven extraction of information from the primary biomedical literature, the semantic integration of knowledge resources in molecular biology, and the use of knowledge in the analysis of high-throughput data. He is also known for his foundational work in computational biology, which led to the genesis of the major professional organization in the field and two international conferences.

Richard Scheller: American neuroscientist

Richard H. Scheller is the former Chief Science Officer and Head of Therapeutics at 23andMe and the former Executive Vice President of Research and Early Development at Genentech. He was a professor at Stanford University from 1982 to 2001 before joining Genentech. He was awarded the Alan T. Waterman Award in 1989, the W. Alden Spencer Award in 1993 and the NAS Award in Molecular Biology in 1997; he won the 2010 Kavli Prize in Neuroscience with Thomas C. Südhof and James E. Rothman, and the 2013 Albert Lasker Award for Basic Medical Research with Thomas Südhof. He also received the Life Sciences Distinguished Alumni Award from the University of Wisconsin–Madison. He is a Fellow of the American Academy of Arts and Sciences and a Member of the National Academy of Sciences.

David Sankoff: Canadian scientist

David Sankoff is a Canadian mathematician, bioinformatician, computer scientist and linguist. He holds the Canada Research Chair in Mathematical Genomics in the Mathematics and Statistics Department at the University of Ottawa, and is cross-appointed to the Biology Department and the School of Information Technology and Engineering. He was founding editor of the scientific journal Language Variation and Change (Cambridge) and serves on the editorial boards of a number of bioinformatics, computational biology and linguistics journals. Sankoff is best known for his pioneering contributions in computational linguistics and computational genomics. He is considered to be one of the founders of bioinformatics. In particular, he had a key role in introducing dynamic programming for sequence alignment and other problems in computational biology. In Pavel Pevzner's words, "Michael Waterman and David Sankoff are responsible for transforming bioinformatics from a 'stamp collection' of ill-defined problems into a rigorous discipline with important biological applications."

Flow cytometry bioinformatics is the application of bioinformatics to flow cytometry data, which involves storing, retrieving, organizing and analyzing flow cytometry data using extensive computational resources and tools. Flow cytometry bioinformatics requires extensive use of and contributes to the development of techniques from computational statistics and machine learning. Flow cytometry and related methods allow the quantification of multiple independent biomarkers on large numbers of single cells. The rapid growth in the multidimensionality and throughput of flow cytometry data, particularly in the 2000s, has led to the creation of a variety of computational analysis methods, data standards, and public databases for the sharing of results.

Curtis Huttenhower: American biologist (born 1981)

Curtis Huttenhower is a Professor of Computational Biology and Bioinformatics in the Department of Biostatistics, School of Public Health, Harvard University.

Pathway analysis

In molecular biology, a pathway is a curated schematic representation of a well-characterized segment of the molecular physiological machinery, such as a metabolic pathway describing an enzymatic process within a cell or tissue, or a signaling pathway model representing a regulatory process that might, in turn, enable a metabolic or another regulatory process downstream. A typical pathway model starts with an extracellular signaling molecule that activates a specific receptor, thus triggering a chain of molecular interactions. A pathway is most often represented as a relatively small graph with gene, protein, and/or small-molecule nodes connected by edges of known functional relations. While a simpler pathway might appear as a chain, complex pathway topologies with loops and alternative routes are much more common. Computational analyses employ special formats of pathway representation. In the simplest form, however, a pathway might be represented as a list of member molecules with order and relations unspecified. Such a representation, generally called a functional gene set (FGS), can also refer to other functionally characterized groups such as protein families, Gene Ontology (GO) terms and Disease Ontology (DO) terms.

In bioinformatics, methods of pathway analysis might be used to identify key genes or proteins within a previously known pathway in relation to a particular experiment or pathological condition, or to build a pathway de novo from proteins that have been identified as key affected elements. By examining changes in, for example, gene expression in a pathway, its biological activity can be explored. Most frequently, however, pathway analysis refers to a method of initial characterization and interpretation of an experimental condition that was studied with omics tools or a genome-wide association study. Such studies might identify long lists of altered genes. Visual inspection is then challenging and the information is hard to summarize, since the altered genes map to a broad range of pathways, processes, and molecular functions. In such situations, the most productive way of exploring the list is to identify enrichment of specific FGSs in it. The general approach of enrichment analyses is to identify FGSs whose members were most frequently or most strongly altered in the given condition, in comparison to a gene set sampled by chance. In other words, enrichment can map canonical prior knowledge structured in the form of FGSs to the condition represented by altered genes.
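
One common way to make the comparison "to a gene set sampled by chance" concrete is a hypergeometric over-representation test. The sketch below is an assumption about how such a test might be set up, not the method of any particular tool: the function name `enrichment_p_value` and all gene counts are hypothetical. It computes the probability of seeing at least the observed overlap between an FGS and the altered-gene list if the altered genes had been drawn at random from the gene universe.

```python
from math import comb


def enrichment_p_value(universe: int, set_size: int,
                       altered: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe: total number of genes considered
    set_size: number of genes in the functional gene set (FGS)
    altered:  number of altered genes drawn from the universe
    overlap:  number of altered genes that belong to the FGS
    """
    total = comb(universe, altered)  # all ways to pick the altered genes
    upper = min(set_size, altered)   # largest possible overlap
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, altered - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 genes in total, an FGS of 5 genes,
# 5 altered genes, 3 of which fall inside the FGS.
p = enrichment_p_value(universe=20, set_size=5, altered=5, overlap=3)
print(f"enrichment p-value: {p:.4f}")
```

A small p-value suggests that the FGS is over-represented among the altered genes. In practice many FGSs are tested at once, so some correction for multiple testing would normally be applied as well.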

Desmond G. Higgins

Desmond Gerard Higgins is a Professor of Bioinformatics at University College Dublin, widely known for CLUSTAL, a series of computer programs for performing multiple sequence alignment. According to Nature, Higgins' papers describing CLUSTAL are among the top ten most highly cited scientific papers of all time.

Rafael Irizarry (scientist): American professor of biostatistics

Rafael Irizarry is a professor of biostatistics at the Harvard T.H. Chan School of Public Health and professor of biostatistics and computational biology at the Dana–Farber Cancer Institute. Irizarry is known as one of the founders of the Bioconductor project.

Jean Yee Hwa Yang is an Australian statistician known for her work on variance reduction for microarrays, and for inferring proteins from mass spectrometry data. Yang is a Professor in the School of Mathematics and Statistics at the University of Sydney.

Sandrine Dudoit is a professor of statistics and public health at the University of California, Berkeley. Her research applies statistics to microarray and genetic data; she is known as one of the founders of the open-source Bioconductor project for the development of bioinformatics software.

Manuel Corpas (scientist): British bioinformatics researcher

Manuel Corpas is an Anglo-Spanish biologist and entrepreneur known primarily for his contributions to the fields of bioinformatics and genomics. Corpas is currently Chief Scientist of the Cambridge startup Cambridge Precision Medicine, a tutor at the Institute for Continuing Education at the University of Cambridge and a lecturer at the Universidad Internacional de La Rioja. Corpas has worked on the human genome since the beginning of his career and was one of the first consumers to sequence his own genome and that of close relatives, which he published as the Corpasome. He has held positions at the Earlham Institute as Project Leader, and at the Wellcome Sanger Institute, where he developed the DECIPHER database, which aids in the diagnosis of patients with rare genomic disorders.

DESeq2 is a software package in the field of bioinformatics and computational biology for the statistical programming language R. It is primarily employed for the analysis of high-throughput RNA sequencing (RNA-seq) data to identify differentially expressed genes between different experimental conditions. DESeq2 employs statistical methods to normalize and analyze RNA-seq data, making it a valuable tool for researchers studying gene expression patterns and regulation. It is available through the Bioconductor repository.

References

  1. Robert Gentleman at the Mathematics Genealogy Project
  2. Gentleman, R. (2005). "Reproducible Research: A Bioinformatics Case Study". Statistical Applications in Genetics and Molecular Biology. 4: Article2. doi:10.2202/1544-6115.1034. PMID 16646837. S2CID 17729314.
  3. "Bioinformatics Pioneer Robert Gentleman, Ph.D., Joins 23andMe Leadership Team". Retrieved 10 August 2015.
  4. "Robert Gentleman on His Goals for Drug Discovery at 23andMe". Retrieved 10 August 2015.
  5. Ihaka, R.; Gentleman, R. (1996). "R: A Language for Data Analysis and Graphics". Journal of Computational and Graphical Statistics. 5 (3): 299–314. doi:10.2307/1390807. JSTOR 1390807.
  6. Ashlee Vance (6 January 2009). "R, the Software, Finds Fans in Data Analysts – NYTimes.com". The New York Times. Retrieved 17 April 2011.
  7. Gentleman, R. C.; Carey, V. J.; Bates, D. M.; Bolstad, B.; Dettling, M.; Dudoit, S.; Ellis, B.; Gautier, L.; Ge, Y.; Gentry, J.; Hornik, K.; Hothorn, T.; Huber, W.; Iacus, S.; Irizarry, R.; Leisch, F.; Li, C.; Maechler, M.; Rossini, A. J.; Sawitzki, G.; Smith, C.; Smyth, G.; Tierney, L.; Yang, J. Y.; Zhang, J. (2004). "Bioconductor: Open software development for computational biology and bioinformatics". Genome Biology. 5 (10): R80. doi:10.1186/gb-2004-5-10-r80. PMC 545600. PMID 15461798.
  8. Robert Gentleman at DBLP Bibliography Server
  9. Gentleman, Robert Clifford (1988). Exploratory methods for censored data (PhD thesis). University of Washington. ProQuest 303589316.
  10. Wolfson, Wendy. "A Bioinformatics Chief and a Gentleman". Retrieved 10 August 2015.
  11. Gaudet, P.; Bairoch, A.; Field, D.; Sansone, S.-A.; Taylor, C.; Attwood, T. K.; Bateman, A.; Blake, J. A.; Bult, C. J.; Cherry, J. M.; Chisholm, R. L.; Cochrane, G.; Cook, C. E.; Eppig, J. T.; Galperin, M. Y.; Gentleman, R.; Goble, C. A.; Gojobori, T.; Hancock, J. M.; Howe, D. G.; Imanishi, T.; Kelso, J.; Landsman, D.; Lewis, S. E.; Karsch Mizrachi, I.; Orchard, S.; Ouellette, B. F. F.; Ranganathan, S.; Richardson, L.; Rocca-Serra, P. (2011). "Towards BioDBcore: A community-defined information specification for biological databases". Database. 2011: baq027. doi:10.1093/database/baq027. PMC 3017395. PMID 21205783.
  12. "Genentech: Research: Robert C. Gentleman". Archived from the original on 2011-07-04. Retrieved 2011-04-17. Robert C. Gentleman Senior Director: Bioinformatics & Computational Biology
  13. "Benjamin Franklin Award – Bioinformatics.org". Retrieved 10 December 2016.
  14. "ISCB Fellows". Retrieved 10 August 2015.
  15. "ASA Fellows list". American Statistical Association. Archived from the original on 2017-12-01. Retrieved 2017-11-02.