Robert Gentleman | |
---|---|
Born | Robert Clifford Gentleman |
Alma mater | University of Washington, University of British Columbia |
Known for | R (programming language) |
Awards | Benjamin Franklin Award (Bioinformatics) |
Scientific career | |
Institutions | Genentech, University of Washington, Harvard Medical School, University of Waterloo, The University of Auckland |
Thesis | Exploratory methods for censored data (1988) |
Doctoral advisor | John James Crowley [1] |
Robert Clifford Gentleman (born 1959) is a Canadian statistician and bioinformatician [2] who is currently the founding executive director of the Center for Computational Biomedicine at Harvard Medical School. He was previously the vice president of computational biology at 23andMe. [3] [4] Gentleman is recognized, along with Ross Ihaka, as one of the originators of the R programming language [5] [6] and the Bioconductor project. [7] [8]
Gentleman was awarded a Bachelor of Science degree in mathematics from the University of British Columbia. [3] He was awarded a Ph.D. degree in statistics from the University of Washington in 1988; his thesis title was Exploratory methods for censored data. [9]
Gentleman worked as a statistics professor at the University of Auckland in the mid-1990s, where he developed the R programming language alongside Ross Ihaka. [5] [10] In 2001, he started work on the Bioconductor project to promote the development of open-source tools for bioinformatics and computational biology. In 2009, Gentleman joined the biotechnology corporation Genentech, where he worked as a senior director in bioinformatics and computational biology. [11] [12] Gentleman joined the personal genomics and biotechnology company 23andMe as vice president in April 2015, [3] with the goal of bringing expertise in bioinformatics and computational drug discovery to the company. [4] Gentleman has also served on the board of the statistical software company Revolution Analytics (formerly known as REvolution Computing). [10]
Gentleman won the Benjamin Franklin Award in 2008, recognising his work on the R programming language, the Bioconductor project and his commitment to data and methods sharing. [13] He was made a Fellow of the International Society for Computational Biology in 2014 for his contribution to computational biology and bioinformatics. [14] He became a fellow of the American Statistical Association in 2017. [15]
Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The process of analyzing and interpreting data is sometimes referred to as computational biology; however, the distinction between the two terms is often disputed. To some, the term computational biology refers to building and using models of biological systems.
In bioinformatics and biochemistry, the FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes.
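As a minimal sketch of how such a file might be read, the Python function below assumes the common convention (not stated above) that each record begins with a header line starting with `>`, followed by one or more lines of single-letter codes. The name `parse_fasta` is hypothetical and is not taken from any particular library.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence.

    Assumption: each record starts with a '>' header line, and the
    sequence may be wrapped over several following lines.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]          # drop the leading '>'
            records[header] = []
        elif header is not None:
            records[header].append(line)  # collect wrapped sequence lines
    # Join the wrapped lines of each record into one sequence string.
    return {name: "".join(parts) for name, parts in records.items()}


example = ">seq1 demo\nACGT\nTTGA\n>prot1\nMKV\n"
sequences = parse_fasta(example)
```

Here `sequences` maps each header to its full sequence, whether the letters are nucleotide or amino acid codes; the sketch does not validate the alphabet.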
Bioconductor is a free, open source and open development software project for the analysis and comprehension of genomic data generated by wet lab experiments in molecular biology.
Steven Lloyd Salzberg is an American computational biologist and computer scientist who is a Bloomberg Distinguished Professor of Biomedical Engineering, Computer Science, and Biostatistics at Johns Hopkins University, where he is also Director of the Center for Computational Biology.
Microarray analysis techniques are used in interpreting the data generated from experiments on DNA, RNA, and protein microarrays, which allow researchers to investigate the expression state of a large number of genes – in many cases, an organism's entire genome – in a single experiment. Such experiments can generate very large amounts of data, allowing researchers to assess the overall state of a cell or organism. Data in such large quantities is difficult – if not impossible – to analyze without the help of computer programs.
Mark Bender Gerstein is an American scientist working in bioinformatics and Data Science. As of 2009, he is co-director of the Yale Computational Biology and Bioinformatics program.
Within computational biology, an MA plot is an application of a Bland–Altman plot for visual representation of genomic data. The plot visualizes the differences between measurements taken in two samples, by transforming the data onto M and A scales, then plotting these values. Though originally applied in the context of two channel DNA microarray gene expression data, MA plots are also used to visualise high-throughput sequencing analysis.
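The transformation onto the M and A scales can be sketched as follows, assuming the usual definitions for paired positive intensities x and y: M is the log ratio, log2(x) - log2(y), and A is the mean log intensity, (log2(x) + log2(y)) / 2. The function name `ma_transform` is hypothetical, and the plotting step is omitted.

```python
import math


def ma_transform(sample1, sample2):
    """Return a list of (M, A) pairs for paired positive measurements.

    Assumed definitions:
      M = log2(x) - log2(y)          (difference, plotted on the y-axis)
      A = (log2(x) + log2(y)) / 2    (average, plotted on the x-axis)
    """
    pairs = []
    for x, y in zip(sample1, sample2):
        log_x = math.log2(x)
        log_y = math.log2(y)
        pairs.append((log_x - log_y, 0.5 * (log_x + log_y)))
    return pairs


# Two hypothetical genes measured in two samples.
points = ma_transform([8.0, 4.0], [2.0, 4.0])
```

A gene with equal intensity in both samples lands at M = 0, so points far from the horizontal axis indicate a difference between the two samples.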
Igor I. Goryanin is a systems biologist, who holds a Henrik Kacser Chair in Computational Systems Biology at the University of Edinburgh. He also heads the Biological Systems Unit at the Okinawa Institute of Science and Technology, Japan.
George Ross Ihaka is a New Zealand statistician who was an associate professor of statistics at the University of Auckland until his retirement in 2017. Alongside Robert Gentleman, he is one of the creators of the R programming language. In 2008, Ihaka received the Pickering Medal, awarded by the Royal Society of New Zealand, for his work on R.
Lawrence E. Hunter is a Professor and Director of the Center for Computational Pharmacology and of the Computational Bioscience Program at the University of Colorado School of Medicine and Professor of Computer Science at the University of Colorado Boulder. He is internationally known for his work on computational biology, knowledge-driven extraction of information from the primary biomedical literature, the semantic integration of knowledge resources in molecular biology, and the use of knowledge in the analysis of high-throughput data. He is also known for his foundational work in computational biology, which led to the genesis of the major professional organization in the field and two international conferences.
David Sankoff is a Canadian mathematician, bioinformatician, computer scientist and linguist. He holds the Canada Research Chair in Mathematical Genomics in the Mathematics and Statistics Department at the University of Ottawa, and is cross-appointed to the Biology Department and the School of Information Technology and Engineering. He was founding editor of the scientific journal Language Variation and Change (Cambridge) and serves on the editorial boards of a number of bioinformatics, computational biology and linguistics journals. Sankoff is best known for his pioneering contributions in computational linguistics and computational genomics. He is considered to be one of the founders of bioinformatics. In particular, he had a key role in introducing dynamic programming for sequence alignment and other problems in computational biology. In Pavel Pevzner's words, "Michael Waterman and David Sankoff are responsible for transforming bioinformatics from a 'stamp collection' of ill-defined problems into a rigorous discipline with important biological applications."
Flow cytometry bioinformatics is the application of bioinformatics to flow cytometry data, which involves storing, retrieving, organizing and analyzing flow cytometry data using extensive computational resources and tools. Flow cytometry bioinformatics requires extensive use of and contributes to the development of techniques from computational statistics and machine learning. Flow cytometry and related methods allow the quantification of multiple independent biomarkers on large numbers of single cells. The rapid growth in the multidimensionality and throughput of flow cytometry data, particularly in the 2000s, has led to the creation of a variety of computational analysis methods, data standards, and public databases for the sharing of results.
Curtis Huttenhower is a Professor of Computational Biology and Bioinformatics in the Department of Biostatistics, School of Public Health, Harvard University.
Pathway is the term from molecular biology for a curated schematic representation of a well-characterized segment of the molecular physiological machinery, such as a metabolic pathway describing an enzymatic process within a cell or tissue, or a signaling pathway model representing a regulatory process that might, in its turn, enable a metabolic or another regulatory process downstream. A typical pathway model starts with an extracellular signaling molecule that activates a specific receptor, thus triggering a chain of molecular interactions. A pathway is most often represented as a relatively small graph with gene, protein, and/or small molecule nodes connected by edges of known functional relations. While a simpler pathway might appear as a chain, complex pathway topologies with loops and alternative routes are much more common. Computational analyses employ special formats of pathway representation. In the simplest form, however, a pathway might be represented as a list of member molecules with order and relations unspecified. Such a representation, generally called a Functional Gene Set (FGS), can also refer to other functionally characterized groups such as protein families, Gene Ontology (GO) terms and Disease Ontology (DO) terms. In bioinformatics, methods of pathway analysis might be used to identify key genes or proteins within a previously known pathway in relation to a particular experiment or pathological condition, or to build a pathway de novo from proteins that have been identified as key affected elements. By examining changes in, for example, gene expression in a pathway, its biological activity can be explored. Most frequently, however, pathway analysis refers to a method of initial characterization and interpretation of an experimental condition that was studied with omics tools or a genome-wide association study. Such studies might identify long lists of altered genes.
A visual inspection is then challenging and the information is hard to summarize, since the altered genes map to a broad range of pathways, processes, and molecular functions. In such situations, the most productive way of exploring the list is to identify enrichment of specific FGSs in it. The general approach of enrichment analyses is to identify FGSs, members of which were most frequently or most strongly altered in the given condition, in comparison to a gene set sampled by chance. In other words, enrichment can map canonical prior knowledge structured in the form of FGSs to the condition represented by altered genes.
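One way to make the comparison with a gene set sampled by chance concrete is an over-representation test. The sketch below assumes a hypergeometric model, which is one common choice and not necessarily the method intended above; the function name `enrichment_p_value` and the numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size, set_size, altered_size, overlap):
    """Probability of seeing at least `overlap` FGS members among the
    altered genes if the altered genes were drawn at random.

    Assumption: hypergeometric sampling without replacement from a
    universe of `universe_size` genes, of which `set_size` belong to
    the FGS and `altered_size` were altered in the experiment.
    """
    total = comb(universe_size, altered_size)
    upper = min(set_size, altered_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, altered_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy example: 10 genes in total, an FGS of 3 genes,
# 3 altered genes, all of which fall inside the FGS.
p = enrichment_p_value(10, 3, 3, 3)
```

A small value suggests that the FGS is over-represented among the altered genes. In practice the test would be repeated for many FGSs, and the resulting p-values would need correction for multiple testing.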
Desmond Gerard Higgins is a Professor of Bioinformatics at University College Dublin, widely known for CLUSTAL, a series of computer programs for performing multiple sequence alignment. According to Nature, Higgins' papers describing CLUSTAL are among the top ten most highly cited scientific papers of all time.
Rafael Irizarry is a professor of biostatistics at the Harvard T.H. Chan School of Public Health and professor of biostatistics and computational biology at the Dana–Farber Cancer Institute. Irizarry is known as one of the founders of the Bioconductor project.
Jean Yee Hwa Yang is an Australian statistician known for her work on variance reduction for microarrays, and for inferring proteins from mass spectrometry data. Yang is a Professor in the School of Mathematics and Statistics at the University of Sydney.
Sandrine Dudoit is a professor of statistics and public health at the University of California, Berkeley. Her research applies statistics to microarray and genetic data; she is known as one of the founders of the open-source Bioconductor project for the development of bioinformatics software.
Manuel Corpas is an Anglo-Spanish biologist and entrepreneur known primarily for his contributions to the fields of bioinformatics and genomics. Corpas is currently Chief Scientist of the Cambridge startup Cambridge Precision Medicine, a tutor at the Institute for Continuing Education at the University of Cambridge and a lecturer at the Universidad Internacional de La Rioja. Corpas has worked on the human genome since the beginning of his career, and was one of the first consumers to sequence his own genome and that of close relatives, which he published as the Corpasome. He has held positions at the Earlham Institute, as Project Leader, and at the Wellcome Sanger Institute, where he developed DECIPHER, a database that aids in the diagnosis of patients with rare genomic disorders.
DESeq2 is a software package in the field of bioinformatics and computational biology for the statistical programming language R. It is primarily employed for the analysis of high-throughput RNA sequencing (RNA-seq) data to identify differentially expressed genes between different experimental conditions. DESeq2 employs statistical methods to normalize and analyze RNA-seq data, making it a valuable tool for researchers studying gene expression patterns and regulation. It is available through the Bioconductor repository.