Richard Durbin | |
---|---|
Born | Richard Michael Durbin 30 December 1960 [1] |
Nationality | British |
Education | Highgate School |
Alma mater | University of Cambridge (BA, PhD) |
Spouse | Julie Ahringer [1] |
Scientific career | |
Thesis | Studies on the development and organisation of the nervous system of Caenorhabditis elegans (1987) |
Doctoral advisor | John G. White [5] |
Doctoral students | Ewan Birney [6] |
Website | www |
Richard Michael Durbin FRS [17] (born 1960) [1] is a British computational biologist [18] [19] [4] and Al-Kindi Professor of Genetics at the University of Cambridge. [20] [21] [22] [23] He is also an associate faculty member at the Wellcome Sanger Institute, where he was previously a senior group leader. [24] [25] [26] [27]
Durbin was educated at The Hall School, Hampstead [ citation needed ] and Highgate School in London. [1] After competing in the 1978/9 International Mathematical Olympiad, [28] he went on to the University of Cambridge, graduating in 1982 [29] with a second-class honours degree in the Mathematical Tripos. He then studied for a PhD [5] at St John's College, Cambridge, [1] investigating the development and organisation of the nervous system of Caenorhabditis elegans while working at the Laboratory of Molecular Biology (LMB) in Cambridge under the supervision of John Graham White. [5]
Durbin's early work included developing the primary instrument software for one of the first X-ray crystallography area detectors [30] and the MRC Biorad confocal microscope, alongside contributions to neural modelling. [31] [32]
He then led the informatics for the Caenorhabditis elegans genome project [33] and, with Jean Thierry-Mieg, developed the genome database AceDB, which evolved into the WormBase web resource. Following this, he played an important role in data collection for, and interpretation of, the human genome sequence. [34]
He has developed numerous methods for computational sequence analysis. [35] [36] These include gene finding (e.g. GeneWise) with Ewan Birney [37] and hidden Markov models for protein and nucleic acid alignment and matching (e.g. HMMER) with Sean Eddy and Graeme Mitchison. A standard textbook, Biological Sequence Analysis, coauthored with Sean Eddy, Anders Krogh and Graeme Mitchison, [2] describes some of this work. Using these methods, Durbin worked with colleagues to build a series of important genomic data resources, including the protein family database Pfam, [38] the genome database Ensembl, [39] and the gene family database TreeFam. [11]
More recently, Durbin has returned to sequencing. He developed low-coverage approaches to population genome sequencing, applied first to yeast, [40] [41] and has been one of the leaders in applying new sequencing technology to the study of human genome variation. [42] [43] Durbin co-led the international 1000 Genomes Project to characterise variation down to 1% allele frequency as a foundation for human genetics.
Durbin was a joint winner of the Mullard Award of the Royal Society in 1994 (for work on the confocal microscope), won the Lord Lloyd of Kilgerran Award of the Foundation for Science and Technology in 2004, and was elected a Fellow of the Royal Society (FRS) in 2004 [17] and a member of the European Molecular Biology Organization (EMBO) in 2009. The Royal Society awarded its Gabor Medal to Durbin in 2017 for his contributions to computational biology. [44] In 2023 he received the International Prize for Biology for his work on the Biology of Genomes.
Durbin's certificate of election for the Royal Society reads:
Durbin is distinguished for his powerful contribution to computational biology. In particular, he played a leading role in establishing the new field of bioinformatics. This allows the handling of biological data on an unprecedented scale, enabling genomics to prosper. He led the analysis of the C. elegans genome, and with Thierry-Mieg developed the database software AceDB. In the international genome project he led the analysis of protein coding genes. He introduced key computational tools in software and data handling. His Pfam database allowed the identification of domains in new protein sequences; it used hidden Markov models to which approach generally he brought rigour and which led to covariance models for RNA sequence. [45]
Durbin is the son of James Durbin and is married to Julie Ahringer, a scientist at the Gurdon Institute. They have two children. [1]