Toby Gibson

Toby Gibson
Born: Toby James Gibson
Alma mater: University of Edinburgh (BSc); University of Cambridge (PhD)
Known for: Clustal [1]
Scientific career
Fields: Computational biology; Bioinformatics; Short linear motifs; Protein interactions; Sequence alignment [2]
Institutions: Laboratory of Molecular Biology; European Molecular Biology Laboratory
Thesis: Studies on the Epstein-Barr virus genome (1984)
Website: www.embl.de/research/units/scb/gibson

Toby James Gibson is a biochemist and group leader at the European Molecular Biology Laboratory (EMBL) in Heidelberg, [2] [3] known for his work on Clustal. [1] [4] According to Nature, the papers describing Clustal that Gibson co-authored [4] [5] are among the top ten most highly cited scientific papers of all time. [6]

Education

Gibson was educated at the University of Edinburgh [7] and completed his PhD at the University of Cambridge in 1984, with a thesis on the genome of the Epstein–Barr virus, [8] while working at the Medical Research Council (MRC) Laboratory of Molecular Biology (LMB). [7]

Career and research

Gibson was a postdoctoral research fellow with Sydney Brenner before moving to EMBL in 1986. [7] He was appointed a staff scientist in 1991 and a team leader in 1996, and has remained at EMBL since.

Gibson’s research interests are in computational biology, bioinformatics, short linear motifs, protein–protein interactions and biological sequence alignment. [2] His laboratory developed and hosts the Eukaryotic Linear Motif (ELM) resource. [9]

References

  1. Thompson, Julie D.; Higgins, Desmond G.; Gibson, Toby J. (1994). "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice". Nucleic Acids Research. 22 (22): 4673–4680. doi:10.1093/nar/22.22.4673. ISSN 0305-1048. PMC 308517. PMID 7984417.
  2. Toby Gibson publications indexed by Google Scholar.
  3. Toby Gibson publications from Europe PubMed Central.
  4. Thompson, J. (1997). "The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools". Nucleic Acids Research. 25 (24): 4876–4882. doi:10.1093/nar/25.24.4876. ISSN 1362-4962. PMC 147148. PMID 9396791.
  5. Thompson, J. D.; Higgins, D. G.; Gibson, T. J. (1994). "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice". Nucleic Acids Research. 22 (22): 4673–4680. doi:10.1093/nar/22.22.4673. PMC 308517. PMID 7984417.
  6. Van Noorden, R.; Maher, B.; Nuzzo, R. (2014). "The top 100 papers: Nature explores the most-cited research of all time". Nature. 514 (7524): 550–553. doi:10.1038/514550a. PMID 25355343.
  7. Anon (2019). "Toby (James) Gibson: EMBL, Heidelberg, Germany". uni-halle.de. Martin Luther University of Halle-Wittenberg. Archived from the original on 2019-07-05.
  8. Gibson, Toby James (1984). Studies on the Epstein-Barr virus genome. cam.ac.uk (PhD thesis). University of Cambridge. OCLC 499859334. EThOS uk.bl.ethos.352786.
  9. Kumar, Manjeet; Gouw, Marc; Michael, Sushama; Sámano-Sánchez, Hugo; Pancsa, Rita; Glavina, Juliana; Diakogianni, Athina; Valverde, Jesús Alvarado; Bukirova, Dayana; Čalyševa, Jelena; Palopoli, Nicolas; Davey, Norman E; Chemes, Lucía B; Gibson, Toby J (2019). "ELM—the eukaryotic linear motif resource in 2020". Nucleic Acids Research. doi:10.1093/nar/gkz1030. ISSN 0305-1048. PMC 7145657. PMID 31680160.