Eukaryotic Linear Motif resource

ELM
Content
Description: eukaryotic linear motifs
Contact
Authors: Holger Dinkel, Toby Gibson
Primary citation: Dinkel et al. (2012) [1]
Release date: 2011
Access
Website: elm.eu.org

The Eukaryotic Linear Motif (ELM) resource is a computational biology resource, developed at the European Molecular Biology Laboratory (EMBL), for investigating short linear motifs (SLiMs) in eukaryotic proteins. [2] [3] It is currently the largest collection of linear motif classes with annotated and experimentally validated linear motif instances.

Linear motifs are specified as patterns using regular expressions. These expressions are used in the ELM prediction pipeline, which detects putative motif instances in protein sequences. To improve predictive power, context-based rules and logical filters are being developed and applied to reduce the number of false-positive matches.
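The regular-expression matching step can be sketched in a few lines of Python. This is a minimal illustration, not the ELM pipeline itself: the two patterns and their names (`TOY_PxxP`, `TOY_RxL`) are hypothetical toy examples rather than actual ELM class definitions, and the sequence is made up. Positions are reported as 1-based, inclusive residue numbers.

```python
import re

# Hypothetical toy patterns for illustration only; these are NOT
# actual ELM motif class definitions.
TOY_MOTIFS = {
    "TOY_PxxP": r"P..P",    # proline, any two residues, proline
    "TOY_RxL": r"[RK].L",   # basic residue, any residue, leucine
}


def scan_motifs(sequence, motifs):
    """Return (name, start, end, matched_text) for every pattern match.

    start and end are 1-based, inclusive residue positions.
    """
    hits = []
    for name, pattern in motifs.items():
        # Wrapping the pattern in a lookahead lets overlapping
        # matches be reported as separate putative instances.
        lookahead = re.compile(f"(?=({pattern}))")
        for m in lookahead.finditer(sequence):
            hits.append((name, m.start(1) + 1, m.end(1), m.group(1)))
    # Order hits by position along the sequence, then by motif name.
    return sorted(hits, key=lambda h: (h[1], h[0]))


hits = scan_motifs("MAPPLPPRSLKKL", TOY_MOTIFS)
for name, start, end, text in hits:
    print(f"{name}\t{start}-{end}\t{text}")
```

Because short patterns like these match by chance in many sequences, a raw scan of this kind produces many false positives, which is why additional context filters are applied afterwards.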

As of 2010, ELM contained 146 different motifs annotating more than 1,300 experimentally determined instances within proteins. [3] The current version of the ELM server provides filtering by cell compartment, phylogeny, globular domain clash (using the SMART/Pfam databases) and structure. [4] In addition, both the known ELM instances and any positionally conserved matches in sequences similar to ELM instance sequences are identified and displayed.
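The idea behind a globular domain clash filter can be illustrated with a simple interval-overlap check. The sketch below is a simplified assumption, not the ELM server's actual implementation: it treats any overlap with a domain as a clash, and the domain coordinates and motif names are invented by hand rather than retrieved from SMART or Pfam.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive residue ranges share at least one position."""
    return a_start <= b_end and b_start <= a_end


def filter_domain_clashes(hits, domains):
    """Keep only the hits that do not overlap any globular domain.

    hits:    list of (name, start, end) putative motif instances
    domains: list of (start, end) globular domain ranges
    All positions are 1-based and inclusive.
    """
    return [
        hit for hit in hits
        if not any(overlaps(hit[1], hit[2], d_start, d_end)
                   for d_start, d_end in domains)
    ]


# Hypothetical example data: three putative instances and one domain.
putative = [("TOY_A", 3, 6), ("TOY_B", 48, 52), ("TOY_C", 120, 124)]
globular_domains = [(40, 110)]

kept = filter_domain_clashes(putative, globular_domains)
print(kept)
```

Here `TOY_B` is discarded because it falls inside the assumed domain at residues 40 to 110, while the two matches in the flanking regions are retained.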

References

  1. Dinkel H, Michael S, Weatheritt RJ, Davey NE, Van Roey K, Altenberg B, Toedt G, Uyar B, Seiler M, Budd A, Jödicke L, Dammert MA, Schroeter C, Hammer M, Schmidt T, Jehl P, McGuigan C, Dymecka M, Chica C, Luck K, Via A, Chatr-Aryamontri A, Haslam N, Grebnev G, Edwards RJ, Steinmetz MO, Meiselbach H, Diella F, Gibson TJ (January 2012). "ELM--the database of eukaryotic linear motifs". Nucleic Acids Res. 40 (D1): D242–D251. doi:10.1093/nar/gkr1064. PMC 3245074. PMID 22110040.
  2. Puntervoll P, Linding R, Gemünd C, et al. (July 2003). "ELM server: A new resource for investigating short functional sites in modular eukaryotic proteins". Nucleic Acids Res. 31 (13): 3625–30. doi:10.1093/nar/gkg545. PMC 168952. PMID 12824381.
  3. Gould CM, Diella F, Via A, et al. (January 2010). "ELM: the status of the 2010 eukaryotic linear motif resource". Nucleic Acids Res. 38 (Database issue): D167–80. doi:10.1093/nar/gkp1016. PMC 2808914. PMID 19920119.
  4. Via A, Gould CM, Gemünd C, Gibson TJ, Helmer-Citterich M (2009). "A structure filter for the Eukaryotic Linear Motif Resource". BMC Bioinformatics. 10: 351. doi:10.1186/1471-2105-10-351. PMC 2774702. PMID 19852836.