Content | |
---|---|
Description | eukaryotic linear motifs. |
Contact | |
Authors | Holger Dinkel Toby Gibson |
Primary citation | Dinkel et al. (2012) [1] |
Release date | 2011 |
Access | |
Website | elm |
The Eukaryotic Linear Motif (ELM) resource is a computational biology resource, developed at the European Molecular Biology Laboratory (EMBL), for investigating short linear motifs (SLiMs) in eukaryotic proteins. [2] [3] It is currently the largest collection of linear motif classes with annotated and experimentally validated linear motif instances.
Linear motifs are specified as patterns using regular expression rules. These expressions are used in the ELM prediction pipeline, which detects putative motif instances in protein sequences. To improve predictive power, context-based rules and logical filters are being developed and applied to reduce the number of false-positive matches.
As of 2010, ELM contained 146 different motifs annotating more than 1300 experimentally determined instances within proteins. [3] The current version of the ELM server provides filtering by cell compartment, phylogeny, globular domain clash (using the SMART/Pfam databases) and structure. [4] In addition, both the known ELM instances and any positionally conserved matches in sequences similar to ELM instance sequences are identified and displayed.
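The idea of scanning a sequence with a regular expression can be illustrated with a minimal Python sketch. It is not the ELM pipeline itself: the function name `find_motif_instances`, the toy sequence and the simplified proline-rich pattern `P..P` are assumptions chosen for illustration, and the pattern is not an actual ELM class definition.

```python
import re

def find_motif_instances(sequence, pattern):
    """Return (start, end, matched_text) for every match of `pattern`.

    Positions are 1-based and inclusive, as is usual for protein sequences.
    A lookahead is used so that overlapping matches are also reported.
    """
    regex = re.compile(f"(?=({pattern}))")
    hits = []
    for match in regex.finditer(sequence):
        matched = match.group(1)
        start = match.start() + 1            # convert 0-based to 1-based
        end = match.start() + len(matched)   # inclusive end position
        hits.append((start, end, matched))
    return hits

# Hypothetical inputs: a made-up peptide and a simplified illustrative pattern.
TOY_SEQUENCE = "MAPPLPPRSTPAAP"
TOY_PATTERN = r"P..P"

hits = find_motif_instances(TOY_SEQUENCE, TOY_PATTERN)
for start, end, text in hits:
    print(f"{start}-{end}: {text}")
```

Every hit found this way is only a putative instance. A short pattern matches many sequences by chance, which is why additional filtering is needed.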
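One of these filters, the globular domain clash, can be sketched as a simple interval check: a putative match that falls inside an annotated globular domain is discarded. This is a simplified sketch and does not reproduce the ELM server's logic. The function names, the tuple layout and the example coordinates are assumptions, and in practice the domain ranges would come from resources such as SMART or Pfam.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end

def filter_domain_clashes(motif_hits, domain_ranges):
    """Keep only motif hits that do not overlap any annotated domain.

    motif_hits:    list of (start, end, matched_text), 1-based inclusive
    domain_ranges: list of (start, end) for globular domains (hypothetical here)
    """
    return [
        hit for hit in motif_hits
        if not any(overlaps(hit[0], hit[1], d_start, d_end)
                   for d_start, d_end in domain_ranges)
    ]

# Hypothetical example: two putative hits and one annotated domain at 5-9.
example_hits = [(3, 6, "PPLP"), (11, 14, "PAAP")]
example_domains = [(5, 9)]

kept = filter_domain_clashes(example_hits, example_domains)
print(kept)
```

Here the first hit is dropped because it overlaps the domain, while the second lies outside it and is kept.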