Content | |
---|---|
Description | MobiDB database of protein disorder and mobility annotations |
Data types captured | Annotation of protein mobility and disorder |
Organisms | All |
Contact | |
Research center | Department of Biomedical Sciences at University of Padua |
Laboratory | BioComputing UP |
Primary citation | PMID 29136219 |
Release date | December 2017 |
Access | |
Data format | JSON |
Website | mobidb.org |
Web service URL | REST API see info here |
Miscellaneous | |
License | CC BY 4.0 |
Version | 3.0.0 |
Curation policy | Yes - manual and automatic |
In molecular biology, MobiDB [1] [2] [3] is a curated biological database designed to offer a centralized resource for annotations of intrinsic protein disorder. Protein disorder is a structural feature characterizing a large number of proteins with prominent members known as intrinsically unstructured (or disordered) proteins. The database features three levels of annotation: manually curated, indirect and predicted. By combining different data sources of protein disorder into a consensus annotation, MobiDB aims at giving the best possible picture of the "disorder landscape" of a given protein of interest.
Curated data for MobiDB is obtained from DisProt [4] database giving information and disorder annotation manually extracted from literature. In order to complement disorder annotation, MobiDB features additional annotations from external sources:
A great variety of intrinsic protein disorder predictors have been trained in the last decade. The bulk of them are trained to mimic the nature of the annotations previously described. Since MobiDB currently covers the full set of UniProt sequences, the included predictors need to be extremely fast. Ten predictors currently included (ESpritz in its three flavours, IUPred in its two flavours, DisEMBL in two of its flavours, GlobPlot, VSL2b and JRONN) enable MobiDB to provide disorder annotations for every protein, even when no curated or indirect data is available.
In order to provide the best possible annotation for a given protein, MobiDB combines all its data sources into a consensus annotation. This annotation differs from the ones belonging to the sources themselves in that it features a third state, in addition to "structured" and "disordered": when two authoritative sources disagree, it displays the region as "ambiguous". With the currently available annotations, this conflict arises when a manually curated source annotates a certain region as disordered, and yet there is a PDB structure available for that same region.
MobiDB website provides users with an interface to search by UniProt ID, protein name or free text. Following the submission, users are presented with a list of proteins each one annotated with disorder information integrated from various sources including consensus disorder prediction.
MobiDB web-server exposes some RESTful endpoints allowing programmatic access to MobiDB and retrieval of different data types. Available GET routes provide access to UniProt, STRING, Pfam and disorder data in JSON format.
BioJava is an open-source software project dedicated to provide Java tools to process biological data. BioJava is a set of library functions written in the programming language Java for manipulating sequences, protein structures, file parsers, Common Object Request Broker Architecture (CORBA) interoperability, Distributed Annotation System (DAS), access to AceDB, dynamic programming, and simple statistical routines. BioJava supports a range of data, starting from DNA and protein sequences to the level of 3D protein structures. The BioJava libraries are useful for automating many daily and mundane bioinformatics tasks such as to parsing a Protein Data Bank (PDB) file, interacting with Jmol and many more. This application programming interface (API) provides various file parsers, data models and algorithms to facilitate working with the standard data formats and enables rapid application development and analysis.
The Structural Classification of Proteins (SCOP) database is a largely manual classification of protein structural domains based on similarities of their structures and amino acid sequences. A motivation for this classification is to determine the evolutionary relationship between proteins. Proteins with the same shapes but having little sequence or functional similarity are placed in different superfamilies, and are assumed to have only a very distant common ancestor. Proteins having the same shape and some similarity of sequence and/or function are placed in "families", and are assumed to have a closer common ancestor.
The CATH Protein Structure Classification database is a free, publicly available online resource that provides information on the evolutionary relationships of protein domains. It was created in the mid-1990s by Professor Christine Orengo and colleagues including Janet Thornton and David Jones, and continues to be developed by the Orengo group at University College London. CATH shares many broad features with the SCOP resource, however there are also many areas in which the detailed classification differs greatly.
UniProt is a freely accessible database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature. It is maintained by the UniProt consortium, which consists of several European bioinformatics organisations and a foundation from Washington, DC, USA.
The European Bioinformatics Institute (EMBL-EBI) is an intergovernmental organization (IGO) which, as part of the European Molecular Biology Laboratory (EMBL) family, focuses on research and services in bioinformatics. It is located on the Wellcome Genome Campus in Hinxton near Cambridge, and employs over 600 full-time equivalent (FTE) staff.
In molecular biology, an intrinsically disordered protein (IDP) is a protein that lacks a fixed or ordered three-dimensional structure, typically in the absence of its macromolecular interaction partners, such as other proteins or RNA. IDPs range from fully unstructured to partially structured and include random coil, molten globule-like aggregates, or flexible linkers in large multi-domain proteins. They are sometimes considered as a separate class of proteins along with globular, fibrous and membrane proteins.
Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models. Last version of Pfam, 36.0, was released in September 2023 and contains 20,795 families. It is currently provided through InterPro database.
InterPro is a database of protein families, protein domains and functional sites in which identifiable features found in known proteins can be applied to new protein sequences in order to functionally characterise them.
PROSITE is a protein database. It consists of entries describing the protein families, domains and functional sites as well as amino acid patterns and profiles in them. These are manually curated by a team of the Swiss Institute of Bioinformatics and tightly integrated into Swiss-Prot protein annotation. PROSITE was created in 1988 by Amos Bairoch, who directed the group for more than 20 years. Since July 2018, the director of PROSITE and Swiss-Prot is Alan Bridge.
PDBsum is a database that provides an overview of the contents of each 3D macromolecular structure deposited in the Protein Data Bank (PDB).
OMPdb is a dedicated database that contains beta barrel (β-barrel) outer membrane proteins from Gram-negative bacteria. Such proteins are responsible for a broad range of important functions, like passive nutrient uptake, active transport of large molecules, protein secretion, as well as adhesion to host cells, through which bacteria expose their virulence activity.
The Protein Common Interface Database (ProtCID) is a database of similar protein-protein interfaces in crystal structures of homologous proteins.
DisProt is a manually curated biological database of intrinsically disordered proteins (IDPs) and regions (IDRs). DisProt annotations cover state information on the protein but also, when available, its state transitions, interactions and functional aspects of disorder detected by specific experimental methods. DisProt is hosted and maintained in the BioComputing UP laboratory.
Computer Atlas of Surface Topography of Proteins (CASTp) aims to provide comprehensive and detailed quantitative characterization of topographic features of protein, is now updated to version 3.0. Since its release in 2006, the CASTp server has ≈45000 visits and fulfills ≈33000 calculation requests annually. CASTp has been proven as a confident tool for a wide range of researches, including investigations of signaling receptors, discoveries of cancer therapeutics, understanding of mechanism of drug actions, studies of immune disorder diseases, analysis of protein–nanoparticle interactions, inference of protein functions and development of high-throughput computational tools. This server is maintained by Jie Liang's lab in University of Illinois at Chicago.
In bioinformatics, a Gene Disease Database is a systematized collection of data, typically structured to model aspects of reality, in a way to comprehend the underlying mechanisms of complex diseases, by understanding multiple composite interactions between phenotype-genotype relationships and gene-disease mechanisms. Gene Disease Databases integrate human gene-disease associations from various expert curated databases and text mining derived associations including Mendelian, complex and environmental diseases.
An array of protein tandem repeats is defined as several adjacent copies having the same or similar sequence motifs. These periodic sequences are generated by internal duplications in both coding and non-coding genomic sequences. Repetitive units of protein tandem repeats are considerably diverse, ranging from the repetition of a single amino acid to domains of 100 or more residues.
In molecular biology, MvirDB was a publicly available database that stored information on toxins, virulence factors and antibiotic resistance genes. Sources that this database used for DNA and protein information included: Tox-Prot, SCORPION, the PRINTS Virulence Factors, VFDB, TVFac, Islander, ARGO and VIDA. The database provided a BLAST tool that allowed the user to query their sequence against all DNA and protein sequences in MvirDB. Information on virulence factors could be obtained from the usage of the provided browser tool. Once the browser tool was used, the results were returned as a readable table that was organized by ascending E-Values, each of which were hyperlinked to their related page. MvirDB was implemented in an Oracle 10g relational database. MvirDB appears to have been inactive for some time, and is therefore not current. The last available snapshot was made on August 2, 2017.