MobiDB

Content
Description: MobiDB database of protein disorder and mobility annotations
Data types captured: Annotation of protein mobility and disorder
Organisms: All
Contact
Research center: Department of Biomedical Sciences at University of Padua
Laboratory: BioComputing UP
Primary citation: PMID 29136219
Release date: December 2017
Access
Data format: JSON
Website: mobidb.org
Web service URL: REST API
Miscellaneous
License: CC BY 4.0
Version: 3.0.0
Curation policy: Yes - manual and automatic

In molecular biology, MobiDB [1] [2] [3] is a curated biological database that offers a centralized resource for annotations of intrinsic protein disorder. Protein disorder is a structural feature of a large number of proteins, the most prominent of which are known as intrinsically unstructured (or disordered) proteins. The database features three levels of annotation: manually curated, indirect and predicted. By combining different sources of protein disorder data into a consensus annotation, MobiDB aims to give the best possible picture of the "disorder landscape" of a given protein of interest.

MobiDB data sources

Curated data and additional annotation

Curated data for MobiDB is obtained from the DisProt [4] database, which provides disorder annotations manually extracted from the literature. To complement the disorder annotation, MobiDB also features additional annotations from external sources.

Indirect sources

Predictions

A great variety of intrinsic protein disorder predictors have been developed in the last decade. Most of them are trained to reproduce the kinds of annotation described above. Since MobiDB currently covers the full set of UniProt sequences, the included predictors need to be extremely fast. The ten predictors currently included (ESpritz in its three flavours, IUPred in its two flavours, DisEMBL in two of its flavours, GlobPlot, VSL2b and JRONN) enable MobiDB to provide disorder annotations for every protein, even when no curated or indirect data is available.

MobiDB consensus

To provide the best possible annotation for a given protein, MobiDB combines all its data sources into a consensus annotation. This annotation differs from those of the individual sources in that it features a third state in addition to "structured" and "disordered": when two authoritative sources disagree, the region is shown as "ambiguous". With the currently available annotations, this conflict arises when a manually curated source annotates a region as disordered and yet a PDB structure is available for that same region.
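The three-state rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MobiDB's actual implementation: the function names, the region format (1-based inclusive start and end positions) and the fallback to a single predicted track when neither curated nor PDB data covers a residue are all hypothetical choices made for the example.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # assumed format: 1-based, inclusive (start, end)


def covered(length: int, regions: List[Region]) -> List[bool]:
    """Return a per-residue mask that is True where any region covers the residue."""
    mask = [False] * length
    for start, end in regions:
        # Clip each region to the sequence bounds before marking it.
        for pos in range(max(start, 1), min(end, length) + 1):
            mask[pos - 1] = True
    return mask


def consensus(
    length: int,
    curated_disorder: List[Region],
    pdb_structure: List[Region],
    predicted_disorder: List[Region],
) -> List[str]:
    """Assign one of three states to every residue (hypothetical sketch)."""
    cur = covered(length, curated_disorder)
    pdb = covered(length, pdb_structure)
    pred = covered(length, predicted_disorder)
    states = []
    for i in range(length):
        if cur[i] and pdb[i]:
            # Two authoritative sources disagree.
            states.append("ambiguous")
        elif cur[i]:
            states.append("disordered")
        elif pdb[i]:
            states.append("structured")
        else:
            # Assumption: fall back to the predicted track when nothing else is known.
            states.append("disordered" if pred[i] else "structured")
    return states


example = consensus(10, [(1, 4)], [(3, 6)], [(9, 10)])
```

In this toy example, residues 3 and 4 are covered both by a curated disorder region and by a PDB structure, so they are the only ones labelled "ambiguous".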

Website

The MobiDB website provides an interface for searching by UniProt ID, protein name or free text. After submitting a query, users are presented with a list of proteins, each annotated with disorder information integrated from various sources, including the consensus disorder prediction.

The MobiDB web server exposes RESTful endpoints that allow programmatic access to MobiDB and retrieval of different data types. The available GET routes provide access to UniProt, STRING, Pfam and disorder data in JSON format.
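As a rough illustration of what programmatic access to such GET routes might look like, the sketch below builds a request URL and decodes a JSON response using only the Python standard library. The route template is a placeholder and not a documented MobiDB endpoint; the helper names `build_url` and `fetch_json` are likewise hypothetical. The real paths should be taken from the REST API documentation on the website.

```python
import json
import urllib.request
from urllib.parse import quote


def build_url(base: str, route_template: str, accession: str) -> str:
    """Fill a route template with a URL-escaped UniProt accession."""
    return base.rstrip("/") + route_template.format(
        accession=quote(accession, safe="")
    )


def fetch_json(url: str, timeout: float = 10.0):
    """Send a GET request and decode the JSON body of the response."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder route for illustration only; it is NOT a real MobiDB endpoint.
EXAMPLE_ROUTE = "/hypothetical/route/{accession}"

# P04637 is used here simply as an example UniProt accession.
url = build_url("https://mobidb.org", EXAMPLE_ROUTE, "P04637")
```

Once the placeholder is replaced with a documented route, `fetch_json(url)` would return the decoded JSON document as ordinary Python dictionaries and lists.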

References

  1. Di Domenico, Tomás; Walsh, Ian; Martin, Alberto J. M.; Tosatto, Silvio C. E. (2012-08-01). "MobiDB: a comprehensive database of intrinsic protein disorder annotations". Bioinformatics. 28 (15): 2080–2081. doi:10.1093/bioinformatics/bts327. ISSN 1367-4811. PMID 22661649.
  2. Potenza, Emilio; Di Domenico, Tomás; Walsh, Ian; Tosatto, Silvio C. E. (2015-01-01). "MobiDB 2.0: an improved database of intrinsically disordered and mobile proteins". Nucleic Acids Research. 43 (Database issue): D315–320. doi:10.1093/nar/gku982. ISSN 1362-4962. PMC 4384034. PMID 25361972.
  3. Piovesan, Damiano; Tabaro, Francesco; Paladin, Lisanna; Necci, Marco; Micetic, Ivan; Camilloni, Carlo; Davey, Norman; Dosztányi, Zsuzsanna; Mészáros, Bálint (2018-01-04). "MobiDB 3.0: more annotations for intrinsic disorder, conformational diversity and interactions in proteins". Nucleic Acids Research. 46 (D1): D471–D476. doi:10.1093/nar/gkx1071. PMC 5753340. PMID 29136219.
  4. Piovesan, Damiano; Tabaro, Francesco; Mičetić, Ivan; Necci, Marco; Quaglia, Federica; Oldfield, Christopher J.; Aspromonte, Maria Cristina; Davey, Norman E.; Davidović, Radoslav (2016-12-13). "DisProt 7.0: a major update of the database of disordered proteins". Nucleic Acids Research. 45 (D1): D1123–D1124. doi:10.1093/nar/gkw1279. ISSN 1362-4962. PMC 5210598. PMID 27965415.