MobiDB

MobiDB
Content
Description	MobiDB database of protein disorder and mobility annotations
Data types captured	Annotation of protein mobility and disorder
Organisms	All
Contact
Research center	Department of Biomedical Sciences at University of Padua
Laboratory	BioComputing UP
Primary citation	PMID 29136219
Release date	December 2017
Access
Data format	JSON
Website	mobidb.org
Web service URL	REST API
Miscellaneous
License	CC BY 4.0
Version	3.0.0
Curation policy	Yes - manual and automatic

Last updated August 27, 2024

In molecular biology, MobiDB^[1]^[2]^[3] is a curated biological database designed to offer a centralized resource for annotations of intrinsic protein disorder. Protein disorder is a structural feature that characterizes a large number of proteins, the most prominent of which are known as intrinsically unstructured (or disordered) proteins. The database features three levels of annotation: manually curated, indirect and predicted. By combining different sources of protein disorder data into a consensus annotation, MobiDB aims to give the best possible picture of the "disorder landscape" of a given protein of interest.

MobiDB data sources

Curated data and additional annotation

Curated data for MobiDB is obtained from the DisProt database,^[4] which provides disorder annotations manually extracted from the literature. To complement the disorder annotation, MobiDB features additional annotations from external sources:

UniProt: Annotations from the UniProt database include organism, subcellular location, tissue specificity, function, relevant sites, relevant regions, post-translational modifications, and linear motifs.
Pfam: Protein domain annotations are displayed in graphical form and are link-enabled, allowing the user to visit the corresponding Pfam page for further information.
PDB: Secondary structure is extracted from the PDB whenever available, and displayed in graphical form and in 3D.
STRING: Known interactors with evidence in "database" and "experimental" are displayed in a sortable table.

Indirect sources

PDB X-ray: When a crystallographic experiment is performed to resolve a protein's structure, the position of certain residues sometimes cannot be determined accurately. One possible cause is that the residue is part of a flexible or disordered region. For this reason, residues missing from PDB X-ray structures are considered an indication of intrinsic disorder.
PDB NMR: Deposited files of NMR experiments for protein structure resolution often contain multiple models, representing different conformations of the same protein. By calculating the differences between the positions of each model's residues, one can measure the degree to which these positions change. This change can be interpreted as a measure of how flexible or disordered a protein is. The MOBI web server (from which the name of this database was derived) automates these calculations, taking a PDB-formatted file as input.
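
The two indirect signals described above can be illustrated with a minimal Python sketch. It is not the MOBI algorithm or MobiDB's actual pipeline: the function names, the input layout (a set of observed residue numbers, and lists of per-residue coordinates) and the use of a plain root-mean-square deviation from the mean position are all assumptions made for illustration. The NMR part further assumes that the models have already been superimposed.

```python
from math import sqrt


def missing_regions(seq_length, observed):
    """Return 1-based inclusive (start, end) runs of residues that are
    absent from `observed`, the set of residue numbers with coordinates.
    Hypothetical helper: a stand-in for 'missing residues' in an X-ray entry."""
    regions = []
    start = None
    for pos in range(1, seq_length + 1):
        if pos not in observed:
            if start is None:
                start = pos          # a new missing run begins here
        elif start is not None:
            regions.append((start, pos - 1))
            start = None
    if start is not None:            # run extends to the C-terminus
        regions.append((start, seq_length))
    return regions


def per_residue_spread(models):
    """Return, for each residue, the RMS deviation of its position from the
    mean position across models. `models` is a list of models, each a list
    of (x, y, z) tuples (one per residue, same order and length).
    Assumes the models are already superimposed; a simplified measure,
    not the scoring used by the MOBI server."""
    n_models = len(models)
    n_residues = len(models[0])
    spread = []
    for r in range(n_residues):
        coords = [model[r] for model in models]
        mean = [sum(c[k] for c in coords) / n_models for k in range(3)]
        msd = sum(
            sum((c[k] - mean[k]) ** 2 for k in range(3)) for c in coords
        ) / n_models
        spread.append(sqrt(msd))
    return spread
```

In this sketch, a residue with a large spread across NMR models, or one falling inside a missing region of an X-ray structure, would be a candidate for a "disordered" label; any threshold for that decision is left out because the text does not state one.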

Predictions

A great variety of intrinsic protein disorder predictors have been developed in the last decade. The bulk of them are trained to mimic the nature of the annotations previously described. Since MobiDB currently covers the full set of UniProt sequences, the included predictors need to be extremely fast. The ten predictors currently included (ESpritz in its three flavours, IUPred in its two flavours, DisEMBL in two of its flavours, GlobPlot, VSL2b and JRONN) enable MobiDB to provide disorder annotations for every protein, even when no curated or indirect data is available.

MobiDB consensus

In order to provide the best possible annotation for a given protein, MobiDB combines all its data sources into a consensus annotation. This annotation differs from the ones belonging to the sources themselves in that it features a third state, in addition to "structured" and "disordered": when two authoritative sources disagree, it displays the region as "ambiguous". With the currently available annotations, this conflict arises when a manually curated source annotates a certain region as disordered, and yet there is a PDB structure available for that same region.
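
The three-state idea can be sketched in Python as follows. Only the conflict rule (curated disorder plus an available PDB structure yields "ambiguous") comes from the description above. The function name, the one-letter state codes, the use of sets of 1-based residue numbers and the fallback order (curated data, then PDB evidence, then predictions) are assumptions for illustration and may differ from what MobiDB actually does.

```python
def consensus(length, curated_disorder, pdb_structured, predicted_disorder):
    """Return a per-residue string of states for a protein of `length`:
    'D' = disordered, 'S' = structured, 'A' = ambiguous.
    Each argument after `length` is a set of 1-based residue numbers.
    Hypothetical sketch; the priority order is an assumption."""
    states = []
    for pos in range(1, length + 1):
        curated = pos in curated_disorder
        in_pdb = pos in pdb_structured
        if curated and in_pdb:
            states.append("A")   # two authoritative sources disagree
        elif curated:
            states.append("D")   # curated disorder, no structure
        elif in_pdb:
            states.append("S")   # resolved in a PDB structure
        else:
            # No curated or indirect evidence: fall back to predictions.
            states.append("D" if pos in predicted_disorder else "S")
    return "".join(states)


print(consensus(6, {1, 2, 3}, {3, 4}, {6}))  # prints DDASSD
```

Residue 3 is reported as ambiguous because it is annotated as disordered by the curated source and is also covered by a PDB structure.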

Website

The MobiDB website provides an interface for searching by UniProt ID, protein name or free text. After submitting a query, users are presented with a list of proteins, each annotated with disorder information integrated from various sources, including a consensus disorder prediction.

The MobiDB web server exposes several RESTful endpoints that allow programmatic access to MobiDB and retrieval of different data types. The available GET routes provide access to UniProt, STRING, Pfam and disorder data in JSON format.
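
A GET request against such a service could look like the Python sketch below. The route layout is an assumption: `BASE_URL`, the `/{accession}/{data_type}` path pattern and the `data_type` names are hypothetical placeholders, not documented MobiDB endpoints, so the real paths should be taken from the API documentation on the MobiDB website. Only the standard library is used.

```python
import json
from urllib.request import urlopen

# Hypothetical base path: replace with the route given in the MobiDB API docs.
BASE_URL = "https://mobidb.org/ws"


def build_url(accession, data_type, base_url=BASE_URL):
    """Build a GET URL for one protein and one data type
    (for example 'disorder' or 'pfam'; these names are assumptions)."""
    return f"{base_url}/{accession}/{data_type}"


def fetch_json(url, opener=urlopen, timeout=30):
    """Send a GET request and decode the JSON response body.
    `opener` can be swapped out, for instance to test without network access."""
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `fetch_json(build_url("P04637", "disorder"))` would then return the decoded JSON as Python dictionaries and lists, provided the placeholder path is replaced with a real route.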

External links

MobiDB homepage


References

[1] Di Domenico, Tomás; Walsh, Ian; Martin, Alberto J. M.; Tosatto, Silvio C. E. (2012-08-01). "MobiDB: a comprehensive database of intrinsic protein disorder annotations". Bioinformatics. 28 (15): 2080–2081. doi:10.1093/bioinformatics/bts327. ISSN 1367-4811. PMID 22661649.
[2] Potenza, Emilio; Di Domenico, Tomás; Walsh, Ian; Tosatto, Silvio C. E. (2015-01-01). "MobiDB 2.0: an improved database of intrinsically disordered and mobile proteins". Nucleic Acids Research. 43 (Database issue): D315–320. doi:10.1093/nar/gku982. ISSN 1362-4962. PMC 4384034. PMID 25361972.
[3] Piovesan, Damiano; Tabaro, Francesco; Paladin, Lisanna; Necci, Marco; Micetic, Ivan; Camilloni, Carlo; Davey, Norman; Dosztányi, Zsuzsanna; Mészáros, Bálint (2018-01-04). "MobiDB 3.0: more annotations for intrinsic disorder, conformational diversity and interactions in proteins". Nucleic Acids Research. 46 (D1): D471–D476. doi:10.1093/nar/gkx1071. PMC 5753340. PMID 29136219.
[4] Piovesan, Damiano; Tabaro, Francesco; Mičetić, Ivan; Necci, Marco; Quaglia, Federica; Oldfield, Christopher J.; Aspromonte, Maria Cristina; Davey, Norman E.; Davidović, Radoslav (2016-12-13). "DisProt 7.0: a major update of the database of disordered proteins". Nucleic Acids Research. 45 (D1): D1123–D1124. doi:10.1093/nar/gkw1279. ISSN 1362-4962. PMC 5210598. PMID 27965415.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
