DisProt

DisProt
Content
Description: Manually curated database of intrinsically disordered proteins (IDPs) and regions (IDRs)
Data types captured: Intrinsically disordered proteins
Organisms: All
Contact
Laboratory: BioComputing UP laboratory (Dept. of Biomedical Sciences, University of Padova)
Primary citation: PMID 34850135
Access
Website: https://disprot.org/
Download URL: https://disprot.org/download
Miscellaneous
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Curation policy: Manual curation by professional and community biocurators

DisProt is a manually curated biological database of intrinsically disordered proteins (IDPs) and regions (IDRs). [1] [2] [3] DisProt annotations describe the structural state of a protein and, when available, its state transitions, interactions and functional aspects of disorder, as detected by specific experimental methods. DisProt is hosted and maintained in the BioComputing UP laboratory (Dept. of Biomedical Sciences, University of Padua).

Website

The latest DisProt version, DisProt 9, [3] includes more than 2300 protein entries and more than 4500 pieces of evidence on structural state, state transitions, interactions and functions, along with more than 2500 annotated scientific publications.

Biocuration in DisProt

DisProt entries are annotated by professional and community biocurators from experimental data published in the scientific literature. The DisProt home page features example entries, e.g. p53 and Catenin beta-1, along with entries for SARS-CoV-2 proteins, e.g. Nucleoprotein.

[Figure: DisProt 9.0.1 home page]

Thematic datasets

Since 2020, DisProt has released ‘thematic datasets’ covering biological areas in which IDPs play a crucial role. [3] All entries belonging to a dataset are tagged with the name of its theme.

  • Unicellular toxins and antitoxins (DisProt release 2020_12)
  • Extracellular matrix proteins (DisProt release 2021_06)
  • Viral proteins (DisProt release 2021_12)
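The tagging scheme can be illustrated with a minimal sketch. The `Entry` class, its fields and the `EX…` accessions below are hypothetical placeholders invented for illustration; they do not reflect DisProt's actual data model or identifiers. Only the theme names come from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical, simplified stand-in for a DisProt entry."""
    accession: str  # placeholder identifier, not a real DisProt ID
    name: str
    themes: set = field(default_factory=set)  # thematic dataset tags


def entries_in_theme(entries, theme):
    """Return the entries tagged with the given thematic dataset name."""
    return [e for e in entries if theme in e.themes]


# Invented example records; an entry may carry no theme tag at all.
entries = [
    Entry("EX0001", "example toxin", {"Unicellular toxins and antitoxins"}),
    Entry("EX0002", "example matrix protein", {"Extracellular matrix proteins"}),
    Entry("EX0003", "example viral protein", {"Viral proteins"}),
    Entry("EX0004", "untagged example"),
]

viral = entries_in_theme(entries, "Viral proteins")
print([e.accession for e in viral])
```

Storing the tags as a set lets a single entry belong to several themes, which is an assumption of this sketch rather than something the source states.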

Model organism entries

On the DisProt home page, each model organism is represented by an icon, the name of the species and the number of DisProt entries belonging to that organism. Entries from the following organisms are accessible from the home page under the ‘Organisms’ section and can be downloaded as single files: Homo sapiens, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae, Escherichia coli, Arabidopsis thaliana, Drosophila melanogaster, Caenorhabditis elegans.
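A per-organism entry count of the kind shown on the home page could be computed roughly as follows. This is only a sketch: the `(accession, organism)` records are invented placeholders, and nothing here describes how the DisProt website actually derives its numbers.

```python
from collections import Counter

# Hypothetical (accession, organism) pairs; the accessions are placeholders.
records = [
    ("EX0001", "Homo sapiens"),
    ("EX0002", "Homo sapiens"),
    ("EX0003", "Mus musculus"),
    ("EX0004", "Saccharomyces cerevisiae"),
]


def count_by_organism(records):
    """Count how many entries belong to each organism."""
    return Counter(organism for _, organism in records)


counts = count_by_organism(records)
for organism, n in counts.most_common():
    print(f"{organism}: {n}")
```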

DisProt versions and releases

DisProt versions and releases include changes to the website and to the manually curated content of the database.

DisProt ontologies

DisProt uses three ontologies to annotate disordered regions: the Intrinsically Disordered Proteins Ontology (IDPO), the Evidence and Conclusion Ontology (ECO) and the Gene Ontology (GO). DisProt has a dedicated page for each IDPO term that includes the identifier, name and definition of the term and cross-references to external ontologies, e.g. the Gene Ontology. Each IDPO term page lists all the DisProt entries annotated with that specific term.
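One way to picture how ontology terms tie annotations to term pages is an inverted index from term to entries. The sketch below assumes that each region annotation carries one descriptive term (from IDPO or GO) and one ECO evidence term. That layout, the `RegionAnnotation` class and every identifier shown are hypothetical placeholders, not DisProt's real schema or real ontology IDs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionAnnotation:
    """Hypothetical annotation of one disordered region."""
    entry: str     # placeholder entry identifier
    start: int     # first residue of the region (1-based, inclusive)
    end: int       # last residue of the region (inclusive)
    term_id: str   # descriptive term, assumed to come from IDPO or GO
    eco_id: str    # evidence term, assumed to come from ECO


def build_term_index(annotations):
    """Map each term ID to the sorted list of entries annotated with it."""
    index = defaultdict(set)
    for ann in annotations:
        index[ann.term_id].add(ann.entry)
    return {term: sorted(entry_ids) for term, entry_ids in index.items()}


# Invented annotations; all term IDs are placeholders, not real ontology IDs.
annotations = [
    RegionAnnotation("EX0001", 1, 40, "IDPO:EXAMPLE_A", "ECO:EXAMPLE_1"),
    RegionAnnotation("EX0002", 10, 95, "IDPO:EXAMPLE_A", "ECO:EXAMPLE_2"),
    RegionAnnotation("EX0002", 10, 95, "GO:EXAMPLE_B", "ECO:EXAMPLE_2"),
]

index = build_term_index(annotations)
print(index["IDPO:EXAMPLE_A"])
```

Looking up a term in `index` then mirrors what a term page does: it lists every entry annotated with that term, regardless of which region or evidence the annotation refers to.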

References

  1. Vucetic, Slobodan; Obradovic, Zoran; Vacic, Vladimir; Radivojac, Predrag; Peng, Kang; Iakoucheva, Lilia M.; Cortese, Marc S.; Lawson, J. David; Brown, Celeste J. (2005-01-01). "DisProt: a database of protein disorder". Bioinformatics. 21 (1): 137–140. doi:10.1093/bioinformatics/bth476. ISSN 1367-4803. PMID 15310560.
  2. Sickmeier, Megan; Hamilton, Justin A.; LeGall, Tanguy; Vacic, Vladimir; Cortese, Marc S.; Tantos, Agnes; Szabo, Beata; Tompa, Peter; Chen, Jake (2007-01-01). "DisProt: the Database of Disordered Proteins". Nucleic Acids Research. 35 (Database issue): D786–793. doi:10.1093/nar/gkl893. ISSN 1362-4962. PMC 1751543. PMID 17145717.
  3. Quaglia, Federica; Mészáros, Bálint; Salladini, Edoardo; Hatos, András; Pancsa, Rita; Chemes, Lucía B.; Pajkos, Mátyás; Lazar, Tamas; Peña-Díaz, Samuel; Santos, Jaime; Ács, Veronika (2021-11-25). "DisProt in 2022: improved quality and accessibility of protein intrinsic disorder annotation". Nucleic Acids Research. 50 (D1): D480–D487. doi:10.1093/nar/gkab1082. ISSN 1362-4962. PMC 8728214. PMID 34850135.
  4. Piovesan, Damiano; Tabaro, Francesco; Mičetić, Ivan; Necci, Marco; Quaglia, Federica; Oldfield, Christopher J.; Aspromonte, Maria Cristina; Davey, Norman E.; Davidović, Radoslav (2016-11-28). "DisProt 7.0: a major update of the database of disordered proteins". Nucleic Acids Research. 45 (D1): D219–D227. doi:10.1093/nar/gkw1056. ISSN 1362-4962. PMC 5210544. PMID 27899601.
  5. Hatos, András; Hajdu-Soltész, Borbála; Monzon, Alexander M.; Palopoli, Nicolas; Álvarez, Lucía; Aykac-Fas, Burcu; Bassot, Claudio; Benítez, Guillermo I.; Bevilacqua, Martina; Chasapi, Anastasia; Chemes, Lucia (2019). "DisProt: intrinsic protein disorder annotation in 2020". Nucleic Acids Research. 48 (D1): D269–D276. doi:10.1093/nar/gkz975. PMC 7145575. PMID 31713636.
  6. Kovačević JJ (June 2012). "Computational analysis of position-dependent disorder content in DisProt database". Genomics Proteomics Bioinformatics. 10 (3): 158–65. doi:10.1016/j.gpb.2012.01.002. PMC 5056116. PMID 22917189.