PathoPhenoDB

PathoPhenoDB is a biological database that links human pathogens to their phenotypes. [1] It draws on multiple resources, including NCBI, the Human Disease Ontology, [2] the Human Phenotype Ontology, [3] the Mammalian Phenotype Ontology, [4] PubChem, SIDER [5] and CARD. [6] Pathogen-disease associations were gathered mainly from the CDC and the List of Infectious Diseases page on Wikipedia. Taxonomy was assigned semi-automatically: each pathogen was mapped against the NCBI Taxonomy, and a pathogen without an exact match was mapped to its parent class. PathoPhenoDB uses normalized pointwise mutual information (NPMI) [7] to filter pairs based on their co-occurrence statistics.
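As an illustration of co-occurrence filtering with NPMI, the following minimal Python sketch computes the standard NPMI score, which divides pointwise mutual information by -log p(x, y) so that the result lies between -1 (the terms never co-occur) and 1 (they always co-occur). The function names, the example counts and the threshold are hypothetical and chosen only for illustration; they are not taken from PathoPhenoDB.

```python
import math


def npmi(n_xy, n_x, n_y, n_total):
    """Normalized pointwise mutual information from raw counts.

    n_xy    -- number of documents mentioning both terms
    n_x     -- number of documents mentioning the first term
    n_y     -- number of documents mentioning the second term
    n_total -- total number of documents
    """
    if n_xy == 0:
        return -1.0  # conventional limit when the terms never co-occur
    p_xy = n_xy / n_total
    p_x = n_x / n_total
    p_y = n_y / n_total
    if p_xy == 1.0:
        return 1.0  # both terms occur in every document
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def filter_pairs(pair_counts, n_total, threshold):
    """Keep only pairs whose NPMI exceeds the threshold.

    pair_counts maps (term_a, term_b) to (n_xy, n_x, n_y).
    The threshold is an illustrative parameter, not a published value.
    """
    return {
        pair: npmi(n_xy, n_x, n_y, n_total)
        for pair, (n_xy, n_x, n_y) in pair_counts.items()
        if npmi(n_xy, n_x, n_y, n_total) > threshold
    }


# Hypothetical counts, for illustration only.
example_counts = {
    ("pathogen A", "phenotype 1"): (10, 10, 10, ),
    ("pathogen A", "phenotype 2"): (25, 50, 50, ),
}
kept = filter_pairs(example_counts, n_total=100, threshold=0.5)
```

Here the first pair always co-occurs and scores close to 1, so it is kept, while the second pair co-occurs exactly as often as chance would predict, scores 0, and is dropped.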

See also

Related Research Articles

Rat Genome Database

The Rat Genome Database (RGD) is a database of rat genomics, genetics, physiology and functional data, as well as data for comparative genomics between rat, human and mouse. RGD is responsible for attaching biological information to the rat genome via structured vocabulary, or ontology, annotations assigned to genes and quantitative trait loci (QTL), and for consolidating rat strain data and making it available to the research community. RGD is also developing a suite of tools for mining and analyzing genomic, physiologic and functional data for the rat, and comparative data for rat, mouse, human, and five other species.

Expasy is an online bioinformatics resource operated by the SIB Swiss Institute of Bioinformatics. It is an extensible and integrative portal which provides access to over 160 databases and software tools and supports a range of life science and clinical research areas, from genomics, proteomics and structural biology, to evolution and phylogeny, systems biology and medical chemistry. The individual resources are hosted in a decentralised way by different groups of the SIB Swiss Institute of Bioinformatics and partner institutions.

PHI-base

The Pathogen-Host Interactions database (PHI-base) is a biological database that contains curated information on genes experimentally proven to affect the outcome of pathogen-host interactions. The database has been maintained since 2005 by researchers at Rothamsted Research, together with external collaborators. Since April 2017 PHI-base has been part of ELIXIR, the European life-science infrastructure for biological information, via its ELIXIR-UK node.

SUPERFAMILY is a database and search platform of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. Domains are functional, structural, and evolutionary units that form proteins. Domains of common ancestry are grouped into superfamilies. The domains and domain superfamilies are defined and described in SCOP. Superfamilies are groups of proteins which have structural evidence to support a common evolutionary ancestor but may not have detectable sequence homology.

The Disease Ontology (DO) is a formal ontology of human disease. The Disease Ontology project is hosted at the Institute for Genome Sciences at the University of Maryland School of Medicine.

WikiPathways

WikiPathways is a community resource for contributing and maintaining content dedicated to biological pathways. Any registered WikiPathways user can contribute, and anybody can become a registered user. Contributions are monitored by a group of admins, but the bulk of peer review, editorial curation, and maintenance is the responsibility of the user community. WikiPathways is built using MediaWiki software, a custom graphical pathway editing tool (PathVisio) and integrated BridgeDb databases covering major gene, protein, and metabolite systems.

Sequence Read Archive

The Sequence Read Archive is a bioinformatics database that provides a public repository for DNA sequencing data, especially the "short reads" generated by high-throughput sequencing, which are typically less than 1,000 base pairs in length. The archive is part of the International Nucleotide Sequence Database Collaboration (INSDC), and is run as a collaboration between the NCBI, the European Bioinformatics Institute (EBI), and the DNA Data Bank of Japan (DDBJ).

PhytoPath

PhytoPath was a joint scientific project between the European Bioinformatics Institute and Rothamsted Research, running from January 2012 to May 30, 2017. The project aimed to enable the exploitation of the growing body of “-omics” data being generated for phytopathogens, their plant hosts and related model species. Gene mutant phenotypic information is directly displayed in genome browsers.

European Nucleotide Archive

The European Nucleotide Archive (ENA) is a repository providing free and unrestricted access to annotated DNA and RNA sequences. It also stores complementary information such as experimental procedures, details of sequence assembly and other metadata related to sequencing projects. The archive is composed of three main databases: the Sequence Read Archive, the Trace Archive and the EMBL Nucleotide Sequence Database. The ENA is produced and maintained by the European Bioinformatics Institute and is a member of the International Nucleotide Sequence Database Collaboration (INSDC) along with the DNA Data Bank of Japan and GenBank.

In bioinformatics, a Gene Disease Database is a systematized collection of data, typically structured to model aspects of reality, in a way to comprehend the underlying mechanisms of complex diseases, by understanding multiple composite interactions between phenotype-genotype relationships and gene-disease mechanisms. Gene Disease Databases integrate human gene-disease associations from various expert curated databases and text mining derived associations including Mendelian, complex and environmental diseases.

Monica Riley was an American scientist who contributed to the discovery of messenger RNA in her Ph.D. work with Arthur Pardee, and was later a pioneer in the exploration and computer representation of the Escherichia coli genome.

PomBase is a model organism database that provides online access to the fission yeast Schizosaccharomyces pombe genome sequence and annotated features, together with a wide range of manually curated functional gene-specific data. The PomBase website was redeveloped in 2016 to provide users with a more fully integrated, better-performing service.

Model organism databases (MODs) are biological databases, or knowledgebases, dedicated to the provision of in-depth biological data for intensively studied model organisms. MODs allow researchers to easily find background information on large sets of genes, plan experiments efficiently, combine their data with existing knowledge, and construct novel hypotheses. They allow users to analyse results and interpret datasets, and the data they generate are increasingly used to describe less well studied species. Where possible, MODs share common approaches to collect and represent biological information. For example, all MODs use the Gene Ontology (GO) to describe functions, processes and cellular locations of specific gene products. Projects also exist to enable software sharing for curation, visualization and querying between different MODs. Organismal diversity and varying user requirements however mean that MODs are often required to customize capture, display, and provision of data.

Donna R. Maglott is a staff scientist at the National Center for Biotechnology Information known for her research on large-scale genomics projects, including the mouse genome and development of databases required for genomics research.

VFDB, also known as the Virulence Factor Database, is a database that provides scientists with quick access to virulence factors in bacterial pathogens. It can be navigated and browsed by genus or by keyword. A BLAST tool is provided for searching against known virulence factors. VFDB contains a collection of 16 important bacterial pathogens. Perl scripts were used to extract the positions and sequences of virulence factors (VFs) from GenBank. Clusters of Orthologous Groups (COG) was used to update incomplete annotations. More information was obtained from NCBI. VFDB was built on Linux operating systems on DELL PowerEdge 1600SC servers.

In molecular biology, MvirDB is a publicly available database that stores information on toxins, virulence factors and antibiotic resistance genes. Its sources of DNA and protein information include Tox-Prot, SCORPION, the PRINTS Virulence Factors, VFDB, TVFac, Islander, ARGO and VIDA. The database provides a BLAST tool that allows users to query their sequence against all DNA and protein sequences in MvirDB. Information on virulence factors can be obtained with the provided browser tool, which returns results as a readable table organized by ascending E-value, with each entry hyperlinked to its related page. MvirDB is implemented in an Oracle 10g relational database.

Jessica Kissinger is a Distinguished Research Professor at the Franklin College of Arts and Sciences, University of Georgia and director of the Institute of Bioinformatics. Her research focus is on the evolution, assembly and data curation of protozoan parasite genomes, particularly Cryptosporidium, Toxoplasma gondii and Plasmodium.

Judith Anne Blake is a computational biologist at the Jackson Laboratory and Professor of Mammalian Genetics.

References

  1. Kafkas, Şenay; Abdelhakim, Marwa; Hashish, Yasmeen; Kulmanov, Maxat; Abdellatif, Marwa; Schofield, Paul N.; Hoehndorf, Robert (2019-06-03). "PathoPhenoDB, linking human pathogens to their phenotypes in support of infectious disease research". Scientific Data. 6 (1): 79. Bibcode:2019NatSD...6...79K. doi:10.1038/s41597-019-0090-x. ISSN 2052-4463. PMC 6546783. PMID 31160594.
  2. Schriml, Lynn M.; Parkinson, Helen; Vasant, Drashtti; Malone, James; Binder, Janos X.; Mungall, Christopher J.; Fu, Gang; Bolton, Evan; Mitraka, Elvira (2015-01-28). "Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data". Nucleic Acids Research. 43 (D1): D1071–D1078. doi:10.1093/nar/gku1011. ISSN 0305-1048. PMC 4383880. PMID 25348409.
  3. Robinson, Peter N.; Köhler, Sebastian; Bauer, Sebastian; Seelow, Dominik; Horn, Denise; Mundlos, Stefan (2008-11-17). "The Human Phenotype Ontology: A Tool for Annotating and Analyzing Human Hereditary Disease". The American Journal of Human Genetics. 83 (5): 610–615. doi:10.1016/j.ajhg.2008.09.017. ISSN 0002-9297. PMC 2668030. PMID 18950739.
  4. Richardson, Joel E.; Kadin, James A.; Bult, Carol J.; Blake, Judith A.; Eppig, Janan T. (2015-01-28). "The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease". Nucleic Acids Research. 43 (D1): D726–D736. doi:10.1093/nar/gku967. ISSN 0305-1048. PMC 4384027. PMID 25348401.
  5. Bork, Peer; Jensen, Lars Juhl; Letunic, Ivica; Kuhn, Michael (2016-01-04). "The SIDER database of drugs and side effects". Nucleic Acids Research. 44 (D1): D1075–D1079. doi:10.1093/nar/gkv1075. ISSN 0305-1048. PMC 4702794. PMID 26481350.
  6. McArthur, Andrew G.; Wright, Gerard D.; Brinkman, Fiona S. L.; Johnson, Timothy A.; Pawlowski, Andrew C.; Westman, Erin L.; Sardar, Daim; Elsayegh, Tariq; Frye, Jonathan G. (2017-01-04). "CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database". Nucleic Acids Research. 45 (D1): D566–D573. doi:10.1093/nar/gkw1004. ISSN 0305-1048. PMC 5210516. PMID 27789705.
  7. Church, Kenneth Ward; Hanks, Patrick (March 1990). "Word Association Norms, Mutual Information, and Lexicography". Comput. Linguist. 16 (1): 22–29. ISSN 0891-2017.