PHI-base

Content
Description: Pathogen-Host Interactions database
Data types captured: phenotypes of microbial mutants
Organisms: ~290 fungal, bacterial and protist pathogens of agronomic and medical importance tested on ~240 hosts
Contact
Research center: Rothamsted Research
Primary citation: PMID 39588765
Release date: May 2005
Access
Data format: XML, FASTA
Website: phibase.org
Tools
Web: PHI-base Search; PHIB-BLAST; PHI-Canto (Author curation)
Miscellaneous
License: Creative Commons Attribution-NoDerivatives 4.0 International License
Versioning: Yes
Data release frequency: 6 monthly
Version: 4.17 (May 2024)
Curation policy: Manual curation

The Pathogen-Host Interactions database (PHI-base) [1] is a biological database that contains manually curated information on genes experimentally proven to affect the outcome of pathogen-host interactions. The database has been maintained by researchers at Rothamsted Research and external collaborators since 2005. [2] [3] [4] [5] PHI-base has been part of the UK node of ELIXIR, the European life-science infrastructure for biological information, since 2016. [6]

Background

The Pathogen-Host Interactions database was developed to utilise the growing number of verified genes that mediate an organism's ability to cause disease and/or trigger host responses. [7]

The web-accessible database catalogues experimentally verified pathogenicity, virulence, and effector genes from bacterial, fungal, and oomycete pathogens which infect animal, plant, and fungal hosts. PHI-base was the first online resource devoted to the identification and presentation of information on fungal and oomycete pathogenicity genes and their host interactions. PHI-base is a resource for the discovery of candidate targets in medically and agronomically important fungal and oomycete pathogens for intervention with synthetic chemistries and natural products (fungicides). [8] [9]

Each entry in PHI-base is curated by domain experts and supported by strong experimental evidence (gene disruption experiments) as well as literature references in which the experiments are described. Each gene is presented with its nucleotide and deduced amino acid sequence and a detailed, structured description of the predicted protein's function during the host infection process. To facilitate data interoperability, genes are annotated using controlled vocabularies (Gene Ontology terms, EC numbers, etc.) and linked to external data sources such as UniProt, EMBL, and the NCBI taxonomy services.

Current developments

Version 4.17 (May 2024) of PHI-base [1] provides information on 9973 genes from 296 pathogens and 249 hosts and their impact on 22415 interactions, as well as efficacy information on ~20 drugs and the target sequences in the pathogen. PHI-base currently focuses on plant pathogenic and human pathogenic organisms including fungi, oomycetes, and bacteria. The entire contents of the database can be downloaded in a tab-delimited format. Since the launch of version 4, PHI-base has also been searchable using the PHIB-BLAST search tool, which uses the BLAST algorithm to compare a user's sequence against the sequences available from PHI-base. [10] The database providers have announced the launch of PHI-base 5, a new gene-centric version of PHI-base, through a press release on the Rothamsted Research website. A summary of the improvements made is also available.
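As a rough illustration of working with a tab-delimited export of this kind, the following Python sketch counts phenotype outcomes for one pathogen. The column names (`gene`, `pathogen`, `host`, `phenotype`) and the sample rows are placeholders invented for this example; they are not the actual PHI-base column headers or records, which should be checked against the downloaded file.

```python
import csv
import io
from collections import Counter

# Placeholder data: column names and rows are hypothetical,
# not real PHI-base headers or records.
SAMPLE_TSV = (
    "gene\tpathogen\thost\tphenotype\n"
    "geneA\tPathogen one\tHost one\treduced virulence\n"
    "geneB\tPathogen one\tHost two\tloss of pathogenicity\n"
    "geneC\tPathogen two\tHost one\treduced virulence\n"
)


def count_phenotypes(handle, pathogen=None):
    """Count phenotype values in a tab-delimited file-like object.

    If `pathogen` is given, only rows for that pathogen are counted.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        if pathogen is None or row["pathogen"] == pathogen:
            counts[row["phenotype"]] += 1
    return counts


# Usage: count outcomes for a single (placeholder) pathogen.
counts = count_phenotypes(io.StringIO(SAMPLE_TSV), pathogen="Pathogen one")
print(dict(counts))
```

For a real download, the `io.StringIO` object would be replaced by an opened file, and the column names adjusted to match the file's header row.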

In 2016 the plant portion of PHI-base was used to establish a Semantic PHI-base search tool. [11]

PHI-base has been aligned with Ensembl Genomes since 2011, FungiDB since 2016, and Global Biotic Interactions (GloBI) since 2018. [12] All new PHI-base releases are integrated by these independent databases.

PHI-base is a resource for many applications including:

› The discovery of conserved genes in medically and agronomically important pathogens, which may be potential targets for chemical intervention

› Comparative genome analyses

› Annotation of newly sequenced pathogen genomes

› Functional interpretation of RNA sequencing and microarray experiments

› The rapid cross-checking of phenotypic differences between pathogenic species when writing articles for peer review

PHI-base has been cited in over 900 peer-reviewed articles. [1]

Since 2015, the website has linked to an online literature curation tool called PHI-Canto, enabling community-driven literature curation for various pathogenic species. [13] PHI-Canto employs a community curation framework that offers not only a curation tool but also a phenotype ontology and controlled vocabularies, which provide a unified language and rules for describing biological experiments. The central concept of this framework is the 'Metagenotype', which allows phenotypes to be annotated and assigned to specific pathogen mutant-host interactions. PHI-Canto extends the single-species curation tool developed for PomBase [14] (https://www.pombase.org), the model organism database for fission yeast.
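The idea of attaching a phenotype to a pathogen mutant-host pairing, rather than to a single organism, can be pictured with a small data-structure sketch. The class and field names below are hypothetical and chosen only for illustration; they do not reflect the actual PHI-Canto data model, and the values are placeholders.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Genotype:
    """One organism's genotype: a species plus its altered alleles (if any)."""
    species: str
    alleles: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Metagenotype:
    """Hypothetical pairing of a pathogen genotype with a host genotype."""
    pathogen: Genotype
    host: Genotype


@dataclass(frozen=True)
class PhenotypeAnnotation:
    """A phenotype assigned to the pairing, with its supporting reference."""
    metagenotype: Metagenotype
    phenotype: str
    reference: str


def annotations_for(annotations, metagenotype):
    """Return the annotations recorded for one pathogen-host pairing."""
    return [a for a in annotations if a.metagenotype == metagenotype]


# Placeholder values, not real curated data.
mutant = Genotype("Pathogen one", ("geneA deletion",))
host = Genotype("Host one")
pair = Metagenotype(mutant, host)
records = [PhenotypeAnnotation(pair, "reduced virulence", "placeholder reference")]
print(annotations_for(records, pair)[0].phenotype)
```

Making the classes frozen keeps each pairing hashable and comparable by value, so the same pathogen-host combination described in two publications resolves to the same key.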

Funding

PHI-base is a National Capability funded by the Biotechnology and Biological Sciences Research Council (BBSRC), a UK research council. [7]

References

  1. Urban, M.; Cuzick, A.; Seager, J.; Nonavinakere, N.; Sahoo, J.; Sahu, P.; Iyer, V. L.; Khamari, L.; Carbajo Martinez, M.; Hammond-Kosack, K.E. (2025). "PHI-base – the multi-species pathogen–host interaction database in 2025". Nucleic Acids Research. 53 (Database Issue): D826–D838. doi:10.1093/nar/gkae1084. PMC 11701570.
  2. Winnenburg, R.; Baldwin, T.K.; Urban, M.; Rawlings, C.; Köhler, J.; Hammond-Kosack, K.E. (2006). "PHI-base: a new database for pathogen host interactions". Nucleic Acids Research. 34 (Database Issue): D459–D464. doi:10.1093/nar/gkj047. PMC 1347410. PMID 16381911.
  3. Baldwin, T.K.; Winnenburg, R.; Urban, M.; Rawlings, C.; Köhler, J.; Hammond-Kosack, K.E. (2006). "The pathogen-host interactions database (PHI-base) provides insights into generic and novel themes of pathogenicity". Molecular Plant-Microbe Interactions. 19 (12): 1451–1462. doi:10.1094/mpmi-19-1451. PMID 17153929.
  4. Winnenburg, R.; Urban, M.; Beacham, A.; Baldwin, T.K.; Holland, S.; Lindeberg, M.; Hansen, H.; Rawlings, C.; Hammond-Kosack, K.E.; Köhler, J. (2008). "PHI-base update: additions to the pathogen host interactions database". Nucleic Acids Research. 36 (Database Issue): D572–D576. doi:10.1093/nar/gkm858. PMC 2238852. PMID 17942425.
  5. Urban, M.; Pant, R.; Raghunath, A.; Irvine, A.G.; Pedro, H.; Hammond-Kosack, K.E. (2015). "The Pathogen-Host Interactions database (PHI-base): additions and future developments". Nucleic Acids Research. 43 (Database Issue): D645–D655. doi:10.1093/nar/gku1165. PMC 4383963. PMID 25414340.
  6. Urban, Martin; Cuzick, Alayne; Seager, James; Wood, Valerie; Rutherford, Kim; Venkatesh, Shilpa Yagwakote; Sahu, Jashobanta; Iyer, S. Vijaylakshmi; Khamari, Lokanath; De Silva, Nishadi; Martinez, Manuel Carbajo; Pedro, Helder; Yates, Andrew D.; Hammond-Kosack, Kim E. (2022-01-07). "PHI-base in 2022: a multi-species phenotype database for Pathogen-Host Interactions". Nucleic Acids Research. 50 (D1): D837–D847. doi:10.1093/nar/gkab1037. ISSN 1362-4962. PMC 8728202. PMID 34788826.
  7. Urban, M; Cuzick, A; Seager, J; Wood, V; Rutherford, K; Venkatesh, SY; De Silva, N; Martinez, MC; Pedro, H; Yates, AD; Hassani-Pak, K; Hammond-Kosack, KE (8 January 2020). "PHI-base: the pathogen-host interactions database". Nucleic Acids Research. 48 (D1): D613–D620. doi:10.1093/nar/gkz904. PMC 7145647. PMID 31733065.
  8. Brown, N. A.; Urban, M.; Hammond-Kosack, K.E. (2016). "The trans-kingdom identification of negative regulators of pathogen hypervirulence". FEMS Microbiol Rev. 40 (1): 19–40. doi:10.1093/femsre/fuv042. PMC 4703069. PMID 26468211.
  9. Urban, M.; Irvine, A. G.; Raghunath, A.; Cuzick, A.; Hammond-Kosack, K.E. (2015). "Using the pathogen-host interactions database (PHI-base) to investigate plant pathogen genomes and genes implicated in virulence". Front Plant Sci. 6: 605. doi:10.3389/fpls.2015.00605. PMC 4526803. PMID 26300902.
  10. Urban, M.; Cuzick, A.; Rutherford, K.; Irvine, A. G.; Pedro, H.; Pant, R.; Sadanadan, V.; Khamari, L.; Billal, S.; Mohanty, S.; Hammond-Kosack, K. (2017). "PHI-base: a new interface and further additions for the multi-species pathogen-host interactions database". Nucleic Acids Res. 45 (D1): D604–D610. doi:10.1093/nar/gkw1089. PMC 5210566. PMID 27915230.
  11. Rodriguez-Iglesias, A.; Rodriguez-Gonzalez, A.; Irvine, A.G.; Sesma, A.; Urban, M.; Hammond-Kosack, K.E.; Wilkinson, M.D. (2016). "Publishing FAIR Data: An Exemplar Methodology Utilizing PHI-Base". Front Plant Sci. 7: 641. doi:10.3389/fpls.2016.00641. PMC 4922217. PMID 27433158.
  12. Basenko, Evelina Y.; Pulman, Jane A.; Shanmugasundram, Achchuthan; Harb, Omar S.; Crouch, Kathryn; Starns, David; Warrenfeltz, Susanne; Aurrecoechea, Cristina; Stoeckert, Christian J.; Kissinger, Jessica C.; Roos, David S.; Hertz-Fowler, Christiane (2018-03-20). "FungiDB: An Integrated Bioinformatic Resource for Fungi and Oomycetes". Journal of Fungi. 4 (1): 39. doi:10.3390/jof4010039. ISSN 2309-608X. PMC 5872342. PMID 30152809.
  13. Cuzick, Alayne; Seager, James; Wood, Valerie; Urban, Martin; Rutherford, Kim; Hammond-Kosack, Kim E (2023-07-04). "A framework for community curation of interspecies interactions literature". eLife. 12. doi:10.7554/elife.84658. ISSN 2050-084X. PMC 10319440. PMID 37401199.
  14. Rutherford KM, Lera-Ramírez M, Wood V (May 2024). "PomBase: a Global Core Biodata Resource-growth, collaboration, and sustainability". Genetics. 227 (1). doi:10.1093/genetics/iyae007. PMC 11075564. PMID 38376816.