The Human Phenotype Ontology (HPO) is a formal ontology of human phenotypes. [1] [2] [3] [4] Developed as part of the Monarch Initiative in collaboration with members of the Open Biomedical Ontologies Foundry, HPO currently contains over 13,000 terms and over 156,000 annotations to hereditary diseases. Data from Online Mendelian Inheritance in Man and medical literature were used to generate the terms currently in the HPO. The ontology contains over 50,000 annotations between phenotypes and hereditary disease.
The Human Phenotype Ontology (HPO) was created in 2008 at the Charité to serve as a standardized vocabulary of phenotypic abnormalities that have been seen in human disease. Applications for the data in the ontology include clinical diagnostics, mapping between phenotypes of model organisms, and as a standard vocabulary for clinical databases. Clinical annotations within the HPO are sought from the medical and genetics community to improve the ontology.
The Gene Ontology (GO) is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species. More specifically, the project aims to: 1) maintain and develop its controlled vocabulary of gene and gene product attributes; 2) annotate genes and gene products, and assimilate and disseminate annotation data; and 3) provide tools for easy access to all aspects of the data provided by the project, and to enable functional interpretation of experimental data using the GO, for example via enrichment analysis. GO is part of a larger classification effort, the Open Biomedical Ontologies, being one of the Initial Candidate Members of the OBO Foundry.
The Rat Genome Database (RGD) is a database of rat genomics, genetics, physiology and functional data, as well as data for comparative genomics between rat, human and mouse. RGD is responsible for attaching biological information to the rat genome via structured vocabulary, or ontology, annotations assigned to genes and quantitative trait loci (QTL), and for consolidating rat strain data and making it available to the research community. They are also developing a suite of tools for mining and analyzing genomic, physiologic and functional data for the rat, and comparative data for rat, mouse, human, and five other species.
Biomedical text mining refers to the methods and study of how text mining may be applied to texts and literature of the biomedical domain. As a field of research, biomedical text mining incorporates ideas from natural language processing, bioinformatics, medical informatics and computational linguistics. The strategies in this field have been applied to the biomedical literature available through services such as PubMed.
The Open Biological and Biomedical Ontologies (OBO) Foundry is a group of people dedicated to build and maintain ontologies related to the life sciences. The OBO Foundry establishes a set of principles for ontology development for creating a suite of interoperable reference ontologies in the biomedical domain. Currently, there are more than a hundred ontologies that follow the OBO Foundry principles.
The Pathogen-Host Interactions database (PHI-base) is a biological database that contains manually curated information on genes experimentally proven to affect the outcome of pathogen-host interactions. The database has been maintained by researchers at Rothamsted Research and external collaborators since 2005. PHI-base has been part of the UK node of ELIXIR, the European life-science infrastructure for biological information, since 2016.
FlyBase is an online bioinformatics database and the primary repository of genetic and molecular data for the insect family Drosophilidae. For the most extensively studied species and model organism, Drosophila melanogaster, a wide range of data are presented in different formats.
A dysmorphic feature is an abnormal difference in body structure. It can be an isolated finding in an otherwise normal individual, or it can be related to a congenital disorder, genetic syndrome or birth defect. Dysmorphology is the study of dysmorphic features, their origins and proper nomenclature. One of the key challenges in identifying and describing dysmorphic features is the use and understanding of specific terms between different individuals. Clinical geneticists and pediatricians are usually those most closely involved with the identification and description of dysmorphic features, as most are apparent during childhood.
DECIPHER is a web-based resource and database of genomic variation data from analysis of patient DNA. It documents submicroscopic chromosome abnormalities and pathogenic sequence variants, from over 25000 patients and maps them to the human genome using Ensembl or UCSC Genome Browser. In addition it catalogues the clinical characteristics from each patient and maintains a database of microdeletion/duplication syndromes, together with links to relevant scientific reports and support groups.
In molecular biology and genetics, DNA annotation or genome annotation is the process of describing the structure and function of the components of a genome, by analyzing and interpreting them in order to extract their biological significance and understand the biological processes in which they participate. Among other things, it identifies the locations of genes and all the coding regions in a genome and determines what those genes do.
The Uber-anatomy ontology (Uberon) is a comparative anatomy ontology representing a variety of structures found in animals, such as lungs, muscles, bones, feathers and fins. These structures are connected to other structures via relationships such as part-of and develops-from. One of the uses of this ontology is to integrate data from different biological databases, and other species-specific ontologies such as the Foundational Model of Anatomy.
The International Mouse Phenotyping Consortium (IMPC) is an international scientific endeavour to create and characterize the phenotype of 20,000 knockout mouse strains. Launched in September 2011, the consortium consists of over 15 research institutes across four continents with funding provided by the NIH, European national governments and the partner institutions.
Gene Ontology (GO) term enrichment is a technique for interpreting sets of genes making use of the Gene Ontology system of classification, in which genes are assigned to a set of predefined bins depending on their functional characteristics. For example, the gene FasR is categorized as being a receptor, involved in apoptosis and located on the plasma membrane.
Gene set enrichment analysis (GSEA) (also called functional enrichment analysis or pathway enrichment analysis) is a method to identify classes of genes or proteins that are over-represented in a large set of genes or proteins, and may have an association with different phenotypes (e.g. different organism growth patterns or diseases). The method uses statistical approaches to identify significantly enriched or depleted groups of genes. Transcriptomics technologies and proteomics results often identify thousands of genes, which are used for the analysis.
In bioinformatics, a Gene Disease Database is a systematized collection of data, typically structured to model aspects of reality, in a way to comprehend the underlying mechanisms of complex diseases, by understanding multiple composite interactions between phenotype-genotype relationships and gene-disease mechanisms. Gene Disease Databases integrate human gene-disease associations from various expert curated databases and text mining derived associations including Mendelian, complex and environmental diseases.
Single nucleotide polymorphism annotation is the process of predicting the effect or function of an individual SNP using SNP annotation tools. In SNP annotation the biological information is extracted, collected and displayed in a clear form amenable to query. SNP functional annotation is typically performed based on the available information on nucleic acid and protein sequences.
PomBase is a model organism database that provides online access to the fission yeast Schizosaccharomyces pombe genome sequence and annotated features, together with a wide range of manually curated functional gene-specific data. The PomBase website was redeveloped in 2016 to provide users with a more fully integrated, better-performing service.
The Monarch Initiative is a large scale bioinformatics web resource focused on leveraging existing biomedical knowledge to connect genotypes with phenotypes in an effort to aid research that combats genetic diseases. Monarch does this by integrating multi-species genotype, phenotype, genetic variant and disease knowledge from various existing biomedical data resources into a centralized and structured database. While this integration process has been traditionally done manually by basic researchers and clinicians on a case-by-case basis, The Monarch Initiative provides an aggregated and structured collection of data and tools that make biomedical knowledge exploration more efficient and effective.
PathoPhenoDB is a biological database. The database connects pathogens to their phenotypes using multiple databases such as NCBI, Human Disease Ontology Human Phenotype Ontology, Mammalian Phenotype Ontology, PubChem, SIDER and CARD. Pathogen-disease associations were gathered mainly through the CDC and the List of Infectious Diseases page on Wikipedia. The manner by which they assigned taxonomy was semi-automatic. When mapped against NCBI Taxonomy, if the pathogen was not an exact match, it was then mapped to the parent class. PathoPhenoDB employs NPMI in order to filter pairs based on their co-occurrence statistics.
Biocuration is the field of life sciences dedicated to organizing biomedical data, information and knowledge into structured formats, such as spreadsheets, tables and knowledge graphs. The biocuration of biomedical knowledge is made possible by the cooperative work of biocurators, software developers and bioinformaticians and is at the base of the work of biological databases.