PhenomicDB is a free phenotype oriented database. It contains data for some of the main model organisms such as Homo sapiens, Mus musculus, Drosophila melanogaster, and others. PhenomicDB merges and structures phenotypic data from various public sources: WormBase, FlyBase, NCBI Gene, MGI and ZFIN using clustering algorithms. The website is now offline. [1]
In genetics, the phenotype is the set of observable characteristics or traits of an organism. The term covers the organism's morphology, its developmental processes, its biochemical and physiological properties, its behavior, and the products of behavior. An organism's phenotype results from two basic factors: the expression of an organism's genetic code and the influence of environmental factors. Both factors may interact, further affecting the phenotype. When two or more clearly different phenotypes exist in the same population of a species, the species is called polymorphic. A well-documented example of polymorphism is Labrador Retriever coloring; while the coat color depends on many genes, it is clearly seen in the environment as yellow, black, and brown. Richard Dawkins in 1978 and then again in his 1982 book The Extended Phenotype suggested that one can regard bird nests and other built structures such as caddisfly larva cases and beaver dams as "extended phenotypes".
Biological databases are libraries of biological sciences, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization, clinical effects of mutations as well as similarities of biological sequences and structures.
In genetics and bioinformatics, a single-nucleotide polymorphism is a germline substitution of a single nucleotide at a specific position in the genome that is present in a sufficiently large fraction of considered population.
Biomedical text mining refers to the methods and study of how text mining may be applied to texts and literature of the biomedical domain. As a field of research, biomedical text mining incorporates ideas from natural language processing, bioinformatics, medical informatics and computational linguistics. The strategies in this field have been applied to the biomedical literature available through services such as PubMed.
Phenomics is the systematic study of traits that make up a phenotype. It was coined by UC Berkeley and LBNL scientist Steven A. Garan. As such, it is a transdisciplinary area of research that involves biology, data sciences, engineering and other fields. Phenomics is concerned with the measurement of the phenotype where a phenome is a set of traits that can be produced by a given organism over the course of development and in response to genetic mutation and environmental influences. It is also important to remember that an organisms phenotype changes with time. The relationship between phenotype and genotype enables researchers to understand and study pleiotropy. Phenomics concepts are used in functional genomics, pharmaceutical research, metabolic engineering, agricultural research, and increasingly in phylogenetics.
The Pathogen-Host Interactions database (PHI-base) is a biological database that contains manually curated information on genes experimentally proven to affect the outcome of pathogen-host interactions. The database has been maintained by researchers at Rothamsted Research and external collaborators since 2005. PHI-base has been part of the UK node of ELIXIR, the European life-science infrastructure for biological information, since 2016.
Mouse Genome Informatics (MGI) is a free, online database and bioinformatics resource hosted by The Jackson Laboratory, with funding by the National Human Genome Research Institute (NHGRI), the National Cancer Institute (NCI), and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). MGI provides access to data on the genetics, genomics and biology of the laboratory mouse to facilitate the study of human health and disease. The database integrates multiple projects, with the two largest contributions coming from the Mouse Genome Database and Mouse Gene Expression Database (GXD). As of 2018, MGI contains data curated from over 230,000 publications.
The Single Nucleotide Polymorphism Database (dbSNP) is a free public archive for genetic variation within and across different species developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI). Although the name of the database implies a collection of one class of polymorphisms only, it in fact contains a range of molecular variation: (1) SNPs, (2) short deletion and insertion polymorphisms (indels/DIPs), (3) microsatellite markers or short tandem repeats (STRs), (4) multinucleotide polymorphisms (MNPs), (5) heterozygous sequences, and (6) named variants. The dbSNP accepts apparently neutral polymorphisms, polymorphisms corresponding to known phenotypes, and regions of no variation. It was created in September 1998 to supplement GenBank, NCBI’s collection of publicly available nucleic acid and protein sequences.
BIOBASE is an international bioinformatics company headquartered in Wolfenbüttel, Germany. The company focuses on the generation, maintenance, and licensing of databases in the field of molecular biology, and their related software platforms.
The Comparative Toxicogenomics Database (CTD) is a public website and research tool launched in November 2004 that curates scientific data describing relationships between chemicals/drugs, genes/proteins, diseases, taxa, phenotypes, GO annotations, pathways, and interaction modules. The database is maintained by the Department of Biological Sciences at North Carolina State University.
GeneNetwork is a combined database and open-source bioinformatics data analysis software resource for systems genetics. This resource is used to study gene regulatory networks that link DNA sequence differences to corresponding differences in gene and protein expression and to variation in traits such as health and disease risk. Data sets in GeneNetwork are typically made up of large collections of genotypes and phenotypes from groups of individuals, including humans, strains of mice and rats, and organisms as diverse as Drosophila melanogaster, Arabidopsis thaliana, and barley. The inclusion of genotypes makes it practical to carry out web-based gene mapping to discover those regions of genomes that contribute to differences among individuals in mRNA, protein, and metabolite levels, as well as differences in cell function, anatomy, physiology, and behavior.
GWAS Central is a publicly available database of summary-level findings from genetic association studies in humans, including genome-wide association studies (GWAS).
PhytoPath was a joint scientific project between the European Bioinformatics Institute and Rothamsted Research, running from January 2012 to May 30, 2017. The project aimed to enable the exploitation of the growing body of “-omics” data being generated for phytopathogens, their plant hosts and related model species. Gene mutant phenotypic information is directly displayed in genome browsers.
Gene set enrichment analysis (GSEA) (also called functional enrichment analysis or pathway enrichment analysis) is a method to identify classes of genes or proteins that are over-represented in a large set of genes or proteins, and may have an association with different phenotypes (e.g. different organism growth patterns or diseases). The method uses statistical approaches to identify significantly enriched or depleted groups of genes. Transcriptomics technologies and proteomics results often identify thousands of genes which are used for the analysis.
In bioinformatics, a Gene Disease Database is a systematized collection of data, typically structured to model aspects of reality, in a way to comprehend the underlying mechanisms of complex diseases, by understanding multiple composite interactions between phenotype-genotype relationships and gene-disease mechanisms. Gene Disease Databases integrate human gene-disease associations from various expert curated databases and text mining derived associations including Mendelian, complex and environmental diseases.
PomBase is a model organism database that provides online access to the fission yeast Schizosaccharomyces pombe genome sequence and annotated features, together with a wide range of manually curated functional gene-specific data. The PomBase website was redeveloped in 2016 to provide users with a more fully integrated, better-performing service.
The Monarch Initiative is a large scale bioinformatics web resource focused on leveraging existing biomedical knowledge to connect genotypes with phenotypes in an effort to aid research that combats genetic diseases. Monarch does this by integrating multi-species genotype, phenotype, genetic variant and disease knowledge from various existing biomedical data resources into a centralized and structured database. While this integration process has been traditionally done manually by basic researchers and clinicians on a case-by-case basis, The Monarch Initiative provides an aggregated and structured collection of data and tools that make biomedical knowledge exploration more efficient and effective.
Canto is a web-based tool to support the curation of gene-specific scientific data, by both professional biocurators and publication authors. Canto was developed as part of the PomBase project, and is funded by the Wellcome Trust.
Biocuration is the field of life sciences dedicated to organizing biomedical data, information and knowledge into structured formats, such as spreadsheets, tables and knowledge graphs. The biocuration of biomedical knowledge is made possible by the cooperative work of biocurators, software developers and bioinformaticians and is at the base of the work of biological databases.