Content | |
---|---|
Description | e-Mouse Atlas of Gene Expression |
Organisms | mouse |
Contact | |
Research center | MRC Human Genetics Unit |
Primary citation | PMID 19767607 |
Access | |
Website | emouseatlas |
Miscellaneous | |
License | Creative Commons Attribution License 3.0 |
EMAGE (e-Mouse Atlas of Gene Expression[note 1]) is an online biological database of gene expression data in the developing mouse (Mus musculus) embryo.[1][2][3] The data held in EMAGE is spatially annotated to a framework of 3D mouse embryo models produced by EMAP (e-Mouse Atlas Project). These spatial annotations allow users to query EMAGE by spatial pattern as well as by gene name, anatomy term or Gene Ontology (GO) term. EMAGE is a freely available web-based resource funded by the Medical Research Council (UK) and based at the MRC Human Genetics Unit in the Institute of Genetics and Molecular Medicine, Edinburgh, UK.
EMAGE contains in situ hybridisation, immunohistochemistry, and in situ reporter (e.g. knock-in and gene trap) data. It includes wholemount data, section data and full 3D OPT (Optical Projection Tomography) data. The gene expression patterns are mapped into or onto the standard models by a team of biocurators, using bespoke mapping software. In addition to the spatial annotations, EMAGE data is also text-annotated to provide a text-based description of the expression patterns. This text annotation is carried out in collaboration with the MGI Gene Expression Database (GXD) using the EMAP mouse anatomy ontology.
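Querying by spatial pattern can be illustrated with a minimal sketch. It assumes that each entry's mapped expression pattern is stored as a set of voxel coordinates in a shared embryo model, and that similarity is scored with the Jaccard index; both the data layout and the scoring choice are assumptions made for illustration, and the names `jaccard` and `spatial_query` are hypothetical rather than part of any EMAGE interface.

```python
from typing import Dict, FrozenSet, List, Tuple

# A voxel is an (x, y, z) index into a shared 3D embryo model (assumed layout).
Voxel = Tuple[int, int, int]


def jaccard(a: FrozenSet[Voxel], b: FrozenSet[Voxel]) -> float:
    """Overlap of two voxel sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def spatial_query(
    query_region: FrozenSet[Voxel],
    entries: Dict[str, FrozenSet[Voxel]],
) -> List[Tuple[str, float]]:
    """Return (entry_id, score) pairs that overlap the query, best match first."""
    scored = [(entry_id, jaccard(query_region, voxels))
              for entry_id, voxels in entries.items()]
    hits = [(entry_id, score) for entry_id, score in scored if score > 0.0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical entries mapped into the same model.
entries = {
    "entry_A": frozenset({(0, 0, 0), (1, 0, 0)}),
    "entry_B": frozenset({(5, 5, 5)}),
}
query = frozenset({(0, 0, 0), (1, 0, 0), (2, 0, 0)})
results = spatial_query(query, entries)
```

Because every pattern is mapped into the same standard model, entries from different experiments can be compared directly by voxel overlap, without reference to gene names or anatomy terms.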
EMAGE data comes primarily from peer-reviewed, published journal articles and from large-scale screens, but also from direct submissions by researchers working in the field. Data does not need to be published to be included in EMAGE; however, EMAGE is a curated database. Biocurators check the accuracy of the metadata included in the database entries and perform the spatial annotation of the data.
EMAGE entries are designed to adhere to the Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE),[4] and as such contain information about the submitter/author publication, detection reagent, assay specimen preparation, and experimental procedures as well as the original data images and the spatial and text annotations. EMAGE entries also contain links to a variety of related resources based on either the gene being assayed or the assay itself.
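The kinds of information listed above can be pictured as a simple record with a completeness check, as a curator might apply before accepting an entry. This is only a sketch: the class `ExpressionEntry`, its field names and the helper `missing_fields` are hypothetical and do not reproduce the EMAGE schema or the MISFISHIE checklist. Publication is treated as optional here because unpublished data can be included.

```python
from dataclasses import dataclass, field, fields
from typing import List

# Fields that may legitimately be empty (unpublished data is accepted).
OPTIONAL_FIELDS = {"publication"}


@dataclass
class ExpressionEntry:
    """Hypothetical record loosely following the categories named in the text."""
    submitter: str = ""
    publication: str = ""
    detection_reagent: str = ""
    specimen_preparation: str = ""
    experimental_procedure: str = ""
    image_files: List[str] = field(default_factory=list)
    text_annotations: List[str] = field(default_factory=list)


def missing_fields(entry: ExpressionEntry) -> List[str]:
    """Names of required fields that are still empty, in declaration order."""
    return [
        f.name
        for f in fields(entry)
        if f.name not in OPTIONAL_FIELDS and not getattr(entry, f.name)
    ]


# A partially completed, unpublished submission (illustrative values).
draft = ExpressionEntry(submitter="Example Lab", detection_reagent="antisense probe")
todo = missing_fields(draft)
```

In this example `todo` lists the specimen preparation, experimental procedure, image and text annotation fields that still need to be supplied.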
Biological databases are libraries of biological sciences, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization, clinical effects of mutations as well as similarities of biological sequences and structures.
Spatiotemporal gene expression is the activation of genes within specific tissues of an organism at specific times during development. Gene activation patterns vary widely in complexity. Some are straightforward and static, such as the pattern of tubulin, which is expressed in all cells at all times in life. Others are extraordinarily intricate and difficult to predict and model, with expression fluctuating wildly from minute to minute or from cell to cell. Spatiotemporal variation plays a key role in generating the diversity of cell types found in developed organisms: since the identity of a cell is specified by the collection of genes actively expressed within that cell, if gene expression were uniform spatially and temporally, there could be at most one kind of cell.
Ensembl genome database project is a scientific project at the European Bioinformatics Institute, which was launched in 1999 in response to the imminent completion of the Human Genome Project. Ensembl aims to provide a centralized resource for geneticists, molecular biologists and other researchers studying the genomes of our own species and other vertebrates and model organisms. Ensembl is one of several well known genome browsers for the retrieval of genomic information.
The Rat Genome Database (RGD) is a database of rat genomics, genetics, physiology and functional data, as well as data for comparative genomics between rat, human and mouse. RGD is responsible for attaching biological information to the rat genome via structured vocabulary, or ontology, annotations assigned to genes and quantitative trait loci (QTL), and for consolidating rat strain data and making it available to the research community. RGD is also developing a suite of tools for mining and analyzing genomic, physiologic and functional data for the rat, and comparative data for rat, mouse, human, and five other species.
The Saccharomyces Genome Database (SGD) is a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker's or budding yeast.
The Generic Model Organism Database (GMOD) project provides biological research communities with a toolkit of open-source software components for visualizing, annotating, managing, and storing biological data. The GMOD project is funded by the United States National Institutes of Health, National Science Foundation and the USDA Agricultural Research Service.
FlyBase is an online bioinformatics database and the primary repository of genetic and molecular data for the insect family Drosophilidae. For the most extensively studied species and model organism, Drosophila melanogaster, a wide range of data are presented in different formats.
Mouse Genome Informatics (MGI) is a free, online database and bioinformatics resource hosted by The Jackson Laboratory, with funding by the National Human Genome Research Institute (NHGRI), the National Cancer Institute (NCI), and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). MGI provides access to data on the genetics, genomics and biology of the laboratory mouse to facilitate the study of human health and disease. The database integrates multiple projects, with the two largest contributions coming from the Mouse Genome Database and Mouse Gene Expression Database (GXD). As of 2018, MGI contains data curated from over 230,000 publications.
MicrobesOnline is a publicly and freely accessible website that hosts multiple comparative genomic tools for comparing microbial species at the genomic, transcriptomic and functional levels. MicrobesOnline was developed by the Virtual Institute for Microbial Stress and Survival, which is based at the Lawrence Berkeley National Laboratory in Berkeley, California. The site was launched in 2005, with regular updates until 2011.
The Gene Wiki is a project within Wikipedia that aims to describe the relationships and functions of all human genes. It was established to transfer information from scientific resources to Wikipedia stub articles.
SUPERFAMILY is a database and search platform of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. Domains are functional, structural, and evolutionary units that form proteins. Domains of common ancestry are grouped into superfamilies. The domains and domain superfamilies are defined and described in SCOP. Superfamilies are groups of proteins which have structural evidence to support a common evolutionary ancestor but may not have detectable sequence homology.
Protein function prediction methods are techniques that bioinformatics researchers use to assign biological or biochemical roles to proteins. These proteins are usually ones that are poorly studied or predicted based on genomic sequence data. These predictions are often driven by data-intensive computational procedures. Information may come from nucleic acid sequence homology, gene expression profiles, protein domain structures, text mining of publications, phylogenetic profiles, phenotypic profiles, and protein-protein interaction. Protein function is a broad term: the roles of proteins range from catalysis of biochemical reactions to transport to signal transduction, and a single protein may play a role in multiple processes or cellular pathways.
DisProt is a manually curated biological database of intrinsically disordered proteins (IDPs) and regions (IDRs). DisProt annotations cover state information on the protein but also, when available, its state transitions, interactions and functional aspects of disorder detected by specific experimental methods. DisProt is hosted and maintained in the BioComputing UP laboratory.
Blast2GO, first published in 2005, is a bioinformatics software tool for the automatic, high-throughput functional annotation of novel sequence data. It uses the BLAST algorithm to identify similar sequences and then transfers existing functional annotation from already characterised sequences to the novel one. The functional information is represented via the Gene Ontology (GO), a controlled vocabulary of functional attributes. The Gene Ontology is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species.
In bioinformatics, the PANTHER classification system is a large curated biological database of gene/protein families and their functionally related subfamilies that can be used to classify and identify the function of gene products. PANTHER is part of the Gene Ontology Reference Genome Project designed to classify proteins and their genes for high-throughput analysis.
In bioinformatics, a Gene Disease Database is a systematized collection of data, typically structured to model aspects of reality, in a way to comprehend the underlying mechanisms of complex diseases, by understanding multiple composite interactions between phenotype-genotype relationships and gene-disease mechanisms. Gene Disease Databases integrate human gene-disease associations from various expert curated databases and text mining derived associations including Mendelian, complex and environmental diseases.
The Expression Atlas is a database maintained by the European Bioinformatics Institute that provides information on gene expression patterns from RNA-Seq and microarray studies, and protein expression from proteomics studies. The Expression Atlas allows searches by gene, splice variant, protein attribute, disease, treatment or organism part. Individual genes or gene sets can be searched for. All datasets in the Expression Atlas have their metadata manually curated and their data analysed through standardised analysis pipelines. There are two components to the Expression Atlas: the Baseline Atlas and the Differential Atlas.
Model organism databases (MODs) are biological databases, or knowledgebases, dedicated to the provision of in-depth biological data for intensively studied model organisms. MODs allow researchers to easily find background information on large sets of genes, plan experiments efficiently, combine their data with existing knowledge, and construct novel hypotheses. They allow users to analyse results and interpret datasets, and the data they generate are increasingly used to describe less well studied species. Where possible, MODs share common approaches to collect and represent biological information. For example, all MODs use the Gene Ontology (GO) to describe functions, processes and cellular locations of specific gene products. Projects also exist to enable software sharing for curation, visualization and querying between different MODs. Organismal diversity and varying user requirements, however, mean that MODs are often required to customize the capture, display, and provision of data.
Biocuration is the field of life sciences dedicated to organizing biomedical data, information and knowledge into structured formats, such as spreadsheets, tables and knowledge graphs. The biocuration of biomedical knowledge is made possible by the cooperative work of biocurators, software developers and bioinformaticians and is at the base of the work of biological databases.