Mouse Genome Informatics (MGI) is a free, online database and bioinformatics resource hosted by The Jackson Laboratory, with funding from the National Human Genome Research Institute (NHGRI), the National Cancer Institute (NCI), and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). [1] MGI provides access to data on the genetics, genomics and biology of the laboratory mouse to facilitate the study of human health and disease. [2] [3] The database integrates multiple projects, with the two largest contributions coming from the Mouse Genome Database and Mouse Gene Expression Database (GXD). [4] As of 2018, MGI contains data curated from over 230,000 publications. [5]
The MGI resource was first published online in 1994 [5] and is a collection of data, tools, and analyses created and tailored for research on the laboratory mouse, a widely used model organism. It is "the authoritative source of official names for mouse genes, alleles, and strains", which follow the guidelines established by the International Committee on Standardized Genetic Nomenclature for Mice. [6] The resource draws on the Jackson Laboratory's long history of mouse research and production, and a worldwide community of mouse researchers also contributes to it. It is used by researchers who work with the mouse as a model organism and by those interested in genes that share homology with mouse genes. Various mouse research support resources, including animal collections and free colony management software, are also available at the MGI site. [7]
The Mouse Genome Database collects and curates comprehensive phenotype and functional annotations for mouse genes and alleles. [8] This is an NHGRI-funded project which contributes to the Mouse Genome Informatics database.
The Gene Expression Database is a community resource of mouse developmental expression information. [9]
MGI evolved from a project funded by the National Center for Human Genome Research in 1989 to combine the databases of several Jackson Laboratory scientists and create a tool for visualizing data on the mouse genome. [10] The result of that project, led by Joseph H. Nadeau, Larry E. Mobraaten, and Janan T. Eppig, was called the "Encyclopedia of the Mouse Genome" and distributed via floppy disk semi-annually to around 300 scientists around the world. [10] In 1992, that group joined with the team responsible for developing the "Genomic Database for Mouse", led by Muriel T. Davisson and Thomas H. Roderick, to start the "Mouse Genome Informatics" project. [10] That project resulted in the first online release of the "Mouse Genome Database" in 1994. [10]
Inbred strains are populations of a particular species whose individuals are nearly identical to each other in genotype due to long inbreeding. A strain is considered inbred when it has undergone at least 20 generations of brother × sister or offspring × parent mating, at which point at least 98.6% of the loci in an individual of the strain will be homozygous, and individuals can be treated effectively as clones. Some inbred strains have been bred for over 150 generations, leaving the individuals in the population essentially isogenic. Inbred strains of animals are frequently used in laboratory experiments in which all test animals should be as similar as possible so that conclusions are reproducible. For some experiments, however, genetic diversity in the test population may be desired. Outbred strains of most laboratory animals are therefore also available; an outbred strain is one that is kept as close to wild type as possible by minimizing inbreeding.
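The 98.6% figure can be reproduced with the standard recurrence for the inbreeding coefficient under continuous full-sib mating, F(t) = (1 + 2·F(t-1) + F(t-2)) / 4. The following is an illustrative sketch only: the function name is hypothetical, the founders are assumed to be unrelated and non-inbred, and F is read as the expected fraction of loci that are homozygous by descent.

```python
def sib_mating_inbreeding(generations: int) -> float:
    """Inbreeding coefficient after the given number of full-sib mating generations.

    Assumption (not from the source text): the founding pair is unrelated and
    non-inbred, so both the founders and their first litter have F = 0.
    """
    f_two_back, f_one_back = 0.0, 0.0  # founders, then the first litter of siblings
    for _ in range(generations):
        # Wright's recurrence for brother x sister mating
        f_two_back, f_one_back = (
            f_one_back,
            0.25 * (1.0 + 2.0 * f_one_back + f_two_back),
        )
    return f_one_back


if __name__ == "__main__":
    # After 20 generations the coefficient is about 0.986.
    print(sib_mating_inbreeding(20))
```

Under these assumptions the remaining heterozygosity shrinks by a factor of roughly 0.81 per generation, which is why about 20 generations are needed to pass the 98.6% mark.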
Biological databases are libraries of biological sciences, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization, clinical effects of mutations as well as similarities of biological sequences and structures.
The Jackson Laboratory is an independent, non-profit biomedical research institution which was founded by Clarence Cook Little in 1929. It has over 3,000 employees in Bar Harbor, Maine; Sacramento, California; Farmington, Connecticut; Shanghai, China; and Yokohama, Japan. The institution is a National Cancer Institute-designated Cancer Center and has NIH Centers of Excellence in aging and systems genetics. The stated mission of The Jackson Laboratory is "to discover the genetic basis for preventing, treating and curing human diseases, and to enable research and education for the global biomedical community."
KEGG is a collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances. KEGG is utilized for bioinformatics research and education, including data analysis in genomics, metagenomics, metabolomics and other omics studies, modeling and simulation in systems biology, and translational research in drug development.
Amos Bairoch is a Swiss bioinformatician and Professor of Bioinformatics at the Department of Human Protein Sciences of the University of Geneva where he leads the CALIPHO group at the Swiss Institute of Bioinformatics (SIB) combining bioinformatics, curation, and experimental efforts to functionally characterize human proteins.
Gerald Mayer Rubin is an American biologist, notable for pioneering the use of transposable P elements in genetics, and for leading the public project to sequence the Drosophila melanogaster genome. Related to his genomics work, Rubin's lab is notable for development of genetic and genomics tools and studies of signal transduction and gene regulation. Rubin also served as a vice president of the Howard Hughes Medical Institute (2003-2020) and founding executive director of its Janelia Research Campus.
MicrobesOnline is a publicly and freely accessible website that hosts multiple comparative genomic tools for comparing microbial species at the genomic, transcriptomic and functional levels. MicrobesOnline was developed by the Virtual Institute for Microbial Stress and Survival, which is based at the Lawrence Berkeley National Laboratory in Berkeley, California. The site was launched in 2005, with regular updates until 2011.
The International Knockout Mouse Consortium (IKMC) is a scientific endeavour to produce a collection of mouse embryonic stem cell lines that together lack every gene in the genome, and then to distribute the cells to scientific researchers to create knockout mice to study. Many of the targeted alleles are designed so that they can generate both complete and conditional gene knockout mice. The IKMC was initiated on March 15, 2007, at a meeting in Brussels. In 2011, Nature reported that approximately 17,000 different genes had already been disabled by the consortium, "leaving only around 3,000 more to go".
In bioinformatics, miRBase is a biological database that acts as an archive of microRNA sequences and annotations. As of September 2010 it contained information about 15,172 microRNAs. This number had risen to 38,589 by March 2018. The miRBase registry provides a centralised system for assigning new names to microRNA genes.
EMAGE is an online biological database of gene expression data in the developing mouse embryo. The data held in EMAGE is spatially annotated to a framework of 3D mouse embryo models produced by EMAP. These spatial annotations allow users to query EMAGE by spatial pattern as well as by gene name, anatomy term or Gene Ontology (GO) term. EMAGE is a freely available web-based resource funded by the Medical Research Council (UK) and based at the MRC Human Genetics Unit in the Institute of Genetics and Molecular Medicine, Edinburgh, UK.
BioMart is a community-driven project to provide a single point of access to distributed research data. The BioMart project contributes open source software and data services to the international scientific community. Although the BioMart software is primarily used by the biomedical research community, it is designed in such a way that any type of data can be incorporated into the BioMart framework. The BioMart project originated at the European Bioinformatics Institute as a data management solution for the Human Genome Project. Since then, BioMart has grown to become a multi-institute collaboration involving various database projects on five continents.
Experimental factor ontology, also known as EFO, is an open-access ontology of experimental variables, particularly those used in molecular biology. The ontology covers variables which include aspects of disease, anatomy, cell type, cell lines, chemical compounds and assay information. EFO is developed and maintained at the EMBL-EBI as a cross-cutting resource for the purposes of curation, querying and data integration in resources such as Ensembl, ChEMBL and Expression Atlas.
In bioinformatics, a Gene Disease Database is a systematized collection of data, typically structured to model aspects of reality, in a way to comprehend the underlying mechanisms of complex diseases, by understanding multiple composite interactions between phenotype-genotype relationships and gene-disease mechanisms. Gene Disease Databases integrate human gene-disease associations from various expert curated databases and text mining derived associations including Mendelian, complex and environmental diseases.
Cathy H. Wu is the Edward G. Jefferson Chair, professor, and director of the Center for Bioinformatics & Computational Biology (CBCB) at the University of Delaware. She is also the director of the Protein Information Resource (PIR) and the North east Bioinformatics Collaborative Steering Committee, and an adjunct professor at the Georgetown University Medical Center.
Model organism databases (MODs) are biological databases, or knowledgebases, dedicated to the provision of in-depth biological data for intensively studied model organisms. MODs allow researchers to easily find background information on large sets of genes, plan experiments efficiently, combine their data with existing knowledge, and construct novel hypotheses. They allow users to analyse results and interpret datasets, and the data they generate are increasingly used to describe less well studied species. Where possible, MODs share common approaches to collect and represent biological information. For example, all MODs use the Gene Ontology (GO) to describe functions, processes and cellular locations of specific gene products. Projects also exist to enable software sharing for curation, visualization and querying between different MODs. Organismal diversity and varying user requirements, however, mean that MODs are often required to customize the capture, display, and provision of data.
Coisogenic strains are inbred strains that differ from one another by a mutation at a single locus; all other loci are identical. There are numerous ways to create an inbred strain, and each of these strains is unique. A genetically engineered mouse strain can be considered coisogenic if the only difference between the engineered mouse and a wild-type mouse is at a specific locus. Coisogenic strains can be used to investigate the function of a particular genetic locus.
PathoPhenoDB is a biological database. The database connects pathogens to their phenotypes using multiple databases such as NCBI, Human Disease Ontology, Human Phenotype Ontology, Mammalian Phenotype Ontology, PubChem, SIDER and CARD. Pathogen-disease associations were gathered mainly through the CDC and the List of Infectious Diseases page on Wikipedia. Taxonomy was assigned semi-automatically: each pathogen was mapped against NCBI Taxonomy, and if there was no exact match, it was mapped to the parent class. PathoPhenoDB employs normalized pointwise mutual information (NPMI) to filter pairs based on their co-occurrence statistics.
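As an illustration of NPMI-based filtering, the sketch below computes the standard normalized pointwise mutual information from co-occurrence counts. It is not taken from PathoPhenoDB: the function name, the count-based interface, and the handling of edge cases are assumptions made for this example, and the source does not state which threshold the database applies.

```python
import math


def npmi(n_xy: int, n_x: int, n_y: int, n_total: int) -> float:
    """Normalized pointwise mutual information of two terms, from raw counts.

    n_xy: documents mentioning both terms; n_x, n_y: documents mentioning each
    term; n_total: all documents. The result lies in [-1, 1]: 1 means the terms
    always occur together, 0 means independence, -1 means they never co-occur.
    """
    if n_xy == 0:
        return -1.0  # conventional limit when the pair never co-occurs
    p_xy = n_xy / n_total
    if p_xy == 1.0:
        return 1.0  # both terms appear in every document; avoids dividing by log(1) = 0
    p_x = n_x / n_total
    p_y = n_y / n_total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


if __name__ == "__main__":
    # Hypothetical counts: a pathogen and a phenotype co-mentioned in 40 of 10,000 abstracts.
    print(npmi(n_xy=40, n_x=50, n_y=200, n_total=10_000))
```

A filter built on this would keep only pairs whose score exceeds a chosen cutoff; the cutoff itself would have to come from the database's own documentation.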
The laboratory mouse has been instrumental in investigating the genetics of human disease, including cancer, for over 110 years. Its physiology and genetic characteristics are very similar to those of humans, which makes it a powerful model for investigating the genetic basis of disease.
Judith Anne Blake is a computational biologist at the Jackson Laboratory and Professor of Mammalian Genetics.