Written in | Java |
---|---|
Operating system | Unix-like |
Available in | English |
Type | Federated database system |
License | LGPL |
Website | useast |
BioMart is a community-driven project to provide a single point of access to distributed research data. The BioMart project contributes open source software and data services to the international scientific community. Although the BioMart software is primarily used by the biomedical research community, it is designed in such a way that any type of data can be incorporated into the BioMart framework. The BioMart project originated at the European Bioinformatics Institute as a data management solution [1] for the Human Genome Project. [2] Since then, BioMart has grown to become a multi-institute collaboration involving various database projects on five continents. [3] [4] [5] [6]
BioMart is a tool for researchers and bioinformaticians that allows a user to export data from Ensembl. Exportable data include gene IDs, gene positions, associated variations, protein domains and sequences. BioMart can export the data in convenient file formats such as FASTA, XLS, CSV, TSV and HTML. Researchers can use the exported data in a variety of applications, including genomic studies, gene expression analysis, and comparative genomics. BioMart's interface lets users customize queries to retrieve specific data sets or features of interest. [7]
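As an illustration of working with an exported file, the following minimal sketch parses a tab-separated (TSV) export and derives gene lengths from the position columns. The column headers, gene identifiers and coordinates are invented for the example and are not taken from a real export; an actual file may use different headers.

```python
import csv
import io

# Hypothetical TSV export: the headers and rows below are illustrative only.
SAMPLE_EXPORT = (
    "Gene stable ID\tChromosome\tGene start (bp)\tGene end (bp)\n"
    "GENE0001\t1\t1000\t5000\n"
    "GENE0002\tX\t20000\t26500\n"
)


def read_export(text):
    """Parse a tab-separated export into a list of dicts keyed by column header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


def gene_lengths(rows):
    """Return {gene ID: length in bp}, assuming 1-based inclusive coordinates."""
    return {
        row["Gene stable ID"]: int(row["Gene end (bp)"]) - int(row["Gene start (bp)"]) + 1
        for row in rows
    }


rows = read_export(SAMPLE_EXPORT)
lengths = gene_lengths(rows)
```

The same approach applies to a CSV export by changing the `delimiter` argument to a comma.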
BioMart is a freely available, open-source, federated database system that provides unified access to disparate, geographically distributed data sources. [8] BioMart allows databases hosted on different servers to be presented seamlessly to users, facilitating collaborative projects. BioMart contains several levels of query optimization to efficiently manage large data sets, and offers a diverse selection of graphical user interfaces and application programming interfaces to allow queries to be performed in whatever manner is most convenient for the user. BioMart's capabilities are extended by integration with several widely used software packages such as Bioconductor, [9] Galaxy, [10] Cytoscape, [11] and Taverna. [12]
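Programmatic access is typically done by sending an XML query that names a dataset, optional filters and the attributes to return. The sketch below only builds such a query and a request URL; it does not contact any server. The element layout follows the XML query format commonly used with BioMart web services, but the dataset, filter and attribute names, the `datasetConfigVersion` value and the host `example.org` are assumptions chosen for illustration and should be checked against the server being queried.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def build_query(dataset, attributes, filters=None, formatter="TSV"):
    """Build a BioMart-style XML query string (sketch; names are not validated)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,          # requested output format, e.g. TSV
        "header": "1",                   # ask for a header row
        "uniqueRows": "1",               # ask for de-duplicated rows
        "datasetConfigVersion": "0.6",   # assumed value, server dependent
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


def query_url(base_url, xml_query):
    """Attach the URL-encoded XML query to a service URL as the `query` parameter."""
    return base_url + "?" + urllib.parse.urlencode({"query": xml_query})


# Hypothetical dataset, filter and attribute names; placeholder host.
xml_query = build_query(
    "hsapiens_gene_ensembl",
    ["ensembl_gene_id", "external_gene_name"],
    {"chromosome_name": "Y"},
)
url = query_url("https://example.org/biomart/martservice", xml_query)
```

Packages such as those listed above wrap this kind of request behind their own interfaces, so most users never write the XML by hand.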
There are around 40 BioMart data sources including the Atlas of UTR Regulatory Activity (AURA), the COSMIC cancer database, Ensembl Genomes, HapMap, InterPro, Mouse Genome Informatics (MGI), Rfam and UniProt. Access is provided by institutions including the European Bioinformatics Institute (EBI) and the Wellcome Trust Sanger Institute in the UK, Cold Spring Harbor Laboratory and the National Center for Biotechnology Information (NCBI) in the United States, and the French National Centre for Scientific Research (CNRS). [13] The BioMart Central Portal was established to provide a convenient single point of access to this growing pool of data sources. [3] [5] [6]
Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.
BioPerl is a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications. It has played an integral role in the Human Genome Project.
The Ensembl genome database project is a scientific project at the European Bioinformatics Institute, which provides a centralized resource for geneticists, molecular biologists and other researchers studying the genomes of our own species and other vertebrates and model organisms. Ensembl is one of several well-known genome browsers for the retrieval of genomic information.
The European Bioinformatics Institute (EMBL-EBI) is an intergovernmental organization (IGO) which, as part of the European Molecular Biology Laboratory (EMBL) family, focuses on research and services in bioinformatics. It is located on the Wellcome Genome Campus in Hinxton near Cambridge, and employs over 600 full-time equivalent (FTE) staff. Institute leaders such as Rolf Apweiler, Alex Bateman, Ewan Birney, and Guy Cochrane, an adviser on the National Genomics Data Center Scientific Advisory Board, serve as part of the international research network of the BIG Data Center at the Beijing Institute of Genomics.
The Rat Genome Database (RGD) is a database of rat genomics, genetics, physiology and functional data, as well as data for comparative genomics between rat, human and mouse. RGD is responsible for attaching biological information to the rat genome via structured vocabulary, or ontology, annotations assigned to genes and quantitative trait loci (QTL), and for consolidating rat strain data and making it available to the research community. RGD is also developing a suite of tools for mining and analyzing genomic, physiologic and functional data for the rat, and comparative data for rat, mouse, human, and five other species.
Apache Taverna was an open source software tool for designing and executing workflows, initially created by the myGrid project under the name Taverna Workbench, then a project under the Apache incubator. Taverna allowed users to integrate many different software components, including WSDL SOAP or REST Web services, such as those provided by the National Center for Biotechnology Information, the European Bioinformatics Institute, the DNA Databank of Japan (DDBJ), SoapLab, BioMOBY and EMBOSS. The set of available services was not finite and users could import new service descriptions into the Taverna Workbench.
FlyBase is an online bioinformatics database and the primary repository of genetic and molecular data for the insect family Drosophilidae. For the most extensively studied species and model organism, Drosophila melanogaster, a wide range of data are presented in different formats.
Galaxy is a scientific workflow, data integration, and data and analysis persistence and publishing platform that aims to make computational biology accessible to research scientists who do not have computer programming or systems administration experience. Although it was initially developed for genomics research, it is largely domain agnostic and is now used as a general bioinformatics workflow management system.
BIOBASE is an international bioinformatics company headquartered in Wolfenbüttel, Germany. The company focuses on the generation, maintenance, and licensing of databases in the field of molecular biology, and their related software platforms.
The UCSC Genome Browser is an online and downloadable genome browser hosted by the University of California, Santa Cruz (UCSC). It is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations. The Browser is a graphical viewer optimized to support fast interactive performance and is an open-source, web-based tool suite built on top of a MySQL database for rapid visualization, examination, and querying of the data at many levels. The Genome Browser Database, browsing tools, downloadable data files, and documentation can all be found on the UCSC Genome Bioinformatics website.
The National Center for Integrative Biomedical Informatics (NCIBI) is one of seven National Centers for Biomedical Computing funded by the National Institutes of Health's (NIH) Roadmap for Medical Research. The center is based at the University of Michigan and is part of the Center for Computational Medicine and Bioinformatics. NCIBI's mission is to create targeted knowledge environments for molecular biomedical research to help guide experiments and enable new insights from the analysis of complex diseases. It was established in October 2005.
Anduril is an open source component-based workflow framework for scientific data analysis developed at the Systems Biology Laboratory, University of Helsinki.
BioSamples (BioSD) is a database at the European Bioinformatics Institute that holds information about the biological samples used in sequencing.
Ensembl Genomes is a scientific project to provide genome-scale data from non-vertebrate species.
A bioinformatics workflow management system is a specialized form of workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or a workflow, that relate to bioinformatics.
In bioinformatics, the PANTHER classification system is a large curated biological database of gene/protein families and their functionally related subfamilies that can be used to classify and identify the function of gene products. PANTHER is part of the Gene Ontology Reference Genome Project designed to classify proteins and their genes for high-throughput analysis.
Gene set enrichment analysis (GSEA) (also called functional enrichment analysis or pathway enrichment analysis) is a method to identify classes of genes or proteins that are over-represented in a large set of genes or proteins, and may have an association with different phenotypes (e.g. different organism growth patterns or diseases). The method uses statistical approaches to identify significantly enriched or depleted groups of genes. Transcriptomics technologies and proteomics results often identify thousands of genes, which are used for the analysis.
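One simple statistical approach to over-representation is a one-sided hypergeometric test, sketched below. This is not the running-sum statistic used by some gene set enrichment implementations; it is a plain over-representation test, shown only to make the idea concrete. The function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """Return P(X >= overlap) for X ~ Hypergeometric(population, annotated, sample).

    population: number of genes in the background set
    annotated:  number of background genes belonging to the class of interest
    sample:     number of genes in the study list
    overlap:    number of study genes that belong to the class
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 in the class,
# a study list of 5 genes, 3 of which are in the class.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=5, overlap=3)
```

A small p-value suggests the class is over-represented in the study list. When many classes are tested, the resulting p-values need correction for multiple testing.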
Model organism databases (MODs) are biological databases, or knowledgebases, dedicated to the provision of in-depth biological data for intensively studied model organisms. MODs allow researchers to easily find background information on large sets of genes, plan experiments efficiently, combine their data with existing knowledge, and construct novel hypotheses. They allow users to analyse results and interpret datasets, and the data they generate are increasingly used to describe less well-studied species. Where possible, MODs share common approaches to collect and represent biological information. For example, all MODs use the Gene Ontology (GO) to describe functions, processes and cellular locations of specific gene products. Projects also exist to enable software sharing for curation, visualization and querying between different MODs. Organismal diversity and varying user requirements, however, mean that MODs are often required to customize capture, display, and provision of data.
Biocuration is the field of life sciences dedicated to organizing biomedical data, information and knowledge into structured formats, such as spreadsheets, tables and knowledge graphs. The biocuration of biomedical knowledge is made possible by the cooperative work of biocurators, software developers and bioinformaticians and is at the base of the work of biological databases.