InterMine

InterMine
Content
Description: Open source data warehouse system for the integration and analysis of biological data
Organisms: varied
Access
Website: http://www.intermine.org/
Download URL: https://github.com/intermine/intermine
Web service URL: http://iodocs.apps.intermine.org/
Miscellaneous
License: LGPL 2.1
Data release frequency: Quarterly
Version: 1.6.6
Bookmarkable entities: Yes

InterMine is an open source data warehouse system, licensed under the LGPL 2.1. It is used to create databases of biological data that are accessed through sophisticated web query tools. InterMine can build a database from a single data set or integrate multiple data sources. Support is provided for several common biological formats, and there is a framework for adding other data. InterMine includes a user-friendly web interface that works 'out of the box' and can be easily customised. [1] [2]


InterMine is designed to integrate multiple data sources into a single data warehouse. Its core data model is based on the Sequence Ontology, and it supports several biological data formats; system administrators configure which organisms or data files are loaded. The data model can readily be extended to integrate other data, aided by a web service API, clients in seven different languages, and an XML format for importing custom data.
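The integration idea can be illustrated with a toy example. The sketch below is not InterMine's implementation: it assumes each source delivers records as dictionaries sharing a primary identifier, and it resolves conflicting values with a simple source priority list, a hypothetical policy chosen only for illustration. The source names and gene records are made up.

```python
# Toy warehouse-style integration; NOT InterMine's implementation.
# Assumptions: records are dicts sharing a primary identifier, and a
# simple source priority list (hypothetical policy) resolves conflicts.
from typing import Dict, List

Record = Dict[str, str]


def integrate(sources: Dict[str, List[Record]], priority: List[str],
              key: str = "primaryIdentifier") -> Dict[str, Record]:
    """Merge records from several sources into one table keyed by `key`.

    Sources not named in `priority` are ignored.
    """
    merged: Dict[str, Record] = {}
    # Visit sources from lowest to highest priority so that values from
    # higher-priority sources overwrite those from lower-priority ones.
    for name in reversed(priority):
        for record in sources.get(name, []):
            target = merged.setdefault(record[key], {})
            for field_name, value in record.items():
                if value:  # empty values never erase data from another source
                    target[field_name] = value
    return merged


# Hypothetical records from two made-up sources.
sources = {
    "sourceA": [{"primaryIdentifier": "G1", "symbol": "abc", "length": ""}],
    "sourceB": [
        {"primaryIdentifier": "G1", "symbol": "ABC", "length": "1200"},
        {"primaryIdentifier": "G2", "symbol": "xyz", "length": "800"},
    ],
}
warehouse = integrate(sources, priority=["sourceA", "sourceB"])
print(warehouse["G1"])  # {'primaryIdentifier': 'G1', 'symbol': 'abc', 'length': '1200'}
```

Iterating from lowest to highest priority keeps the merge to a single pass. A real warehouse would also have to reconcile identifiers that differ between sources, which this sketch ignores.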

As an active open source project, InterMine maintains a developer mailing list and thorough developer and user documentation.

Supported data formats

Clients

Web service clients let users access the data programmatically with minimal effort and are available for Perl, Python, Ruby, JavaScript, Java, and R. Data can also be queried via a native Android app.
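Under the hood, a client mostly builds a query and sends it to the mine's web service. The sketch below uses only the Python standard library to build a PathQuery-style XML document and a results URL. The XML layout (a query element with model and view attributes and constraint children) and the /service/query/results endpoint with query and format parameters reflect our reading of the InterMine web service documentation and should be checked against it; MINE_URL, the "genomic" model name and the Gene paths are placeholders.

```python
# Sketch of the request a client sends to a mine (assumptions labelled).
# Assumed: PathQuery XML layout and the /service/query/results endpoint
# with "query" and "format" parameters; MINE_URL and paths are placeholders.
import xml.etree.ElementTree as ET
from typing import List, Tuple
from urllib.parse import urlencode

MINE_URL = "https://example.org/examplemine"  # hypothetical mine


def build_query_xml(model: str, view: List[str],
                    constraints: List[Tuple[str, str, str]]) -> str:
    """Return PathQuery-style XML for output columns and (path, op, value) constraints."""
    query = ET.Element("query", model=model, view=" ".join(view))
    for path, op, value in constraints:
        ET.SubElement(query, "constraint", path=path, op=op, value=value)
    return ET.tostring(query, encoding="unicode")


def results_url(base: str, query_xml: str, fmt: str = "tab") -> str:
    """Return a GET URL asking the mine to run the query and return result rows."""
    params = urlencode({"query": query_xml, "format": fmt})
    return f"{base}/service/query/results?{params}"


query_xml = build_query_xml(
    "genomic",  # assumed model name
    ["Gene.primaryIdentifier", "Gene.symbol"],
    [("Gene.symbol", "=", "eve")],
)
print(query_xml)
print(results_url(MINE_URL, query_xml))
```

The client libraries listed above hide details like these behind their own higher-level APIs; this sketch only makes the shape of a request visible.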

Web application

The InterMine web application lets users build custom bioinformatics queries and run template queries (web forms for running 'canned' queries). Users can upload lists of data and operate on them, and widgets can be configured or created to analyse lists with graphs and enrichment statistics.
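Enrichment widgets ask whether a term, such as a pathway or ontology annotation, is over-represented in a user's list compared with the whole population. A common statistic for this is the one-sided hypergeometric test, sketched below with the standard library. This illustrates the technique rather than reproducing the code a mine runs; correction for multiple testing, which matters when many terms are tested at once, is left out, and the counts in the example are invented.

```python
# One-sided hypergeometric enrichment test, standard library only.
# Illustrates the technique; not the code a mine runs. No multiple-testing
# correction is applied.
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: items in the user's list annotated with the term
    n: size of the user's list
    K: items in the whole population annotated with the term
    N: population size (the background, e.g. all genes of one organism)
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of seeing k, k+1, ... annotated items in the list.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return favourable / comb(N, n)


# Invented counts: 3 of a 4-gene list carry a term that 5 of 20 genes carry.
p = enrichment_p_value(k=3, n=4, K=5, N=20)
print(p)  # 155/4845, about 0.032
```

A small p-value suggests the term appears in the list more often than random sampling from the background would explain.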

An admin user can publish new template queries, change report pages and create public lists at any time without any programming. Many aspects of the web app can be configured and branded.
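A template query can be thought of as a saved query whose constraint values are partly open: the admin fixes the output columns and some constraints, and the web form exposes the rest for users to fill in. The sketch below models that idea with hypothetical Template and Constraint classes; the class names, the editable flag and the example template are illustrative assumptions, not InterMine's API.

```python
# Hypothetical model of a template query; class names and the
# "editable" flag are illustrative assumptions, not InterMine's API.
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Constraint:
    path: str
    op: str
    value: str
    editable: bool = True  # only editable constraints appear on the web form


@dataclass(frozen=True)
class Template:
    name: str
    view: Tuple[str, ...]  # output columns fixed by the admin
    constraints: Tuple[Constraint, ...]

    def form_fields(self) -> List[str]:
        """Paths a user would see as input boxes on the template's form."""
        return [c.path for c in self.constraints if c.editable]

    def fill(self, values: Dict[str, str]) -> "Template":
        """Return a copy with user-supplied values for editable constraints."""
        unknown = set(values) - set(self.form_fields())
        if unknown:
            raise KeyError(f"not editable in this template: {sorted(unknown)}")
        filled = tuple(replace(c, value=values.get(c.path, c.value))
                       for c in self.constraints)
        return replace(self, constraints=filled)


# A made-up template an admin might publish.
gene_by_symbol = Template(
    name="Gene_by_symbol",
    view=("Gene.primaryIdentifier", "Gene.symbol", "Gene.length"),
    constraints=(
        Constraint("Gene.symbol", "=", "eve"),
        Constraint("Gene.organism.name", "=", "Drosophila melanogaster", editable=False),
    ),
)
query = gene_by_symbol.fill({"Gene.symbol": "zen"})
print(gene_by_symbol.form_fields())  # ['Gene.symbol']
print([c.value for c in query.constraints])  # ['zen', 'Drosophila melanogaster']
```

Returning a new, filled-in Template rather than mutating the published one keeps the admin's template unchanged no matter how many users run it.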

Current projects (not exhaustive list)

An up-to-date list of projects can be viewed at the InterMine Registry.


References

  1. InterMine website. http://www.intermine.org
  2. Smith, Richard N.; Aleksic, Jelena; et al. (1 December 2012). "InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data". Bioinformatics. 28 (23): 3163–3165. doi:10.1093/bioinformatics/bts577. ISSN 1367-4803. PMC 3516146. PMID 23023984.