GenePattern

Last updated
GenePattern
Developer(s) Broad Institute, University of California, San Diego
Stable release
3.9.11 rc4 b216 / May 2019
Operating system
Type genomic analysis
License BSD [1]
Website www.genepattern.org

GenePattern is a freely available computational biology open-source software package originally created and developed at the Broad Institute for the analysis of genomic data. Designed to enable researchers to develop, capture, and reproduce genomic analysis methodologies, GenePattern was first released in 2004. GenePattern is currently developed at the University of California, San Diego.

Contents

Functionality

GenePattern is a powerful scientific workflow system that provides access to hundreds of genomic analysis tools. Use these analysis tools as building blocks to design sophisticated analysis pipelines that capture the methods, parameters, and data used to produce analysis results. Pipelines can be used to create, edit and share reproducible in silico results.

Project Objectives

  1. Accessibility: Run over 200 regularly updated analysis and visualization tools (that support data preprocessing, gene expression analysis, proteomics, Single nucleotide polymorphism (SNP) analysis, flow cytometry, and next-generation sequencing) and create analytic workflows without any programming through a point and click user interface.
  2. Reproducibility: Automated history and provenance tracking with versioning so that any user can share, repeat and understand a complete computational analysis
  3. Extensibility: Computational users can import their methods and code for sharing using tools that support easy creation and integration
  4. Multiple interfaces: Web browser, application, and programmatic interfaces make analysis modules and pipelines available to a broad range of users; public hosted server

Features

Availability

GenePattern is available:

  1. As a free public web application, [3] hosted on Amazon Web Services. Users can create accounts, perform analyses, and create pipelines on the server.
  2. As open-source software that can be downloaded and installed locally. [4]
  3. Public web servers hosted by other organizations. [5]

Notes

  1. "GenePattern: A platform for reproducible bioinformatics". GitHub . 5 November 2021.
  2. Kuehn, Heidi; Liberzon, Arthur; Reich, Michael; Mesirov, Jill P. (June 2008). "Using GenePattern for Gene Expression Analysis". Current Protocols in Bioinformatics. 22 (1): 7.12.1–7.12.39. doi:10.1002/0471250953.bi0712s22. ISSN   1934-3396. PMC   3893799 . PMID   18551415.
  3. "GenePattern". cloud.genepattern.org. Retrieved 2012-05-07.
  4. "GenePattern: Download GenePattern". Broadinstitute.org. Archived from the original on 2012-05-09. Retrieved 2012-05-07.
  5. "Use GenePattern". genepattern.org. 2006-10-07. Archived from the original on 2012-06-07. Retrieved 2012-05-07.

Related Research Articles

<span class="mw-page-title-main">Bioinformatics</span> Computational analysis of large, complex sets of biological data

Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The process of analyzing and interpreting data can some times referred to as computational biology, however this distinction between the two terms is often disputed. To some, the term computational biology refers to building and using models of biological systems.

In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution. It can be performed on the entire genome, transcriptome or proteome of an organism, and can also involve only selected segments or regions, like tandem repeats and transposable elements. Methodologies used include sequence alignment, searches against biological databases, and others.

Bioconductor is a free, open source and open development software project for the analysis and comprehension of genomic data generated by wet lab experiments in molecular biology.

<span class="mw-page-title-main">Orange (software)</span> Open-source data analysis software

Orange is an open-source data visualization, machine learning and data mining toolkit. It features a visual programming front-end for exploratory qualitative data analysis and interactive data visualization.

Computational genomics refers to the use of computational and statistical analysis to decipher biology from genome sequences and related data, including both DNA and RNA sequence as well as other "post-genomic" data. These, in combination with computational and statistical approaches to understanding the function of the genes and statistical association analysis, this field is also often referred to as Computational and Statistical Genetics/genomics. As such, computational genomics may be regarded as a subset of bioinformatics and computational biology, but with a focus on using whole genomes to understand the principles of how the DNA of a species controls its biology at the molecular level and beyond. With the current abundance of massive biological datasets, computational studies have become one of the most important means to biological discovery.

The completion of the human genome sequencing in the early 2000s was a turning point in genomics research. Scientists have conducted series of research into the activities of genes and the genome as a whole. The human genome contains around 3 billion base pairs nucleotide, and the huge quantity of data created necessitates the development of an accessible tool to explore and interpret this information in order to investigate the genetic basis of disease, evolution, and biological processes. The field of genomics has continued to grow, with new sequencing technologies and computational tool making it easier to study the genome.

<span class="mw-page-title-main">Microarray analysis techniques</span>

Microarray analysis techniques are used in interpreting the data generated from experiments on DNA, RNA, and protein microarrays, which allow researchers to investigate the expression state of a large number of genes – in many cases, an organism's entire genome – in a single experiment. Such experiments can generate very large amounts of data, allowing researchers to assess the overall state of a cell or organism. Data in such large quantities is difficult – if not impossible – to analyze without the help of computer programs.

Mouse Genome Informatics (MGI) is a free, online database and bioinformatics resource hosted by The Jackson Laboratory, with funding by the National Human Genome Research Institute (NHGRI), the National Cancer Institute (NCI), and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). MGI provides access to data on the genetics, genomics and biology of the laboratory mouse to facilitate the study of human health and disease. The database integrates multiple projects, with the two largest contributions coming from the Mouse Genome Database and Mouse Gene Expression Database (GXD). As of 2018, MGI contains data curated from over 230,000 publications.

<span class="mw-page-title-main">Galaxy (computational biology)</span>

Galaxy is a scientific workflow, data integration, and data and analysis persistence and publishing platform that aims to make computational biology accessible to research scientists that do not have computer programming or systems administration experience. Although it was initially developed for genomics research, it is largely domain agnostic and is now used as a general bioinformatics workflow management system.

The Viral Bioinformatics Resource Center (VBRC) is an online resource providing access to a database of curated viral genomes and a variety of tools for bioinformatic genome analysis. This resource was one of eight BRCs funded by NIAID with the goal of promoting research against emerging and re-emerging pathogens, particularly those seen as potential bioterrorism threats. The VBRC is now supported by Dr. Chris Upton at the University of Victoria.

<span class="mw-page-title-main">LONI Pipeline</span> Scientific workflow software

The LONI Pipeline is a free distributed system for designing, executing, monitoring and sharing scientific workflows on grid computing architectures. Pipeline allows users to connect and run any number of different software tools, and conveniently visualize and download the results.

A scientific workflow system is a specialized form of a workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or workflow, in a scientific application.

LabKey Server is a software suite available for scientists to integrate, analyze, and share biomedical research data. The platform provides a secure data repository that allows web-based querying, reporting, and collaborating across a range of data sources. Specific scientific applications and workflows can be added on top of the basic platform and leverage a data processing pipeline.

GenomeSpace is an environment for genomics software tools and applications. It helps users manage their analysis workflows involving multiple diverse tools, including web applications and desktop tools and facilitates the transfer of data between tools via automatic format conversion. Analyses can use data from local or cloud-based stores.

A bioinformatics workflow management system is a specialized form of workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or a workflow, that relate to bioinformatics.

Flow cytometry bioinformatics is the application of bioinformatics to flow cytometry data, which involves storing, retrieving, organizing and analyzing flow cytometry data using extensive computational resources and tools. Flow cytometry bioinformatics requires extensive use of and contributes to the development of techniques from computational statistics and machine learning. Flow cytometry and related methods allow the quantification of multiple independent biomarkers on large numbers of single cells. The rapid growth in the multidimensionality and throughput of flow cytometry data, particularly in the 2000s, has led to the creation of a variety of computational analysis methods, data standards, and public databases for the sharing of results.

<span class="mw-page-title-main">Gene set enrichment analysis</span> Bioinformatics method

Gene set enrichment analysis (GSEA) (also called functional enrichment analysis or pathway enrichment analysis) is a method to identify classes of genes or proteins that are over-represented in a large set of genes or proteins, and may have an association with different phenotypes (e.g. different organism growth patterns or diseases). The method uses statistical approaches to identify significantly enriched or depleted groups of genes. Transcriptomics technologies and proteomics results often identify thousands of genes, which are used for the analysis.

The BioCompute Object (BCO) project is a community-driven initiative to build a framework for standardizing and sharing computations and analyses generated from High-throughput sequencing. The project has since been standardized as IEEE 2791-2020, and the project files are maintained in an open source repository. The July 22nd, 2020 edition of the Federal Register announced that the FDA now supports the use of BioCompute in regulatory submissions, and the inclusion of the standard in the Data Standards Catalog for the submission of HTS data in NDAs, ANDAs, BLAs, and INDs to CBER, CDER, and CFSAN.

Nextflow is a scientific workflow system predominantly used for bioinformatic data analysis. It establishes standards for programmatically creating a series of dependent computational steps and facilitates their execution on various local and cloud resources.

References

Related software: