Gene set enrichment analysis

Schematic overview of the modular structure underlying procedures for gene set enrichment analysis

Gene set enrichment analysis (GSEA) (also called functional enrichment analysis or pathway enrichment analysis) is a method to identify classes of genes or proteins that are over-represented in a large set of genes or proteins, and may have an association with different phenotypes (e.g. different organism growth patterns or diseases). The method uses statistical approaches to identify significantly enriched or depleted groups of genes. Transcriptomics technologies and proteomics results often identify thousands of genes, which are used for the analysis. [1]

Researchers performing high-throughput experiments that yield sets of genes (for example, genes that are differentially expressed under different conditions) often want to retrieve a functional profile of that gene set, in order to better understand the underlying biological processes. This can be done by comparing the input gene set to each of the bins (terms) in the gene ontology – a statistical test can be performed for each bin to see if it is enriched for the input genes.
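The per-term test described above is often carried out as a one-sided hypergeometric (over-representation) test. The following is a minimal sketch under that assumption; the function name and its parameters are hypothetical and are not taken from any particular tool.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, term_size, list_size, overlap):
    """Probability of seeing at least `overlap` annotated genes by chance.

    universe_size: number of genes in the background
    term_size:     background genes annotated to the term (bin)
    list_size:     number of genes in the input list
    overlap:       input genes that are annotated to the term
    """
    max_overlap = min(term_size, list_size)
    total = comb(universe_size, list_size)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy example: 10 background genes, 4 annotated to the term,
# 3 input genes of which 2 carry the annotation.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
```

In practice one such p-value is computed per term, and the resulting set of p-values is then corrected for multiple testing.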

Background

After the completion of the Human Genome Project, the problem of how to interpret and analyze it remained. In order to seek out genes associated with diseases, DNA microarrays were used to measure the amount of gene expression in different cells. Microarray experiments covering thousands of different genes were carried out, and the results were compared between two cell categories, e.g. normal cells versus cancerous cells. However, this method of comparison is not sensitive enough to detect the subtle differences between the expression of individual genes, because diseases typically involve entire groups of genes. [2] Multiple genes are linked to a single biological pathway, and so it is the additive change in expression within gene sets that leads to the difference in phenotypic expression. Gene Set Enrichment Analysis was developed [2] to focus on the changes of expression in groups of a priori defined gene sets. By doing so, this method resolves the problem of the undetectable, small changes in the expression of single genes. [3]

Methods

Gene set enrichment analysis uses a priori gene sets that have been grouped together by their involvement in the same biological pathway, or by proximal location on a chromosome. [1] A database of these predefined sets can be found at the Molecular signatures database (MSigDB). [4] [5] In GSEA, DNA microarrays, or now RNA-Seq, are still performed and compared between two cell categories, but instead of focusing on individual genes in a long list, the focus is put on a gene set. [1] Researchers analyze whether the majority of genes in the set fall in the extremes of this list: the top and bottom of the list correspond to the largest differences in expression between the two cell types. If the gene set falls at either the top (over-expressed) or bottom (under-expressed), it is thought to be related to the phenotypic differences.

In the method that is typically referred to as standard GSEA, there are three steps involved in the analytical process. [1] [2] The general steps are summarized below:

  1. Calculate the enrichment score (ES) that represents the amount to which the genes in the set are over-represented at either the top or bottom of the list. This score is a Kolmogorov–Smirnov-like statistic. [1] [2]
  2. Estimate the statistical significance of the ES. This calculation is done by a phenotypic-based permutation test in order to produce a null distribution for the ES. The P value is determined by comparison to the null distribution. [1] [2]
    • Calculating significance this way tests for the dependence of the gene set on the diagnostic/phenotypic labels [1] [2]
  3. Adjust for multiple hypothesis testing for when a large number of gene sets are being analyzed at one time. The enrichment scores for each set are normalized and a false discovery rate is calculated. [1] [2]

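This can be described as:

$$P_{\text{hit}}(S,i)=\sum_{\substack{g_j\in S\\ j\le i}}\frac{|r_j|^{p}}{N_R},\qquad N_R=\sum_{g_j\in S}|r_j|^{p}$$

$$P_{\text{miss}}(S,i)=\sum_{\substack{g_j\notin S\\ j\le i}}\frac{1}{N-N_H}$$

The enrichment score $ES(S)$ is the maximum deviation from zero of $P_{\text{hit}}(S,i)-P_{\text{miss}}(S,i)$ over all positions $i$ in the ranked list.

Where $r_j$ is the rank statistic of gene $g_j$ (its correlation with the phenotype in the original formulation [1]), $N$ is the number of genes in the ranked list, $N_H$ is the number of genes in the set $S$, and $p$ is the power, usually set to 1 (if it were 0, it would be equivalent to the Kolmogorov–Smirnov test).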
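The first two steps above can be sketched in plain Python. This is an illustrative sketch, not the reference implementation: the function names are hypothetical, the ranking metric is a simple difference of class means standing in for whatever metric an actual analysis would use, and returning 0 when every set member has zero weight is a choice made only for this sketch.

```python
import random


def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Signed maximum deviation from zero of the weighted running sum."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    # Total weight of the set members; with p=0 every hit weighs 1.
    total_weight = sum(abs(s) ** p for s, hit in zip(scores, hits) if hit)
    if total_weight == 0:
        return 0.0  # sketch-only choice: no weight means no signal
    running = best = 0.0
    for s, hit in zip(scores, hits):
        # Step up at set members, step down at all other genes.
        running += abs(s) ** p / total_weight if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def rank_genes(expression, labels):
    """Order genes by difference of class means (label 1 minus label 0)."""
    def mean(values):
        return sum(values) / len(values)

    scored = []
    for gene, values in expression.items():
        case = [v for v, lab in zip(values, labels) if lab == 1]
        control = [v for v, lab in zip(values, labels) if lab == 0]
        scored.append((mean(case) - mean(control), gene))
    scored.sort(reverse=True)
    return [g for _, g in scored], [s for s, _ in scored]


def permutation_p_value(expression, labels, gene_set, n_perm=1000, seed=0):
    """Phenotype-label permutation test for the enrichment score."""
    rng = random.Random(seed)
    genes, scores = rank_genes(expression, labels)
    observed = enrichment_score(genes, scores, gene_set)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between samples and phenotype
        perm_genes, perm_scores = rank_genes(expression, shuffled)
        null.append(enrichment_score(perm_genes, perm_scores, gene_set))
    # Compare only against null scores with the same sign as the observed ES.
    same_sign = [es for es in null if (es >= 0) == (observed >= 0)]
    extreme = sum(1 for es in same_sign if abs(es) >= abs(observed))
    return observed, (extreme + 1) / (len(same_sign) + 1)
```

Here `expression` is assumed to map each gene name to a list of per-sample values and `labels` to hold one 0/1 phenotype label per sample. Normalizing the scores and computing a false discovery rate across many gene sets (step 3) is not shown.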

Limitations and proposed alternatives

SEA

When GSEA was first proposed in 2003, some immediate concerns were raised regarding its methodology. These criticisms led to the use of the correlation-weighted Kolmogorov–Smirnov test, the normalized ES, and the false discovery rate calculation, all of which now define standard GSEA. [6] However, GSEA has also been criticized on the grounds that its null distribution is superfluous and too difficult to be worth calculating, and that its Kolmogorov–Smirnov-like statistic is not as sensitive as the original. [6] As an alternative, the method known as Simpler Enrichment Analysis (SEA) was proposed. This method assumes gene independence and uses a simpler approach based on a t-test. However, it is thought that these assumptions are in fact too simplifying, and that gene correlation cannot be disregarded. [6]

SGSE

Another limitation of Gene Set Enrichment Analysis is that the results are very dependent on the algorithm that clusters the genes and on the number of clusters being tested. [7] Spectral Gene Set Enrichment (SGSE) is a proposed unsupervised test. Its authors claim that it is a better way to find associations between MSigDB gene sets and microarray data. The general steps include:

1. Calculating the association between principal components and gene sets. [7]

2. Using the weighted Z-method to calculate the association between the gene sets and the spectral structure of the data. [7]
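The weighted Z-method named in step 2 is a general technique for combining p-values. A minimal sketch follows, assuming one one-sided p-value per principal component for a given gene set and one non-negative weight per component; how SGSE derives those weights is not shown here, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def weighted_z_combine(p_values, weights):
    """Combine one-sided p-values with the weighted Z-method."""
    if len(p_values) != len(weights) or not p_values:
        raise ValueError("need one weight per p-value")
    normal = NormalDist()  # standard normal distribution
    # Convert each p-value (must lie strictly between 0 and 1) to a Z-score.
    z_scores = [normal.inv_cdf(1.0 - p) for p in p_values]
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    # Map the combined Z-score back to a one-sided p-value.
    return 1.0 - normal.cdf(combined_z)


# Hypothetical inputs: three components, weighted by explained variance.
combined = weighted_z_combine([0.01, 0.20, 0.60], [0.5, 0.3, 0.2])
```

Components with larger weights contribute more to the combined statistic, so a strong association with a high-variance component dominates weak associations with minor ones.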

Tools

GSEA uses complicated statistics, so it requires a computer program to run the calculations. GSEA has become standard practice, and there are many websites and downloadable programs that will provide the data sets and run the analysis.

MOET

The Multi-Ontology Enrichment Tool (MOET) is a web-based ontology analysis tool that provides functionality for multiple ontologies, including Disease, GO, Pathway, Phenotype, and Chemical entities (ChEBI), for multiple species, including rat, mouse, human, bonobo, squirrel, dog, pig, chinchilla, naked mole-rat and vervet (green monkey). It outputs a downloadable graph and a list of statistically overrepresented terms in the user's list of genes using the hypergeometric distribution. MOET also displays the corresponding Bonferroni correction and odds ratio on the results page. It is simple to use, and results are provided with a few clicks in seconds; no software installation or programming skills are required. In addition, MOET is updated weekly, providing the user with the most recent data for analyses.
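The two summary quantities mentioned above, the Bonferroni correction and the odds ratio, can be illustrated generically. The sketch below uses textbook definitions and hypothetical function names; it is not a description of how MOET itself computes them.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def odds_ratio(overlap, list_size, term_size, universe_size):
    """Odds ratio of the 2x2 table: in list or not, annotated or not."""
    in_list_annotated = overlap
    in_list_other = list_size - overlap
    background_annotated = term_size - overlap
    background_other = universe_size - list_size - term_size + overlap
    denominator = in_list_other * background_annotated
    if denominator == 0:
        return float("inf")  # an empty cell makes the ratio unbounded
    return (in_list_annotated * background_other) / denominator


# Toy numbers: three terms tested, and one term with 2 of 3 input
# genes annotated, against 4 annotated genes in a background of 10.
adjusted = bonferroni([0.01, 0.04, 0.5])
ratio = odds_ratio(overlap=2, list_size=3, term_size=4, universe_size=10)
```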

NASQAR

NASQAR (Nucleic Acid SeQuence Analysis Resource) is an open source, web-based platform for high-throughput sequencing data analysis and visualization. [8] [9] Users can perform GSEA using the popular R-based clusterProfiler package [10] in a simple, user-friendly web app. NASQAR currently supports GO Term and KEGG Pathway enrichment with all organisms supported by an Org.Db database. [11]

PlantRegMap

Gene Ontology (GO) annotations for 165 plant species and GO enrichment analysis are available. [12]

MSigDB

The Molecular Signatures Database hosts an extensive collection of annotated gene sets that can be used with most GSEA Software.

Broad Institute

The Broad Institute website operates in cooperation with MSigDB and offers downloadable GSEA software, as well as a general tutorial for those new to performing this analytical technique. [13]

WebGestalt

WebGestalt [14] is a web-based gene set analysis toolkit. It supports three well-established and complementary methods for enrichment analysis: Over-Representation Analysis (ORA), Gene Set Enrichment Analysis (GSEA), and Network Topology-based Analysis (NTA). Analysis can be performed against 12 organisms and 321,251 functional categories using 354 gene identifiers from various databases and technology platforms.

Enrichr

Enrichr [15] [16] [17] is a gene set enrichment analysis tool for mammalian gene sets. It contains background libraries for transcription regulation, pathways and protein interactions, ontologies including GO and the human and mouse phenotype ontologies, signatures from cells treated with drugs, gene sets associated with human diseases, and expression of genes in different cells and tissues. Enrichr was developed by the Ma'ayan Laboratory at the Icahn School of Medicine at Mount Sinai. [18] The background libraries are from over 200 resources and contain over 450,000 annotated gene sets. The tool can be accessed through API and provides different ways to visualize the results.

GeneSCF

GeneSCF is a real-time functional enrichment tool with support for multiple organisms [19] and is designed to overcome the problems associated with using outdated resources and databases. [20] Its advantages include real-time analysis, so that users do not have to depend on enrichment tools being updated; straightforward integration into NGS pipelines by computational biologists; support for multiple organisms; enrichment analysis of multiple gene lists against multiple source databases in a single run; and retrieval or download of complete GO terms, pathways, and functions with their associated genes as a simple table in a plain text file. [21] [22]

DAVID

DAVID (the Database for Annotation, Visualization and Integrated Discovery) is a bioinformatics tool that pools together information from most major bioinformatics sources, with the aim of analyzing large gene lists in a high-throughput manner. [23] DAVID goes beyond standard GSEA with additional functions such as switching between gene and protein identifiers on a genome-wide scale. [23] However, the annotations used by DAVID were not updated between October 2016 and December 2021, [24] which can have a considerable impact on the practical interpretation of results. [25] The most recent update was performed in 2021. [24]

Metascape

Metascape is a biologist-oriented gene-list analysis portal. [26] Metascape integrates pathway enrichment analysis, protein complex analysis, and multi-list meta-analysis into one seamless workflow accessible through a significantly simplified user interface. Metascape maintains analysis accuracy by updating its 40 underlying knowledgebases monthly. Metascape presents results using easy-to-interpret graphics, spreadsheets, and publication quality presentations, and is freely available. [27]

AmiGO 2

The Gene Ontology (GO) consortium has also developed their own online GO term enrichment tool, [28] allowing species-specific enrichment analysis versus the complete database, coarser-grained GO slims, or custom references. [29]

GREAT

In 2010, Gill Bejerano of Stanford University released the Genomic Regions Enrichment of Annotations Tool (GREAT), software that takes advantage of regulatory domains to better associate gene ontology terms with genes. [30] Its primary purpose is to identify pathways and processes that are significantly associated with factor regulating activity. This method maps genes with regulatory regions through a hypergeometric test over genes, inferring proximal gene regulatory domains. It does this by using the total fraction of the genome associated with a given ontology term as the expected fraction of input regions associated with the term by chance. Enrichment is calculated over all regulatory regions, and several experiments were performed to validate GREAT, one of which was a set of enrichment analyses on 8 ChIP-seq datasets. [31]
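The idea of taking the annotated fraction of the genome as the chance expectation for input regions can be read as a binomial model. The sketch below illustrates that reading only; the function name and inputs are hypothetical, and it is not a description of GREAT's actual implementation.

```python
from math import comb


def binomial_enrichment_p(n_regions, n_hits, genome_fraction):
    """Probability that at least `n_hits` of `n_regions` land in the
    annotated part of the genome when each region does so independently
    with probability `genome_fraction`."""
    if not 0.0 <= genome_fraction <= 1.0:
        raise ValueError("genome_fraction must lie between 0 and 1")
    # Sum the upper tail of the binomial distribution.
    return sum(
        comb(n_regions, k)
        * genome_fraction ** k
        * (1.0 - genome_fraction) ** (n_regions - k)
        for k in range(n_hits, n_regions + 1)
    )


# Toy example: 20 input regions, 6 of which fall in regulatory domains
# of a term whose domains cover an assumed 10% of the genome.
p_value = binomial_enrichment_p(20, 6, 0.10)
```

Under this model a term is reported as enriched when far more input regions fall in its regulatory domains than its share of the genome would predict.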

FunRich

The Functional Enrichment Analysis (FunRich) tool [32] is mainly used for the functional enrichment and network analysis of Omics data. [33]

FuncAssociate

The FuncAssociate tool enables Gene Ontology and custom enrichment analyses. It accepts ordered gene sets as input, as well as weighted gene space files for the background. [34]

InterMine

Instances of InterMine automatically provide enrichment analysis [35] for uploaded sets of genes and other biological entities.

ToppGene suite

ToppGene is a one-stop portal for gene list enrichment analysis and candidate gene prioritization based on functional annotations and protein interaction networks. [36] It is developed and maintained by the Division of Biomedical Informatics at Cincinnati Children's Hospital Medical Center.

QuSAGE

Quantitative Set Analysis for Gene Expression (QuSAGE) is a computational method for gene set enrichment analysis. [37] QuSAGE improves power by accounting for inter-gene correlations and quantifies gene set activity with a complete probability density function (PDF). From this PDF, P values and confidence intervals can be easily extracted. Preserving the PDF also allows for post-hoc analysis (e.g., pair-wise comparisons of gene set activity) while maintaining statistical traceability. Turner et al. extended the applicability of QuSAGE to longitudinal studies by adding functionality for general linear mixed models. [38] QuSAGE was used by the NIH/NIAID Human Immunology Project Consortium to identify baseline transcriptional signatures that were associated with human influenza vaccination responses. [39] QuSAGE is available as an R/Bioconductor package, and is maintained by the Kleinstein Lab at Yale School of Medicine.

Blast2GO

Blast2GO is a bioinformatics platform for functional annotation and analysis of genomic datasets. [40] Among other functions, the tool can perform gene set enrichment analysis (GSEA). [41]

g:Profiler

g:Profiler is a widely used toolset for finding biological categories enriched in gene lists, converting between gene identifiers, and mapping genes to their orthologs. The mission of g:Profiler is to provide a reliable service based on up-to-date, high-quality data in a convenient manner across many evidence types, identifier spaces and organisms. g:Profiler relies on Ensembl as its primary data source and follows its quarterly release cycle, updating the other data sources simultaneously. g:Profiler provides a modern, responsive, interactive web interface, a standardised API, the R package gprofiler2, and libraries. Results are delivered through an interactive and configurable interface and can be downloaded as publication-ready visualisations or delimited text files. g:Profiler supports close to 500 species and strains, including vertebrates, plants, fungi, insects and parasites. By supporting user-uploaded custom GMT files, g:Profiler is capable of analysing data from any organism. All past releases are maintained for reproducibility and transparency. g:Profiler is freely available for all users at https://biit.cs.ut.ee/gprofiler.
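Custom gene sets of the kind mentioned above are commonly exchanged in the GMT format, in which each line holds a set name, a description, and the member genes, separated by tabs. A minimal reader might look like the following sketch; the function name and the example gene sets are hypothetical.

```python
def parse_gmt(text):
    """Parse GMT-formatted text into a dict of set name -> set of genes."""
    gene_sets = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # need a name, a description and at least one gene
        name, _description, *genes = fields
        # Drop empty fields left behind by trailing tabs.
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


# Two made-up gene sets in GMT layout.
example = (
    "SET_A\tfirst example set\tGENE1\tGENE2\tGENE3\n"
    "SET_B\tsecond example set\tGENE2\tGENE4\n"
)
sets = parse_gmt(example)
```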

Applications

Genome-wide association studies

Single-nucleotide polymorphisms, or SNPs, are single base mutations that may be associated with diseases. One base change has the potential to affect the protein that results from that gene being expressed; however, it also has the potential to have no effect at all. Genome-wide association studies (GWAS) are comparisons between healthy and disease genotypes to try to find SNPs that are overrepresented in the disease genomes, and might be associated with that condition. Before GSEA, the accuracy of genome-wide SNP association studies was severely limited by a high number of false positives. [42] The GSEA-SNP method is based on the theory that the SNPs contributing to a disease tend to be grouped in a set of genes that are all involved in the same biological pathway. This application of GSEA not only aids in the discovery of disease-associated SNPs, but also helps illuminate the corresponding pathways and mechanisms of the diseases. [42]

Spontaneous preterm birth

Gene set enrichment methods led to the discovery of new suspect genes and biological pathways related to spontaneous preterm birth (SPTB). [43] Exome sequences from women who had experienced SPTB were compared to those from females from the 1000 Genomes Project, using a tool that scored possible disease-causing variants. Genes with higher scores were then run through different programs to group them into gene sets based on pathways and ontology groups. This study found that the variants were significantly clustered in sets related to several pathways, all suspects in SPTB. [43]

Cancer cell profiling

Gene set enrichment analysis can be used to understand the changes that cells undergo during carcinogenesis and metastasis. In a study, microarrays were performed on renal cell carcinoma metastases, primary renal tumors, and normal kidney tissue, and the data was analyzed using GSEA. [44] This analysis showed significant changes of expression in genes involved in pathways that have not been previously associated with the progression of renal cancer. From this study, GSEA has provided potential new targets for renal cell carcinoma therapy.

Schizophrenia

GSEA can be used to help understand the molecular mechanisms of complex disorders. Schizophrenia is a largely heritable disorder, but is also very complex, and the onset of the disease involves many genes interacting within multiple pathways, as well as the interaction of those genes with environmental factors. For instance, epigenetic changes, like DNA methylation, are affected by the environment, but are also inherently dependent on the DNA itself. DNA methylation is the most well-studied epigenetic change, and was recently analyzed using GSEA in relation to schizophrenia-related intermediate phenotypes. [45] Researchers ranked genes for their correlation between methylation patterns and each of the phenotypes. They then used GSEA to look for an enrichment of genes that are predicted to be targeted by microRNAs in the progression of the disease. [45]

Depression

GSEA can help provide molecular evidence for the association of biological pathways with diseases. Previous studies have shown that long-term depression symptoms are correlated with changes in immune response and inflammatory pathways. [46] Genetic and molecular evidence was sought to support this. Researchers took blood samples from people with depression, and used genome-wide expression data along with GSEA to find expression differences in gene sets related to inflammatory pathways. This study found that the participants rated as having the most severe depression symptoms also had significant expression differences in those gene sets, a result that supports the association hypothesis. [46]

References

  1. 1 2 3 4 5 6 7 8 Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. (October 2005). "Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles". Proceedings of the National Academy of Sciences of the United States of America. 102 (43): 15545–15550. doi: 10.1073/pnas.0506580102 . PMC   1239896 . PMID   16199517.
  2. 1 2 3 4 5 6 7 Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, et al. (July 2003). "PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes". Nature Genetics. 34 (3): 267–273. doi:10.1038/ng1180. PMID   12808457. S2CID   13940856.
  3. Maleki F, Ovens K, Hogan DJ, Kusalik AJ (2020). "Gene Set Analysis: Challenges, Opportunities, and Future Research". Frontiers in Genetics. 11: 654. doi: 10.3389/fgene.2020.00654 . PMC   7339292 . PMID   32695141.
  4. Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P (December 2015). "The Molecular Signatures Database (MSigDB) hallmark gene set collection". Cell Systems. 1 (6): 417–425. doi:10.1016/j.cels.2015.12.004. PMC   4707969 . PMID   26771021.
  5. "Molecular signature database (MSigDB) 3.0 (PDF Download Available)". ResearchGate.
  6. 1 2 3 Tamayo P, Steinhardt G, Liberzon A, Mesirov JP (February 2016). "The limitations of simple gene set enrichment analysis assuming gene independence". Statistical Methods in Medical Research. 25 (1): 472–487. arXiv: 1110.4128 . doi:10.1177/0962280212460441. PMC   3758419 . PMID   23070592.
  7. 1 2 3 Frost HR, Li Z, Moore JH (March 2015). "Spectral gene set enrichment (SGSE)". BMC Bioinformatics. 16 (1): 70. doi: 10.1186/s12859-015-0490-7 . PMC   4365810 . PMID   25879888.
  8. Yousif A, Drou N, Rowe J, Khalfan M, Gunsalus KC (June 2020). "NASQAR: a web-based platform for high-throughput sequencing data analysis and visualization". BMC Bioinformatics. 21 (1): 267. bioRxiv   10.1101/709980 . doi: 10.1186/s12859-020-03577-4 . PMC   7322916 . PMID   32600310.
  9. "NASQAR: Nucleic Acid SeQuence Analysis Resource".
  10. Yu G, Wang LG, Han Y, He QY (May 2012). "clusterProfiler: an R package for comparing biological themes among gene clusters". Omics. 16 (5): 284–287. doi:10.1089/omi.2011.0118. PMC 3339379. PMID 22455463.
  11. "Bioconductor Org.Db Packages".
  12. "PlantRegMap: Plant Regulation Data and Analysis Platform @ CBI, PKU". plantregmap.cbi.pku.edu.cn. Archived from the original on 2017-02-08. Retrieved 2016-10-19.
  13. "GSEA | Desktop Tutorial". software.broadinstitute.org.
  14. "WebGestalt (WEB-based GEne SeT AnaLysis Toolkit)". www.webgestalt.org.
  15. Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV, Clark NR, Ma'ayan A (2013). "Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool". BMC Bioinformatics. 14: 128. doi:10.1186/1471-2105-14-128. PMC 3637064. PMID 23586463.
  16. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, Koplev S, Jenkins SL, Jagodnik KM, Lachmann A, McDermott MG, Monteiro CD, Gundersen GW, Ma'ayan A (2016). "Enrichr: a comprehensive gene set enrichment analysis web server 2016 update". Nucleic Acids Research. 44 (W1): W90–W97. doi:10.1093/nar/gkw377. PMC 4987924. PMID 27141961.
  17. Xie Z, Bailey A, Kuleshov MV, Clarke DJB, Evangelista JE, Jenkins SL, Lachmann A, Wojciechowicz ML, Kropiwnicki E, Jagodnik KM, Jeon M, Ma'ayan A (2021). "Gene set knowledge discovery with Enrichr". Current Protocols. 1 (3): e90. doi:10.1002/cpz1.90. PMC 8152575. PMID 33780170.
  18. "Ma'ayan Laboratory - Computational Systems Biology - Icahn School of Medicine at Mount Sinai". labs.icahn.mssm.edu. 19 September 2023.
  19. Subhash S, Kanduri C (September 2016). "GeneSCF: a real-time based functional enrichment tool with support for multiple organisms". BMC Bioinformatics. 17 (1): 365. doi:10.1186/s12859-016-1250-z. PMC 5020511. PMID 27618934.
  20. Wadi L, Meyer M, Weiser J, Stein LD, Reimand J (August 2016). "Impact of outdated gene annotations on pathway enrichment analysis". Nature Methods. 13 (9): 705–706. doi:10.1038/nmeth.3963. PMC 7802636. PMID 27575621. S2CID 19548133.
  21. "GeneSCF::Gene Set Clustering based on Functional annotation". genescf.kandurilab.org.
  22. "Gene Set Clustering based on Functional annotation (GeneSCF)". www.biostars.org.
  23. Huang DA, Sherman BT, Lempicki RA (2009). "Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources". Nature Protocols. 4 (1): 44–57. doi:10.1038/nprot.2008.211. PMID 19131956. S2CID 10418677.
  24. DAVID release and version information, DAVID Bioinformatics Resources 6.8
  25. Huang DA, Sherman BT, Lempicki RA (1 December 2008). "Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources". Nature Protocols. 4 (1): 44–57. doi:10.1038/nprot.2008.211. PMID 19131956. S2CID 10418677.
  26. Zhou Y, Zhou B, Pache L, Chang M, Khodabakhshi AH, Tanaseichuk O, et al. (April 2019). "Metascape provides a biologist-oriented resource for the analysis of systems-level datasets". Nature Communications. 10 (1): 1523. Bibcode:2019NatCo..10.1523Z. doi:10.1038/s41467-019-09234-6. PMC 6447622. PMID 30944313.
  27. "Metascape". metascape.org. Retrieved 20 December 2019.
  28. Gene Ontology Consortium. "AmiGO 2: Welcome". amigo.geneontology.org.
  29. Gene Ontology Consortium (January 2015). "Gene Ontology Consortium: going forward". Nucleic Acids Research. 43 (Database issue): D1049–D1056. doi:10.1093/nar/gku1179. PMC 4383973. PMID 25428369.
  30. "GREAT Input: Genomic Regions Enrichment of Annotations Tool, Bejerano Lab, Stanford University". bejerano.stanford.edu.
  31. "GREAT improves functional interpretation of cis-regulatory regions" (PDF).
  32. "FunRich :: Download". funrich.org.
  33. Pathan M, Keerthikumar S, Ang CS, Gangoda L, Quek CY, Williamson NA, et al. (August 2015). "FunRich: An open access standalone functional enrichment and interaction network analysis tool". Proteomics. 15 (15): 2597–2601. doi:10.1002/pmic.201400515. PMID 25921073. S2CID 28583044.
  34. Berriz GF, Beaver JE, Cenik C, Tasan M, Roth FP (November 2009). "Next generation software for functional trend analysis". Bioinformatics. 25 (22): 3043–3044. doi:10.1093/bioinformatics/btp498. PMC 2800365. PMID 19717575.
  35. "List enrichment widgets statistics — InterMine documentation".
  36. Chen J, Bardes EE, Aronow BJ, Jegga AG (July 2009). "ToppGene Suite for gene list enrichment analysis and candidate gene prioritization". Nucleic Acids Research. 37 (Web Server issue): W305–W311. doi:10.1093/nar/gkp427. PMC 2703978. PMID 19465376.
  37. Yaari G, Bolen CR, Thakar J, Kleinstein SH (October 2013). "Quantitative set analysis for gene expression: a method to quantify gene set differential expression including gene-gene correlations". Nucleic Acids Research. 41 (18): e170. doi:10.1093/nar/gkt660. PMC 3794608. PMID 23921631.
  38. Turner JA, Bolen CR, Blankenship DM (August 2015). "Quantitative gene set analysis generalized for repeated measures, confounder adjustment, and continuous covariates". BMC Bioinformatics. 16: 272. doi:10.1186/s12859-015-0707-9. PMC 4551517. PMID 26316107.
  39. Avey S, et al. (August 2017). "Multicohort analysis reveals baseline transcriptional predictors of influenza vaccination responses". Science Immunology. 2 (14): eaal4656. doi:10.1126/sciimmunol.aal4656. PMC 5800877. PMID 28842433.
  40. Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M (September 2005). "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research". Bioinformatics. 21 (18): 3674–3676. doi:10.1093/bioinformatics/bti610. PMID 16081474.
  41. "Figure 3: Heatmaps of gene set enrichment analysis (GSEA) of DEGs based on RNAseq data in response to abiotic stresses". www.nature.com. Retrieved 2018-09-05.
  42. Holden M, Deng S, Wojnowski L, Kulle B (December 2008). "GSEA-SNP: applying gene set enrichment analysis to SNP data from genome-wide association studies". Bioinformatics. 24 (23): 2784–2785. doi:10.1093/bioinformatics/btn516. PMID 18854360.
  43. Manuck TA, Watkins S, Esplin MS, Parry S, Zhang H, Huang H, Biggio JR, Bukowski R, Saade G, Andrews W, Baldwin D (2016). "242: Gene set enrichment investigation of maternal exome variation in spontaneous preterm birth (SPTB)". American Journal of Obstetrics and Gynecology. 214 (1): S142–S143. doi:10.1016/j.ajog.2015.10.280.
  44. Maruschke M, Hakenberg OW, Koczan D, Zimmermann W, Stief CG, Buchner A (January 2014). "Expression profiling of metastatic renal cell carcinoma using gene set enrichment analysis". International Journal of Urology. 21 (1): 46–51. doi:10.1111/iju.12183. PMID 23634695. S2CID 33377555.
  45. Hass J, Walton E, Wright C, Beyer A, Scholz M, Turner J, et al. (June 2015). "Associations between DNA methylation and schizophrenia-related intermediate phenotypes - a gene set enrichment analysis". Progress in Neuro-Psychopharmacology & Biological Psychiatry. 59: 31–39. doi:10.1016/j.pnpbp.2015.01.006. PMC 4346504. PMID 25598502.
  46. Elovainio M, Taipale T, Seppälä I, Mononen N, Raitoharju E, Jokela M, et al. (December 2015). "Activated immune-inflammatory pathways are associated with long-standing depressive symptoms: Evidence from gene-set enrichment analyses in the Young Finns Study". Journal of Psychiatric Research. 71: 120–125. doi:10.1016/j.jpsychires.2015.09.017. PMID 26473696.

Further reading