Pathway analysis

Pathway resources and types of pathway analysis using databases like KEGG, Reactome and WikiPathways.

In molecular biology, a pathway is a curated schematic representation of a well-characterized segment of the molecular physiological machinery. Examples are a metabolic pathway describing an enzymatic process within a cell or tissue, and a signaling pathway model representing a regulatory process that might, in turn, enable a metabolic or another regulatory process downstream. A typical pathway model starts with an extracellular signaling molecule that activates a specific receptor, thus triggering a chain of molecular interactions. [2] A pathway is most often represented as a relatively small graph with gene, protein, and/or small-molecule nodes connected by edges of known functional relations. While a simpler pathway might appear as a chain, [3] complex pathway topologies with loops and alternative routes are much more common. Computational analyses employ special formats of pathway representation. [4] [5] In the simplest form, however, a pathway might be represented as a list of member molecules with order and relations unspecified. Such a representation, generally called a functional gene set (FGS), can also refer to other functionally characterised groups such as protein families, Gene Ontology (GO) terms, and Disease Ontology (DO) terms.

In bioinformatics, methods of pathway analysis might be used to identify key genes or proteins within a previously known pathway in relation to a particular experiment or pathological condition, or to build a pathway de novo from proteins that have been identified as key affected elements. By examining changes in, for example, gene expression in a pathway, its biological activity can be explored. Most frequently, however, pathway analysis refers to a method of initial characterization and interpretation of an experimental (or pathological) condition that was studied with omics tools or a genome-wide association study. [6] Such studies might identify long lists of altered genes. Visual inspection is then challenging and the information is hard to summarize, since the altered genes map to a broad range of pathways, processes, and molecular functions (with a large fraction of genes lacking any annotation). In such situations, the most productive way of exploring the list is to identify enrichment of specific FGSs in it. The general approach of enrichment analyses is to identify FGSs whose members were most frequently or most strongly altered in the given condition, in comparison to a gene set sampled by chance. In other words, enrichment can map canonical prior knowledge, structured in the form of FGSs, to the condition represented by altered genes.
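The two representations described above, a pathway as a graph and a pathway flattened to an FGS, can be illustrated with a minimal Python sketch. The molecule names and the helper functions `to_adjacency` and `to_gene_set` are hypothetical and serve only as an illustration; real pathway files (for example SBML or KGML) carry far more detail.

```python
# Toy pathway: a directed graph of hypothetical molecules (names are made up).
pathway_edges = [
    ("LIGAND", "RECEPTOR"),
    ("RECEPTOR", "KINASE_A"),
    ("KINASE_A", "KINASE_B"),
    ("KINASE_A", "TF_1"),
    ("KINASE_B", "TF_1"),
]


def to_adjacency(edges):
    """Build a mapping from each node to the sorted list of its downstream nodes."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())
    return {node: sorted(targets) for node, targets in adjacency.items()}


def to_gene_set(edges):
    """Flatten the graph to an FGS: members only, order and relations dropped."""
    return frozenset(node for edge in edges for node in edge)


adjacency = to_adjacency(pathway_edges)
fgs = to_gene_set(pathway_edges)
print(adjacency["KINASE_A"])
print(sorted(fgs))
```

The graph form keeps the alternative routes from KINASE_A to TF_1, whereas the FGS form retains only membership, which is all that the simplest enrichment methods require.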

Use

The data for pathway analysis come from high-throughput biology, including high-throughput sequencing and microarray data. Before pathway analysis can be done, each gene's alteration should be evaluated from the omics dataset in either a quantitative manner (differential expression analysis) or a qualitative manner (detection of somatic point mutations, or mapping of neighboring genes to a disease-associated SNP). It is also possible to combine datasets from different research groups or multiple omics platforms by meta-analysis and cross-platform normalization. [7] [8] The resulting list, in which gene identifiers are accompanied by alteration attributes, is then subjected to pathway analysis. Using pathway analysis software, researchers can determine which FGSs are enriched with the experimentally altered genes. [9] [10] For example, pathway analysis of several independent microarray experiments (meta-analysis) helped to discover potential biomarkers in a single pathway important for the fast-to-slow fiber type transition in Duchenne muscular dystrophy. [11] In another study, meta-analysis identified two biomarkers in the blood of patients with Parkinson's disease, which can be useful for monitoring the disease. [12] Candidate gene alleles causative of Alzheimer's disease and elderly dementia were first discovered via genome-wide association studies and further validated with network enrichment analysis against an FGS consisting of known Alzheimer's genes. [13] [14]

Databases

Pathway collections and interaction networks constitute the knowledge base required for a pathway analysis. Pathway content, structure, format, and functionality vary between database resources such as KEGG, [15] WikiPathways, and Reactome. [16] Proprietary pathway collections also exist, used for example by the Pathway Studio [17] and Ingenuity Pathway Analysis [18] tools. Public online tools (e.g. EviNet) can provide pre-compiled, ready-to-use menus of pathways and networks from different open sources.

Methods and software

Pathway analysis software can be found in the form of desktop programs, web-based applications, or packages coded in languages such as R and Python and shared openly through the Bioconductor [19] and GitHub [20] projects. The methodology of pathway analysis evolves quickly and its classification is still debated, [21] [22] but the following main categories of pathway enrichment analysis are applicable to high-throughput data: [21]

Over-representation analysis (ORA)

This method measures the overlap between a set of genes (or proteins) in an FGS and a list of the most altered genes, generally called an altered gene set (AGS). A typical AGS is the list of the top N differentially expressed genes from an RNA-Seq assay. The basic assumption behind ORA is that a biologically relevant pathway can be identified by an excess of AGS genes in it compared to the number expected by chance. The aim of ORA is to identify such enriched pathways by the significance of the overlap between FGS and AGS, as determined either by a descriptive statistic such as the Jaccard index or by a statistical test producing p-values (Fisher's exact test or a test based on the hypergeometric distribution).
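As an illustration of the overlap test described above, the following minimal sketch computes the overlap size, the Jaccard index, and an upper-tail hypergeometric p-value using only the Python standard library. The gene identifiers, the size of the background "universe", and the function names are assumptions made for the example; in practice the choice of background gene list strongly influences the result, and p-values across many FGSs would normally be corrected for multiple testing.

```python
from math import comb


def hypergeom_upper_tail(overlap, fgs_size, ags_size, universe_size):
    """P(X >= overlap) when ags_size genes are drawn without replacement
    from a universe that contains fgs_size pathway members."""
    max_overlap = min(fgs_size, ags_size)
    total = comb(universe_size, ags_size)
    favourable = sum(
        comb(fgs_size, k) * comb(universe_size - fgs_size, ags_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return favourable / total


def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def over_representation(ags, fgs, universe):
    """Return (overlap, Jaccard index, p-value) for one AGS/FGS pair."""
    ags = set(ags) & universe  # ignore genes outside the background
    fgs = set(fgs) & universe
    overlap = len(ags & fgs)
    p_value = hypergeom_upper_tail(overlap, len(fgs), len(ags), len(universe))
    return overlap, jaccard_index(ags, fgs), p_value


# Hypothetical data: 20 measured genes, a 5-gene FGS, a 4-gene AGS.
universe = {f"g{i}" for i in range(20)}
fgs = {"g0", "g1", "g2", "g3", "g4"}
ags = {"g0", "g1", "g2", "g10"}
overlap, jaccard, p_value = over_representation(ags, fgs, universe)
print(overlap, jaccard, p_value)
```

The upper-tail sum corresponds to a one-sided test for enrichment; a two-sided Fisher's exact test would additionally count depletion.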

Functional class scoring (FCS)

This method identifies enriched FGSs by considering the relative positions of their member genes in the full list of genes studied in the experiment. The full list should therefore be ranked in advance by a statistic (such as mRNA expression fold change or Student's t statistic) or by a p-value; in the latter case the direction of the fold change must be tracked separately, since p-values are non-directional. FCS thus takes into account every FGS gene regardless of its statistical significance and does not require a pre-compiled AGS. One of the first and most popular methods deploying the FCS approach was Gene Set Enrichment Analysis (GSEA). [10]
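The idea of scoring an FGS by the positions of its members in a ranked list can be sketched with an unweighted running-sum statistic. This is a simplified, Kolmogorov-Smirnov-like illustration and not the published GSEA procedure, which weights steps by the ranking statistic and assesses significance by permutation; the fold-change values and the function name `running_sum_score` are assumptions for the example.

```python
def running_sum_score(ranked_genes, fgs):
    """Walk down the ranked list, stepping up at FGS members and down at
    other genes; return the running-sum value that deviates most from zero."""
    hits = [gene in fgs for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("the ranked list must contain both FGS and non-FGS genes")
    running = 0.0
    extreme = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


# Hypothetical signed statistic (e.g. log fold change) for every measured gene.
fold_change = {"g1": 2.1, "g2": 1.7, "g3": 0.4, "g4": -0.2, "g5": -1.1, "g6": -1.9}
ranked = sorted(fold_change, key=fold_change.get, reverse=True)

score_top = running_sum_score(ranked, {"g1", "g2"})     # members at the top
score_bottom = running_sum_score(ranked, {"g5", "g6"})  # members at the bottom
print(score_top, score_bottom)
```

A positive score indicates that the FGS members concentrate at the top of the ranking and a negative score that they concentrate at the bottom, which is why the list must be ranked by a signed statistic.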

Pathway topology analysis (PTA)

Like FCS, PTA uses the high-throughput data for every FGS gene. [23] In addition, it uses specific topological information about the role, position, and interaction directions of the pathway genes. This requires additional input data from a pathway database in a pre-specified format, such as KEGG Markup Language (KGML). Using this information, PTA estimates the significance of a pathway by considering how much each individual gene alteration might have affected the whole pathway. Multiple alteration types (somatic copy-number variations, point mutations, etc.) can be used in parallel when available. [21] PTA methods include Impact Analysis, [24] [25] EnrichNet, [26] GGEA, [27] and TopoGSA. [28]
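To show why the position of an altered gene matters, the sketch below uses a toy propagation scheme in which each gene's score is its own measured change plus an equal share of the scores of its upstream regulators. This scheme, the gene names, and the function `propagate_perturbation` are illustrative assumptions and do not reproduce Impact Analysis or any other published PTA method. The sketch also assumes an acyclic pathway, whereas real pathways often contain loops.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def propagate_perturbation(edges, change):
    """Toy scheme: score(g) = change(g) + sum over upstream genes u of
    score(u) / out_degree(u). Requires an acyclic pathway graph."""
    upstream = {}
    downstream = {}
    for source, target in edges:
        upstream.setdefault(target, set()).add(source)
        upstream.setdefault(source, set())
        downstream.setdefault(source, set()).add(target)
    score = {}
    # static_order() yields every gene after all of its upstream genes.
    for gene in TopologicalSorter(upstream).static_order():
        inherited = sum(score[u] / len(downstream[u]) for u in upstream[gene])
        score[gene] = change.get(gene, 0.0) + inherited
    return score


# Hypothetical pathway: receptor R activates A and B, which both act on T.
edges = [("R", "A"), ("R", "B"), ("A", "T"), ("B", "T")]

upstream_hit = propagate_perturbation(edges, {"R": 2.0})    # receptor altered
downstream_hit = propagate_perturbation(edges, {"T": 2.0})  # end target altered
print(sum(upstream_hit.values()), sum(downstream_hit.values()))
```

The same measured change yields a larger accumulated score when it occurs at the receptor than at the terminal target, which a gene-list method such as ORA cannot distinguish.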

Network enrichment analysis (NEA)

Network enrichment analysis (NEA) is an extension of gene-set enrichment analysis to the domain of global gene networks. [29] [30] [31] [32] The major principle of NEA can be understood in comparison with ORA, where enrichment of an FGS in genes of the AGS is determined by how many genes are directly shared by the AGS and the FGS. In NEA, by contrast, the global network is searched for edges that connect any gene of the AGS with any gene of the FGS. Since enrichment significance is influenced by the highly variable node degrees of individual AGS and FGS genes, it should be determined by a dedicated statistical test, which compares the observed number of network edges to the number expected by chance in the same network context. Some valuable properties of NEA are that:

  1. it is more robust to biological and technical variability between sample replicates; [8] [33]
  2. AGS genes may not necessarily be annotated as pathway members; [34]
  3. FGS members do not have to be altered themselves, but still are accounted for due to possessing network links to AGS genes. [35]
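The comparison of observed and expected network edges can be sketched as follows. The expected count uses a simple degree-based approximation (the product of the summed degrees of the two gene sets divided by twice the number of edges), which is an assumption chosen for illustration; published NEA tools use their own null models and significance tests. The network, gene names, and function name are hypothetical, and the sketch assumes an undirected network with each edge listed once and disjoint AGS and FGS.

```python
def network_link_counts(network_edges, ags, fgs):
    """Return (observed, expected) numbers of edges linking AGS and FGS genes."""
    degree = {}
    observed = 0
    for a, b in network_edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
        if (a in ags and b in fgs) or (a in fgs and b in ags):
            observed += 1
    ags_degree = sum(degree.get(gene, 0) for gene in ags)
    fgs_degree = sum(degree.get(gene, 0) for gene in fgs)
    # Degree-based approximation of the edge count expected by chance.
    expected = ags_degree * fgs_degree / (2 * len(network_edges))
    return observed, expected


# Hypothetical global network; the AGS and FGS share no genes at all.
network = [
    ("a1", "f1"), ("a1", "f2"), ("a2", "f1"),
    ("a2", "x"), ("x", "y"), ("y", "f2"),
]
ags = {"a1", "a2"}
fgs = {"f1", "f2"}

observed, expected = network_link_counts(network, ags, fgs)
print(observed, expected)
```

In this toy case ORA would report zero overlap, while the network view finds more AGS-FGS links than the approximation expects, illustrating properties 2 and 3 above.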

Commercial solutions

Beyond open-source tools such as STRING and Cytoscape, a number of companies sell licensed software products for analysing gene sets. While most of the publicly available solutions use online, public pathway collections, the commercial products mostly promote their own proprietary pathways and networks. The choice of such products might be driven by customers' skills, financial and time resources, and needs. [6] Ingenuity, for example, maintains a knowledge base for comparative analysis of gene expression data. [36] Pathway Studio [37] is commercial software that allows users to search for biologically relevant facts, analyze experiments, and create pathways. Pathway Studio Viewer [38] is a free resource from the same company for presenting the Pathway Studio interactive pathway collection and database. Two commercial solutions offer PTA: iPathwayGuide from Advaita Corporation and MetaCore from Thomson Reuters. [39] Advaita uses the peer-reviewed Impact Analysis method, [24] [25] while the MetaCore method is unpublished. [39] Correlation Engine uses the Running Fisher algorithm for gene set enrichment within its Pathway Enrichment application. [40]

Limitations

Lack of annotations

Application of pathway analysis methods depends on annotations found in existing databases, such as gene set membership in pathways, pathway topology, and the presence of genes in the global network. These annotations, however, are far from complete and have highly variable degrees of confidence. In addition, such information is usually general, that is, it lacks context such as cell type, compartment, or developmental stage. Therefore, interpretation of pathway analysis results for omics datasets should be done with caution. [22] The problem can be partially addressed by analysing gene sets against larger resources, such as big pathway collections or global interaction networks. [41]

References

  1. Mubeen S, Hoyt CT, Gemünd A, Hofmann-Apitius M, Fröhlich H, Domingo-Fernández D (2019). "The Impact of Pathway Database Choice on Statistical Enrichment Analysis and Predictive Modeling". Frontiers in Genetics. 10: 1203. doi:10.3389/fgene.2019.01203. PMC 6883970. PMID 31824580.
  2. Berg JM, Tymoczko JL, Stryer L (2002). Biochemistry (5th ed.). New York: W.H. Freeman. ISBN 978-0-7167-3051-4.
  3. Ohlrogge J, Browse J (July 1995). "Lipid biosynthesis". The Plant Cell. 7 (7): 957–70. doi:10.1105/tpc.7.7.957. PMC 160893. PMID 7640528. S2CID 219201001.
  4. "Main Page - SBML.caltech.edu". sbml.org.
  5. "KGML (KEGG Markup Language)". www.genome.jp.
  6. García-Campos MA, Espinal-Enríquez J, Hernández-Lemus E (2015). "Pathway Analysis: State of the Art". Frontiers in Physiology. 6: 383. doi:10.3389/fphys.2015.00383. PMC 4681784. PMID 26733877.
  7. Walsh CJ, Hu P, Batt J, Santos CC (August 2015). "Microarray Meta-Analysis and Cross-Platform Normalization: Integrative Genomics for Robust Biomarker Discovery". Microarrays. 4 (3): 389–406. doi:10.3390/microarrays4030389. PMC 4996376. PMID 27600230.
  8. Suo C, Hrydziuszko O, Lee D, Pramana S, Saputra D, Joshi H, et al. (August 2015). "Integration of somatic mutation, expression and functional data reveals potential driver genes predictive of breast cancer survival". Bioinformatics. 31 (16): 2607–13. doi:10.1093/bioinformatics/btv164. PMID 25810432.
  9. Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM (July 1999). "Systematic determination of genetic network architecture". Nature Genetics. 22 (3): 281–5. doi:10.1038/10343. PMID 10391217. S2CID 14688842.
  10. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. (October 2005). "Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles". Proceedings of the National Academy of Sciences of the United States of America. 102 (43): 15545–50. Bibcode:2005PNAS..10215545S. doi:10.1073/pnas.0506580102. PMC 1239896. PMID 16199517.
  11. Kotelnikova E, Shkrob MA, Pyatnitskiy MA, Ferlini A, Daraselia N (February 2012). "Novel approach to meta-analysis of microarray datasets reveals muscle remodeling-related drug targets and biomarkers in Duchenne muscular dystrophy". PLOS Computational Biology. 8 (2): e1002365. Bibcode:2012PLSCB...8E2365K. doi:10.1371/journal.pcbi.1002365. PMC 3271016. PMID 22319435.
  12. Santiago JA, Potashkin JA (February 2015). "Network-based metaanalysis identifies HNF4A and PTBP1 as longitudinally dynamic biomarkers for Parkinson's disease". Proceedings of the National Academy of Sciences of the United States of America. 112 (7): 2257–62. Bibcode:2015PNAS..112.2257S. doi:10.1073/pnas.1423573112. PMC 4343174. PMID 25646437.
  13. Reynolds CA, Hong MG, Eriksson UK, Blennow K, Wiklund F, Johansson B, et al. (May 2010). "Analysis of lipid pathway genes indicates association of sequence variation near SREBF1/TOM1L2/ATPAF2 with dementia risk". Human Molecular Genetics. 19 (10): 2068–78. doi:10.1093/hmg/ddq079. PMC 2860895. PMID 20167577.
  14. Bennet AM, Reynolds CA, Eriksson UK, Hong MG, Blennow K, Gatz M, et al. (1 January 2011). "Genetic association of sequence variants near AGER/NOTCH4 and dementia". Journal of Alzheimer's Disease. 24 (3): 475–84. doi:10.3233/jad-2011-101848. PMC 3477600. PMID 21297263.
  15. Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M (January 1999). "KEGG: Kyoto Encyclopedia of Genes and Genomes". Nucleic Acids Research. 27 (1): 29–34. doi:10.1093/nar/27.1.29. PMC 148090. PMID 9847135.
  16. Vastrik I, D'Eustachio P, Schmidt E, Joshi-Tope G, Gopinath G, Croft D, et al. (2007). "Reactome: a knowledge base of biologic pathways and processes". Genome Biology. 8 (3): R39. doi:10.1186/gb-2007-8-3-r39. PMC 1868929. PMID 17367534.
  17. Pathway Studio Pathways
  18. Pathway Central
  19. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al. (2004). "Bioconductor: open software development for computational biology and bioinformatics". Genome Biology. 5 (10): R80. doi:10.1186/gb-2004-5-10-r80. PMC 545600. PMID 15461798.
  20. Dabbish L, Stuart C, Tsay J, Herbsleb J (February 2012). "Social coding in GitHub: transparency and collaboration in an open software repository" (PDF). Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work. New York: Association for Computing Machinery. pp. 1277–1286. doi:10.1145/2145204.21453 (inactive 1 November 2024).
  21. Khatri P, Sirota M, Butte AJ (23 February 2012). "Ten years of pathway analysis: current approaches and outstanding challenges". PLOS Computational Biology. 8 (2): e1002375. Bibcode:2012PLSCB...8E2375K. doi:10.1371/journal.pcbi.1002375. PMC 3285573. PMID 22383865.
  22. Henderson-Maclennan NK, Papp JC, Talbot CC, McCabe ER, Presson AP (2010). "Pathway analysis software: annotation errors and solutions". Molecular Genetics and Metabolism. 101 (2–3): 134–40. doi:10.1016/j.ymgme.2010.06.005. PMC 2950253. PMID 20663702.
  23. Emmert-Streib F, Dehmer M (May 2011). "Networks for systems biology: conceptual connection of data and function". IET Systems Biology. 5 (3): 185–207. doi:10.1049/iet-syb.2010.0025. PMID 21639592.
  24. Draghici S, Khatri P, Tarca AL, Amin K, Done A, Voichita C, et al. (October 2007). "A systems biology approach for pathway level analysis". Genome Research. 17 (10): 1537–45. doi:10.1101/gr.6202607. PMC 1987343. PMID 17785539.
  25. Tarca AL, Draghici S, Khatri P, Hassan SS, Mittal P, Kim JS, et al. (January 2009). "A novel signaling pathway impact analysis". Bioinformatics. 25 (1): 75–82. doi:10.1093/bioinformatics/btn577. PMC 2732297. PMID 18990722.
  26. Glaab E, Baudot A, Krasnogor N, Schneider R, Valencia A (September 2012). "EnrichNet: network-based gene set enrichment analysis". Bioinformatics. 28 (18): i451–i457. doi:10.1093/bioinformatics/bts389. PMC 3436816. PMID 22962466.
  27. Geistlinger L, Csaba G, Küffner R, Mulder N, Zimmer R (July 2011). "From sets to graphs: towards a realistic enrichment analysis of transcriptomic systems". Bioinformatics. 27 (13): i366–73. doi:10.1093/bioinformatics/btr228. PMC 3117393. PMID 21685094.
  28. Glaab E, Baudot A, Krasnogor N, Valencia A (May 2010). "TopoGSA: network topological gene set analysis". Bioinformatics. 26 (9): 1271–2. doi:10.1093/bioinformatics/btq131. PMC 2859135. PMID 20335277.
  29. Shojaie A, Michailidis G (22 May 2010). "Network enrichment analysis in complex experiments". Statistical Applications in Genetics and Molecular Biology. 9 (1): Article22. doi:10.2202/1544-6115.1483. PMC 2898649. PMID 20597848.
  30. Huttenhower C, Haley EM, Hibbs MA, Dumeaux V, Barrett DR, Coller HA, et al. (June 2009). "Exploring the human genome with functional maps". Genome Research. 19 (6): 1093–106. doi:10.1101/gr.082214.108. PMC 2694471. PMID 19246570.
  31. Alexeyenko A, Lee W, Pernemalm M, Guegan J, Dessen P, Lazar V, et al. (September 2012). "Network enrichment analysis: extension of gene-set enrichment analysis to gene networks". BMC Bioinformatics. 13: 226. doi:10.1186/1471-2105-13-226. PMC 3505158. PMID 22966941.
  32. Signorelli M, Vinciotti V, Wit EC (September 2016). "NEAT: an efficient network enrichment analysis test". BMC Bioinformatics. 17 (1): 352. arXiv:1604.01210. doi:10.1186/s12859-016-1203-6. PMC 5011912. PMID 27597310. S2CID 2274758.
  33. Jeggari A, Alexeyenko A (March 2017). "NEArender: an R package for functional interpretation of 'omics' data via network enrichment analysis". BMC Bioinformatics. 18 (Suppl 5): 118. doi:10.1186/s12859-017-1534-y. PMC 5374688. PMID 28361684.
  34. Hong MG, Alexeyenko A, Lambert JC, Amouyel P, Prince JA (October 2010). "Genome-wide pathway analysis implicates intracellular transmembrane protein transport in Alzheimer disease". Journal of Human Genetics. 55 (10): 707–9. doi:10.1038/jhg.2010.92. PMID 20668461. S2CID 27020289.
  35. Jeggari A, Alekseenko Z, Petrov I, Dias JM, Ericson J, Alexeyenko A (July 2018). "EviNet: a web platform for network enrichment analysis with flexible definition of gene sets". Nucleic Acids Research. 46 (W1): W163–W170. doi:10.1093/nar/gky485. PMC 6030852. PMID 29893885.
  36. "Ingenuity IPA - Integrate and Understand Complex 'omics Data". Ingenuity. 8 April 2015.
  37. Pathway Studio
  38. Pathway Studio Viewer
  39. Mitrea C, Taghavi Z, Bokanizad B, Hanoudi S, Tagett R, Donato M, et al. (October 2013). "Methods and approaches in the topology-based analysis of biological pathways". Frontiers in Physiology. 4: 278. doi:10.3389/fphys.2013.00278. PMC 3794382. PMID 24133454.
  40. Kupershmidt I, Su QJ, Grewal A, Sundaresh S, Halperin I, Flynn J, et al. (September 2010). Aziz RK (ed.). "Ontology-based meta-analysis of global collections of high-throughput public data". PLOS ONE. 5 (9): e13066. Bibcode:2010PLoSO...513066K. doi:10.1371/journal.pone.0013066. PMC 2947508. PMID 20927376.
  41. Franco M, Jeggari A, Peuget S, Böttger F, Selivanova G, Alexeyenko A (February 2019). "Prediction of response to anti-cancer drugs becomes robust via network integration of molecular data". Scientific Reports. 9 (1): 2379. Bibcode:2019NatSR...9.2379F. doi:10.1038/s41598-019-39019-2. PMC 6382934. PMID 30787419.