Denoising Algorithm based on Relevance network Topology (DART) is an unsupervised algorithm that estimates an activity score for a pathway in a gene expression matrix, following a denoising step. [1] In DART, the activity score is a weighted average in which the weights reflect the degree of the nodes in the pruned network. [1] The denoising step removes prior information that is inconsistent with the data set being analyzed. This strategy substantially improves unsupervised predictions of pathway activity that are based on a prior model learned from a different biological system or context. [1]
Pre-existing methods such as gene set enrichment analysis (GSEA) attempt to infer pathway activity. [2] However, GSEA does not construct a structured list of genes. SPIA (Signaling Pathway Impact Analysis) [3] uses phenotype information to evaluate pathway activity between two phenotypes, but it does not identify the subset of pathway genes that could be used to differentiate individual samples. [3] CORG identifies a relevant gene subset, but it is a supervised method and does not perform as well as DART on independent data sets. [1]
Understanding molecular pathway activity is crucial for risk assessment, clinical diagnosis and treatment. Meta-analysis of complex genomic data faces difficulties such as extracting useful information from large data sets, eliminating confounding factors and providing sensible interpretations. Different approaches have been taken to identify the relevant pathways and so improve predictions made from gene expression data.
Pearson correlations were first computed, across the samples of the gene expression data set, between every pair of genes that the pathway regulates at the level of transcription. Each correlation coefficient then underwent a Fisher transform:

$$\gamma_{ij} = \frac{1}{2}\ln\frac{1 + c_{ij}}{1 - c_{ij}}$$

where c_ij is the correlation coefficient between genes i and j. Under the null hypothesis of no correlation, γ_ij is approximately normally distributed with mean zero and standard deviation 1/√(n_s − 3), where n_s is the number of tumor samples. The p-value threshold was set at 0.0001, and gene pairs with a significant correlation are considered relevant, forming the edges of the pruned relevance network. The activity score is then estimated from this network so that each gene's contribution takes its neighboring genes into account.
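The correlation, Fisher transform and thresholding steps can be sketched in Python with only the standard library. This is a minimal illustration under stated assumptions, not the published DART code: the function names are hypothetical, the two-sided p-value uses the normal approximation implied by the 1/√(n_s − 3) standard deviation, and genes left with no significant correlation are treated as pruned (degree zero).

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_p_value(c, n_samples):
    """Two-sided p-value for H0: no correlation, via the Fisher transform."""
    c = max(min(c, 1 - 1e-12), -1 + 1e-12)      # keep the log finite when |c| = 1
    gamma = 0.5 * math.log((1 + c) / (1 - c))    # Fisher transform of c
    z = gamma * math.sqrt(n_samples - 3)         # gamma has SD 1/sqrt(n_s - 3) under H0
    return math.erfc(abs(z) / math.sqrt(2))      # P(|Z| > |z|) for a standard normal Z


def relevance_network(expr, threshold=1e-4):
    """Build a relevance network from a dict mapping gene -> expression values.

    Returns the set of significant gene pairs (edges) and each gene's degree;
    genes with degree 0 would be pruned from the network.
    """
    genes = list(expr)
    n_samples = len(expr[genes[0]])
    edges = set()
    for gi, gj in combinations(genes, 2):
        c = pearson(expr[gi], expr[gj])
        if fisher_p_value(c, n_samples) < threshold:
            edges.add((gi, gj))
    degree = {g: sum(g in e for e in edges) for g in genes}
    return edges, degree


# Toy data: g1, g2 and g3 co-vary strongly; g4 alternates and is
# essentially uncorrelated with the other three.
x = list(range(20))
expr = {
    "g1": [float(v) for v in x],
    "g2": [2.0 * v + 1 for v in x],
    "g3": [-v + (0.5 if v % 2 else -0.5) for v in x],
    "g4": [1.0 if v % 2 else -1.0 for v in x],
}
edges, degree = relevance_network(expr)
print(sorted(edges))   # [('g1', 'g2'), ('g1', 'g3'), ('g2', 'g3')]
print(degree)          # {'g1': 2, 'g2': 2, 'g3': 2, 'g4': 0}
```

Here g4 ends up with degree zero, which is the kind of prior signature gene that the denoising step would discard as inconsistent with the data.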
Here k_i is the number of neighbors of gene i in the pruned network, z_i is the normalized z-score of its expression, and σ_i is a binary variable (1 if the gene is upregulated upon pathway activation and −1 if it is downregulated). The activity score S_wAV, which estimates the activation level of the pathway in each sample, is the average of the sign-adjusted z-scores σ_i z_i weighted by the degrees k_i. A linear regression model was then applied to the estimated pathway activation levels: t_ij and p_ij denote the t-statistic and p-value of the association between the activities of pathways i and j, with p < 0.05 indicating significance. To assess consistency in a validation data set D, a performance measure V_ij is computed for each pair of pathways.
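A minimal Python sketch of the degree-weighted activity score follows. The article does not state how S_wAV is normalized, so dividing by the sum of the degrees (a plain weighted average) is an assumption here; the published method may scale the sum differently. The function names are hypothetical, and the degrees k_i would come from the pruned relevance network.

```python
import math


def zscore(values):
    """Standardize one gene across samples (mean 0, sample SD 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def activity_scores(expr, signs, degree):
    """Degree-weighted average of sign-adjusted z-scores, one score per sample.

    expr:   gene -> expression values across samples
    signs:  gene -> +1 (up upon activation) or -1 (down upon activation)
    degree: gene -> number of neighbors k_i in the pruned network
    Assumed normalization: divide by the sum of the degrees.
    """
    kept = [g for g in expr if degree.get(g, 0) > 0]   # pruned genes drop out
    if not kept:
        raise ValueError("no genes left in the pruned network")
    z = {g: zscore(expr[g]) for g in kept}
    total_k = sum(degree[g] for g in kept)
    n_samples = len(expr[kept[0]])
    return [
        sum(degree[g] * signs[g] * z[g][s] for g in kept) / total_k
        for s in range(n_samples)
    ]


expr = {"a": [1.0, 2.0, 3.0], "b": [3.0, 2.0, 1.0], "c": [5.0, 0.0, 5.0]}
signs = {"a": 1, "b": -1, "c": 1}
degree = {"a": 2, "b": 1, "c": 0}   # "c" was removed by the denoising step
print(activity_scores(expr, signs, degree))   # [-1.0, 0.0, 1.0]
```

Because gene "b" is expected to go down upon activation, multiplying by σ_i = −1 lets its falling expression raise the score, and the better-connected gene "a" counts twice as much as "b".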
V_ij is built from three quantities. S is a threshold function applied to a given pair of pathways, reflecting whether their correlation is significant. σ_ij is a score giving the direction of the correlation: a prediction in the opposite direction is penalized by assigning it the value −1. t_ij is the t-statistic of the interpathway correlation. The performance measure V_ij therefore accounts for the significance of the correlation between pathways, its direction, and its magnitude. A two-tailed paired Wilcoxon test is then used to compare the resulting distributions of V_ij.

Advantages and limitations: DART improves performance and accuracy when inferring pathway activity from the prior information held in pathway databases. However, it depends on that prior information and on large databases: DART requires well-established prior gene expression data as a starting point, and only then can it evaluate consistency and remove irrelevant information.
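The performance measure V_ij can be illustrated with a short Python sketch. The exact formula is not reproduced in this article, so the product form V_ij = σ_ij · S(p_ij) · |t_ij|, with S(p) = 1 when p < 0.05 and 0 otherwise, is an assumption chosen to combine significance, direction and magnitude as described; all function names are hypothetical. The t-statistic uses the standard identity for the slope of a simple linear regression, t = r·√((n − 2)/(1 − r²)).

```python
import math


def t_from_correlation(r, n_samples):
    """t-statistic of a simple linear regression slope, written via Pearson r."""
    return r * math.sqrt((n_samples - 2) / (1 - r ** 2))


def threshold(p, alpha=0.05):
    """Hypothetical S: 1 if the interpathway correlation is significant, else 0."""
    return 1 if p < alpha else 0


def performance_measure(t, p, expected_sign, alpha=0.05):
    """Assumed form V_ij = sigma_ij * S(p_ij) * |t_ij|.

    expected_sign: +1 if pathways i and j are predicted to be positively
    correlated, -1 if they are predicted to be negatively correlated.
    """
    observed_sign = 1 if t >= 0 else -1
    sigma = 1 if observed_sign == expected_sign else -1   # penalize opposite direction
    return sigma * threshold(p, alpha) * abs(t)


print(performance_measure(3.75, 0.001, expected_sign=1))    # 3.75 (consistent, significant)
print(performance_measure(-3.75, 0.001, expected_sign=1))   # -3.75 (opposite direction)
print(performance_measure(1.2, 0.24, expected_sign=1))      # 0.0 (not significant)
```

To compare two methods, the V_ij values computed over the same pathway pairs could be passed to `scipy.stats.wilcoxon`, which by default performs a two-sided paired signed-rank test.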
DART has been applied successfully in cancer genomics. It has been shown to be a strong method for estimating pathway activity and perturbation-signature activity in breast and lung cancer gene expression data sets. [1] Imaging traits derived from mammography (the use of low-energy X-rays to examine human breast tissue) play an important role in breast cancer diagnosis, and studies have shown that women with increased mammographic density (MMD) have a higher risk of developing breast cancer. [4] The ESR1 (estrogen receptor 1) gene encodes estrogen receptor alpha, which is activated by estrogen, and polymorphisms in ESR1 are associated with breast cancer risk through differences in breast density. DART successfully predicted an inverse correlation between ESR1 signaling and MMD. It can be applied to both simulated and real multidimensional cancer genomic data, and it gives more reliable predictions of pathway activation, which would be helpful in association studies.