Expression quantitative trait loci

Expression quantitative trait loci (eQTLs) are genomic loci that explain variation in expression levels of mRNAs. [1] [2]

Distant and local, trans- and cis-eQTLs, respectively

An expression quantitative trait is the abundance of an mRNA transcript or a protein. Such a trait is usually the product of a single gene with a specific chromosomal location. This distinguishes expression quantitative traits from most complex traits, which are not the product of the expression of a single gene. Chromosomal loci that explain variance in expression traits are called eQTLs. eQTLs located near the gene of origin (the gene that produces the transcript or protein) are referred to as local eQTLs or cis-eQTLs. By contrast, those located far from their gene of origin, often on different chromosomes, are referred to as distant eQTLs or trans-eQTLs. [3] [4] The first genome-wide eQTL study was carried out in yeast and published in 2002. [5] The initial wave of eQTL studies employed microarrays to measure genome-wide gene expression; more recent studies have employed massively parallel RNA sequencing. Many eQTL studies have been performed in plants and animals, including humans, [6] non-human primates [7] [8] and mice. [9]
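
The local versus distant distinction can be illustrated with a minimal sketch. The 1 Mb window, the `Locus` type and the function name `classify_eqtl` are assumptions made for illustration only; the text above does not define a distance threshold, and individual studies choose their own.

```python
from dataclasses import dataclass

# Assumed cutoff for "near the gene of origin"; not taken from the article.
CIS_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Locus:
    """A position in the genome (hypothetical helper type)."""
    chrom: str
    pos: int  # base-pair coordinate


def classify_eqtl(variant: Locus, gene: Locus, window: int = CIS_WINDOW_BP) -> str:
    """Label a variant-gene pair as 'cis' (local) or 'trans' (distant)."""
    # A variant on a different chromosome is always distant.
    if variant.chrom != gene.chrom:
        return "trans"
    # On the same chromosome, compare the distance with the assumed window.
    return "cis" if abs(variant.pos - gene.pos) <= window else "trans"


if __name__ == "__main__":
    gene = Locus("chr1", 5_000_000)
    print(classify_eqtl(Locus("chr1", 5_200_000), gene))
    print(classify_eqtl(Locus("chr7", 5_200_000), gene))
```

A real analysis would usually measure distance to the transcription start site or gene body rather than to a single coordinate; the single position here keeps the sketch short.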

Some cis-eQTLs are detected in many tissue types, but the majority of trans-eQTLs are tissue-dependent (dynamic). [10] eQTLs may act in cis (locally) or in trans (at a distance) to a gene. [11] The abundance of a gene transcript is directly modified by polymorphism in regulatory elements. Consequently, transcript abundance can be treated as a quantitative trait that can be mapped with considerable power, and the loci mapped in this way have been named expression QTLs (eQTLs). [12] The combination of whole-genome genetic association studies and the measurement of global gene expression allows the systematic identification of eQTLs. By assaying gene expression and genetic variation simultaneously on a genome-wide basis in a large number of individuals, statistical genetic methods can be used to map the genetic factors that underpin individual differences in the expression levels of many thousands of transcripts. [13] Single nucleotide polymorphisms (SNPs) reproducibly associated with complex disorders [14] and with certain pharmacologic phenotypes [15] are significantly enriched for eQTLs relative to frequency-matched control SNPs. The integration of eQTLs with GWAS has led to the development of the transcriptome-wide association study (TWAS) methodology. [16] [17]
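
The comparison against frequency-matched control SNPs can be sketched as a resampling test. This is an illustrative simplification and not the procedure used in the cited studies: the function names, the ten allele-frequency bins, the number of draws and the toy SNP identifiers are all assumptions.

```python
import random
from collections import defaultdict


def maf_bin(maf: float, n_bins: int = 10) -> int:
    """Map a minor-allele frequency in [0, 0.5] to a bin index (bin count is assumed)."""
    return min(int(maf * 2 * n_bins), n_bins - 1)


def eqtl_enrichment(trait_snps, maf_by_snp, eqtl_snps, n_draws=1000, seed=0):
    """Return (observed eQTL count, empirical p-value) for trait-associated SNPs.

    Each draw replaces every trait SNP with a random non-trait SNP from the
    same frequency bin and counts how many of the replacements are eQTLs.
    Assumes every bin used by a trait SNP contains at least one control SNP.
    """
    rng = random.Random(seed)
    trait_set = set(trait_snps)

    # Pool of candidate control SNPs for each frequency bin.
    pools = defaultdict(list)
    for snp, maf in maf_by_snp.items():
        if snp not in trait_set:
            pools[maf_bin(maf)].append(snp)

    observed = sum(snp in eqtl_snps for snp in trait_snps)
    at_least_as_extreme = 0
    for _ in range(n_draws):
        null_count = sum(
            rng.choice(pools[maf_bin(maf_by_snp[snp])]) in eqtl_snps
            for snp in trait_snps
        )
        if null_count >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the empirical p-value above zero.
    return observed, (at_least_as_extreme + 1) / (n_draws + 1)


if __name__ == "__main__":
    # Toy data: 100 made-up SNPs sharing one frequency bin.
    maf_by_snp = {f"snp{i}": 0.30 for i in range(100)}
    trait_snps = [f"snp{i}" for i in range(10)]
    eqtl_snps = set(trait_snps) | {f"snp{i}" for i in range(10, 19)}
    print(eqtl_enrichment(trait_snps, maf_by_snp, eqtl_snps))
```

Matching on allele frequency matters because the power to detect an eQTL depends on how common the variant is; without matching, an apparent enrichment could simply reflect a frequency difference between the two SNP sets.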

Detecting eQTLs

Mapping eQTLs is done using standard QTL mapping methods that test the linkage between variation in expression and genetic polymorphisms. The only considerable difference is that eQTL studies can involve a million or more expression microtraits. Standard gene mapping software packages can be used, although it is often faster to use custom code such as QTL Reaper or the web-based eQTL mapping system GeneNetwork. GeneNetwork hosts many large eQTL mapping data sets and provides access to fast algorithms for mapping single loci and epistatic interactions. As in all QTL mapping studies, the final steps in defining the DNA variants that cause variation in traits are usually difficult and require a second round of experimentation. This is especially true for trans-eQTLs, which do not benefit from the strong prior probability that the relevant variants are in the immediate vicinity of the parent gene. Statistical, graphical, and bioinformatic methods are used to evaluate positional candidate genes and entire systems of interactions. [18] [19] The development of single-cell technologies and parallel advances in statistical methods have made it possible to define even subtle changes in eQTLs as cell states change. [20] [21]
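
As a rough illustration of a single marker-by-transcript test, the sketch below regresses expression on allele dosage (0, 1 or 2 copies) and estimates significance by permutation. It is a simplified stand-in and does not reproduce the algorithms of QTL Reaper or GeneNetwork; the function names, the permutation count and the Bonferroni correction are assumptions chosen for brevity.

```python
import random
from statistics import mean


def slope_and_r(genotypes, expression):
    """Least-squares slope and Pearson r of expression on allele dosage."""
    g_mean, e_mean = mean(genotypes), mean(expression)
    sxy = sum((g - g_mean) * (e - e_mean) for g, e in zip(genotypes, expression))
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    syy = sum((e - e_mean) ** 2 for e in expression)
    if sxx == 0 or syy == 0:
        raise ValueError("genotype and expression must both vary")
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


def permutation_p(genotypes, expression, n_perm=2000, seed=0):
    """Empirical two-sided p-value: shuffle expression and compare |r|."""
    rng = random.Random(seed)
    observed = abs(slope_and_r(genotypes, expression)[1])
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-expression relationship
        if abs(slope_and_r(genotypes, shuffled)[1]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level when n_tests hypotheses are tested."""
    return alpha / n_tests


if __name__ == "__main__":
    # Made-up data: 12 individuals, one marker, one transcript.
    genotypes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    expression = [1.0, 1.2, 0.9, 1.1, 2.0, 2.1, 1.9, 2.2, 3.0, 3.1, 2.9, 3.2]
    print(slope_and_r(genotypes, expression))
    print(permutation_p(genotypes, expression))
    print(bonferroni_threshold(0.05, 1_000_000))
```

With a million or more traits, each tested against many markers, the per-test threshold becomes very small, which is one reason dedicated fast implementations are preferred over a loop like this one.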

References

  1. Rockman MV, Kruglyak L (November 2006). "Genetics of global gene expression". Nature Reviews Genetics. 7 (11): 862–72. doi:10.1038/nrg1964. PMID 17047685. S2CID 150368.
  2. Nica, Alexandra C.; Dermitzakis, Emmanouil T. (2013). "Expression quantitative trait loci: Present and future". Philosophical Transactions of the Royal Society B: Biological Sciences. 368 (1620): 20120362. doi:10.1098/rstb.2012.0362. PMC 3682727. PMID 23650636.
  3. Fairfax, Benjamin P.; Makino, Seiko; Radhakrishnan, Jayachandran; Plant, Katharine; Leslie, Stephen; Dilthey, Alexander; Ellis, Peter; Langford, Cordelia; Vannberg, Fredrik O.; Knight, Julian C. (2012). "Genetics of gene expression in primary immune cells identifies cell type-specific master regulators and roles of HLA alleles". Nature Genetics. 44 (5): 502–510. doi:10.1038/ng.2205. PMC 3437404. PMID 22446964.
  4. Liu S, Won H, Clarke D, Matoba N, Khullar S, Mu Y, Wang D, Gerstein M (2022). "Illuminating links between cis-regulators and trans-acting variants in the human prefrontal cortex". Genome Medicine. 14 (1): 133. doi:10.1186/s13073-022-01133-8. PMC 9685876. PMID 36424644.
  5. Brem RB, Yvert G, Clinton R, Kruglyak L (April 2002). "Genetic dissection of transcriptional regulation in budding yeast". Science. 296 (5568): 752–5. Bibcode:2002Sci...296..752B. doi:10.1126/science.1069516. PMID 11923494. S2CID 9569352.
  6. Lonsdale, John; Thomas, Jeffrey; Salvatore, Mike; Phillips, Rebecca; Lo, Edmund; Shad, Saboor; Hasz, Richard; Walters, Gary; Garcia, Fernando; Young, Nancy; Foster, Barbara; Moser, Mike; Karasik, Ellen; Gillard, Bryan; Ramsey, Kimberley; Sullivan, Susan; Bridge, Jason; Magazine, Harold; Syron, John; Fleming, Johnelle; Siminoff, Laura; Traino, Heather; Mosavel, Maghboeba; Barker, Laura; Jewell, Scott; Rohrer, Dan; Maxim, Dan; Filkins, Dana; Harbach, Philip; et al. (June 2013). "The Genotype-Tissue Expression (GTEx) project". Nature Genetics. 45 (6): 580–5. doi:10.1038/ng.2653. PMC 4692118. PMID 23715323.
  7. Tung J, Zhou X, Alberts SC, Stephens M, Gilad Y (February 2015). "The genetic architecture of gene expression levels in wild baboons". eLife. 4. doi:10.7554/eLife.04729. PMC 4383332. PMID 25714927.
  8. Jasinska AJ, Zelaya I, Service SK, Peterson CB, Cantor RM, Choi OW, et al. (December 2017). "Genetic variation and gene expression across multiple tissues and developmental stages in a nonhuman primate". Nature Genetics. 49 (12): 1714–1721. doi:10.1038/ng.3959. PMC 5714271. PMID 29083405.
  9. Doss S, Schadt EE, Drake TA, Lusis AJ (May 2005). "Cis-acting expression quantitative trait loci in mice". Genome Research. 15 (5): 681–91. doi:10.1101/gr.3216905. PMC 1088296. PMID 15837804.
  10. Gerrits A, Li Y, Tesson BM, Bystrykh LV, Weersing E, Ausema A, Dontje B, Wang X, Breitling R, Jansen RC, de Haan G (October 2009). Gibson G (ed.). "Expression quantitative trait loci are highly sensitive to cellular differentiation state". PLOS Genetics. 5 (10): e1000692. doi:10.1371/journal.pgen.1000692. PMC 2757904. PMID 19834560.
  11. Michaelson JJ, Loguercio S, Beyer A (July 2009). "Detection and interpretation of expression quantitative trait loci (eQTL)". Methods. 48 (3): 265–76. doi:10.1016/j.ymeth.2009.03.004. PMID 19303049.
  12. Cookson W, Liang L, Abecasis G, Moffatt M, Lathrop M (March 2009). "Mapping complex disease traits with global gene expression". Nature Reviews Genetics. 10 (3): 184–94. doi:10.1038/nrg2537. PMC 4550035. PMID 19223927.
  13. Cookson W, Liang L, Abecasis G, Moffatt M, Lathrop M (March 2009). "Mapping complex disease traits with global gene expression". Nature Reviews Genetics. 10 (3): 184–94. doi:10.1038/nrg2537. PMC 4550035. PMID 19223927.
  14. Nicolae DL, Gamazon E, Zhang W, Duan S, Dolan ME, Cox NJ (April 2010). Gibson G (ed.). "Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS". PLOS Genetics. 6 (4): e1000888. doi:10.1371/journal.pgen.1000888. PMC 2848547. PMID 20369019.
  15. Gamazon ER, Huang RS, Cox NJ, Dolan ME (May 2010). "Chemotherapeutic drug susceptibility associated SNPs are enriched in expression quantitative trait loci". Proceedings of the National Academy of Sciences of the United States of America. 107 (20): 9287–92. Bibcode:2010PNAS..107.9287G. doi:10.1073/pnas.1001827107. PMC 2889115. PMID 20442332.
  16. Gamazon ER, Wheeler HE, Shah KP, et al. (September 2015). "A gene-based association method for mapping traits using reference transcriptome data". Nature Genetics. 47 (9): 1091–1098. doi:10.1038/ng.3367. PMC 4552594. PMID 26258848.
  17. Gusev A, Ko A, Shi H, et al. (March 2016). "Integrative approaches for large-scale transcriptome-wide association studies". Nature Genetics. 48 (3): 245–252. doi:10.1038/ng.3506. PMC 4767558. PMID 26854917.
  18. Kulp DC, Jagalur M (2006). "Causal inference of regulator-target pairs by gene mapping of expression phenotypes". BMC Genomics. 7: 125. doi:10.1186/1471-2164-7-125. PMC 1481560. PMID 16719927.
  19. Lee SI, Dudley AM, Drubin D, Silver PA, Krogan NJ, Pe'er D, Koller D (2009). "Learning a prior on regulatory potential from eQTL data". PLOS Genetics. 5 (1): e1000358. doi:10.1371/journal.pgen.1000358. PMC 2627940. PMID 19180192.
  20. van der Wijst, M; de Vries, DH; Groot, HE; Trynka, G; Hon, CC; Bonder, MJ; Stegle, O; Nawijn, MC; Idaghdour, Y; van der Harst, P; Ye, CJ; Powell, J; Theis, FJ; Mahfouz, A; Heinig, M; Franke, L (9 March 2020). "The single-cell eQTLGen consortium". eLife. 9. doi:10.7554/eLife.52155. PMC 7077978. PMID 32149610.
  21. Nathan, A; Asgari, S; Ishigaki, K; Valencia, C; Amariuta, T; Luo, Y; Beynor, JI; Baglaenko, Y; Suliman, S; Price, AL; Lecca, L; Murray, MB; Moody, DB; Raychaudhuri, S (June 2022). "Single-cell eQTL models reveal dynamic T cell state dependence of disease loci". Nature. 606 (7912): 120–128. Bibcode:2022Natur.606..120N. doi:10.1038/s41586-022-04713-1. PMC 9842455. PMID 35545678. S2CID 248730439.