Expression quantitative trait loci

An expression quantitative trait locus (eQTL) is a type of quantitative trait locus (QTL), a genomic locus (region of DNA) that is associated with phenotypic variation for a specific, quantifiable trait. While the term QTL can refer to a wide range of phenotypic traits, the more specific eQTL refers to traits measured by gene expression, such as mRNA levels. [1] [2] Although named "expression QTLs", not all measures of gene expression can be used for eQTLs. For example, traits quantified by protein levels are instead referred to as protein QTLs (pQTLs).

Distant and local, trans- and cis-eQTLs, respectively

An expression quantitative trait is the amount of an mRNA transcript or a protein. These are usually the product of a single gene with a specific chromosomal location, which distinguishes expression quantitative traits from most complex traits, which are not the product of the expression of a single gene. Chromosomal loci that explain variance in expression traits are called eQTLs. eQTLs located near the gene of origin (the gene that produces the transcript or protein) are referred to as local eQTLs or cis-eQTLs. By contrast, those located far from their gene of origin, often on different chromosomes, are referred to as distant eQTLs or trans-eQTLs. [3] [4] The first genome-wide eQTL study, a genetic analysis of genome-wide gene expression, was carried out in yeast and published in 2002. [5] The initial wave of eQTL studies employed microarrays to measure genome-wide gene expression; more recent studies have employed massively parallel RNA sequencing. Many eQTL studies have been performed in plants and animals, including humans, [6] non-human primates [7] [8] and mice. [9]
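The local versus distant distinction described above can be sketched in a few lines of Python. This is only an illustration: the text does not define how near "near" is, so the 1 Mb window below is an assumed threshold, and the `Position` type and `classify_eqtl` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass

# Assumed illustration threshold; the text above does not define "near".
CIS_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Position:
    chrom: str  # chromosome name, e.g. "chr1"
    bp: int     # base-pair coordinate on that chromosome


def classify_eqtl(variant: Position, gene: Position,
                  window_bp: int = CIS_WINDOW_BP) -> str:
    """Label an eQTL 'cis' (local) or 'trans' (distant) relative to its gene."""
    # A variant on a different chromosome is always distant.
    if variant.chrom != gene.chrom:
        return "trans"
    # Same chromosome: local only if within the chosen window.
    if abs(variant.bp - gene.bp) <= window_bp:
        return "cis"
    return "trans"


gene = Position("chr1", 5_000_000)
print(classify_eqtl(Position("chr1", 5_200_000), gene))   # cis
print(classify_eqtl(Position("chr1", 50_000_000), gene))  # trans
print(classify_eqtl(Position("chr7", 5_000_000), gene))   # trans
```

The window is a parameter rather than a constant baked into the function because the appropriate distance depends on the organism and the study design.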

Some cis-eQTLs are detected in many tissue types, but the majority of trans-eQTLs are tissue-dependent (dynamic). [10] eQTLs may act in cis (locally) or in trans (at a distance) to a gene. [11] The abundance of a gene transcript is directly modified by polymorphisms in regulatory elements. Consequently, transcript abundance can be treated as a quantitative trait that can be mapped with considerable power; the loci mapped in this way have been named expression QTLs (eQTLs). [12] The combination of whole-genome genetic association studies and the measurement of global gene expression allows the systematic identification of eQTLs. By assaying gene expression and genetic variation simultaneously on a genome-wide basis in a large number of individuals, statistical genetic methods can be used to map the genetic factors that underpin individual differences in quantitative levels of expression of many thousands of transcripts. [13] Studies have shown that single nucleotide polymorphisms (SNPs) reproducibly associated with complex disorders [14] as well as certain pharmacologic phenotypes [15] are significantly enriched for eQTLs relative to frequency-matched control SNPs. The integration of eQTLs with genome-wide association studies (GWAS) has led to the development of the transcriptome-wide association study (TWAS) methodology. [16] [17]
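The idea of treating transcript abundance as a quantitative trait can be illustrated with a single association test. The sketch below regresses expression on genotype dosage (0, 1 or 2 copies of an allele) by ordinary least squares. It is a simplified illustration rather than the method of any cited study: the additive dosage coding, the function name `eqtl_association` and the toy data are all assumptions, and real analyses typically also adjust for covariates.

```python
import math


def eqtl_association(dosages, expression):
    """Simple linear regression of expression on allele dosage.

    Returns (slope, r_squared, t_statistic); the t statistic has n - 2
    degrees of freedom.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need equal-length inputs with at least 3 individuals")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype and expression must both vary")
    slope = sxy / sxx                 # expression change per allele copy
    r = sxy / math.sqrt(sxx * syy)    # Pearson correlation
    r2 = r * r
    if r2 >= 1.0:                     # perfect fit: t is unbounded
        t = math.copysign(math.inf, r)
    else:
        t = r * math.sqrt((n - 2) / (1.0 - r2))
    return slope, r2, t


# Toy data (made up for illustration): six individuals, one SNP, one transcript.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
slope, r2, t = eqtl_association(dosages, expression)
print(f"slope={slope:.2f} r2={r2:.3f} t={t:.2f}")
```

Converting the t statistic into a p-value requires the Student t distribution, which is left out here to keep the example dependency-free.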

Detecting eQTLs

Mapping eQTLs is done using standard QTL mapping methods that test the linkage between variation in expression and genetic polymorphisms. The only considerable difference is that eQTL studies can involve a million or more expression microtraits. Standard gene mapping software packages can be used, although it is often faster to use custom code such as QTL Reaper or the web-based eQTL mapping system GeneNetwork. GeneNetwork hosts many large eQTL mapping data sets and provides access to fast algorithms to map single loci and epistatic interactions. As is true in all QTL mapping studies, the final steps in defining the DNA variants that cause variation in traits are usually difficult and require a second round of experimentation. This is especially the case for trans-eQTLs, which do not benefit from the strong prior probability that relevant variants are in the immediate vicinity of the parent gene. Statistical, graphical, and bioinformatic methods are used to evaluate positional candidate genes and entire systems of interactions. [18] [19] The development of single-cell technologies and parallel advances in statistical methods have made it possible to define even subtle changes in eQTLs as cell states change. [20] [21]
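To give a feel for why the number of expression traits matters, the following sketch scans every marker against every transcript and keeps the best-correlated marker for each transcript. It is not how QTL Reaper or GeneNetwork work internally; the Pearson-correlation scan, the function names and the toy data are assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def scan_traits(genotypes, expression):
    """Test every marker against every transcript.

    genotypes:  dict mapping marker name -> list of dosages per individual
    expression: dict mapping transcript name -> list of levels per individual
    Returns ({transcript: (best_marker, r)}, number_of_tests).
    """
    best_hits = {}
    n_tests = 0
    for transcript, levels in expression.items():
        best = None
        for marker, dosages in genotypes.items():
            r = pearson_r(dosages, levels)
            n_tests += 1
            if best is None or abs(r) > abs(best[1]):
                best = (marker, r)
        best_hits[transcript] = best
    return best_hits, n_tests


# Toy data (made up): two markers, two transcripts, six individuals.
genotypes = {
    "m1": [0, 1, 2, 0, 1, 2],
    "m2": [0, 0, 1, 1, 2, 2],
}
expression = {
    "tA": [1.0, 2.0, 3.1, 0.9, 2.1, 2.9],
    "tB": [0.5, 0.4, 1.6, 1.5, 2.4, 2.6],
}
hits, n_tests = scan_traits(genotypes, expression)
for transcript, (marker, r) in hits.items():
    print(transcript, marker, round(r, 3))
print("tests performed:", n_tests)
```

The number of tests is the product of the marker and trait counts, so with very many traits a study also needs significance thresholds that account for the number of tests performed.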

References

  1. Rockman MV, Kruglyak L (November 2006). "Genetics of global gene expression". Nature Reviews. Genetics. 7 (11): 862–72. doi:10.1038/nrg1964. PMID 17047685. S2CID 150368.
  2. Nica, Alexandra C.; Dermitzakis, Emmanouil T. (2013). "Expression quantitative trait loci: Present and future". Philosophical Transactions of the Royal Society B: Biological Sciences. 368 (1620): 20120362. doi:10.1098/rstb.2012.0362. PMC 3682727. PMID 23650636.
  3. Fairfax, Benjamin P.; Makino, Seiko; Radhakrishnan, Jayachandran; Plant, Katharine; Leslie, Stephen; Dilthey, Alexander; Ellis, Peter; Langford, Cordelia; Vannberg, Fredrik O.; Knight, Julian C. (2012). "Genetics of gene expression in primary immune cells identifies cell type-specific master regulators and roles of HLA alleles". Nature Genetics. 44 (5): 502–510. doi:10.1038/ng.2205. PMC 3437404. PMID 22446964.
  4. Liu S, Won H, Clarke D, Matoba N, Khullar S, Mu Y, Wang D, Gerstein M (2022). "Illuminating links between cis-regulators and trans-acting variants in the human prefrontal cortex". Genome Medicine. 14 (1): 133. doi:10.1186/s13073-022-01133-8. PMC 9685876. PMID 36424644.
  5. Brem RB, Yvert G, Clinton R, Kruglyak L (April 2002). "Genetic dissection of transcriptional regulation in budding yeast". Science. 296 (5568): 752–5. Bibcode:2002Sci...296..752B. doi:10.1126/science.1069516. PMID 11923494. S2CID 9569352.
  6. Lonsdale, John; Thomas, Jeffrey; Salvatore, Mike; Phillips, Rebecca; Lo, Edmund; Shad, Saboor; Hasz, Richard; Walters, Gary; Garcia, Fernando; Young, Nancy; Foster, Barbara; Moser, Mike; Karasik, Ellen; Gillard, Bryan; Ramsey, Kimberley; Sullivan, Susan; Bridge, Jason; Magazine, Harold; Syron, John; Fleming, Johnelle; Siminoff, Laura; Traino, Heather; Mosavel, Maghboeba; Barker, Laura; Jewell, Scott; Rohrer, Dan; Maxim, Dan; Filkins, Dana; Harbach, Philip; et al. (June 2013). "The Genotype-Tissue Expression (GTEx) project". Nature Genetics. 45 (6): 580–5. doi:10.1038/ng.2653. PMC 4692118. PMID 23715323.
  7. Tung J, Zhou X, Alberts SC, Stephens M, Gilad Y (February 2015). "The genetic architecture of gene expression levels in wild baboons". eLife. 4. doi:10.7554/eLife.04729. PMC 4383332. PMID 25714927.
  8. Jasinska AJ, Zelaya I, Service SK, Peterson CB, Cantor RM, Choi OW, et al. (December 2017). "Genetic variation and gene expression across multiple tissues and developmental stages in a nonhuman primate". Nature Genetics. 49 (12): 1714–1721. doi:10.1038/ng.3959. PMC 5714271. PMID 29083405.
  9. Doss S, Schadt EE, Drake TA, Lusis AJ (May 2005). "Cis-acting expression quantitative trait loci in mice". Genome Research. 15 (5): 681–91. doi:10.1101/gr.3216905. PMC 1088296. PMID 15837804.
  10. Gerrits A, Li Y, Tesson BM, Bystrykh LV, Weersing E, Ausema A, Dontje B, Wang X, Breitling R, Jansen RC, de Haan G (October 2009). Gibson G (ed.). "Expression quantitative trait loci are highly sensitive to cellular differentiation state". PLOS Genetics. 5 (10): e1000692. doi:10.1371/journal.pgen.1000692. PMC 2757904. PMID 19834560.
  11. Michaelson JJ, Loguercio S, Beyer A (July 2009). "Detection and interpretation of expression quantitative trait loci (eQTL)". Methods. 48 (3): 265–76. doi:10.1016/j.ymeth.2009.03.004. PMID 19303049.
  12. Cookson W, Liang L, Abecasis G, Moffatt M, Lathrop M (March 2009). "Mapping complex disease traits with global gene expression". Nature Reviews. Genetics. 10 (3): 184–94. doi:10.1038/nrg2537. PMC 4550035. PMID 19223927.
  13. Cookson W, Liang L, Abecasis G, Moffatt M, Lathrop M (March 2009). "Mapping complex disease traits with global gene expression". Nature Reviews. Genetics. 10 (3): 184–94. doi:10.1038/nrg2537. PMC 4550035. PMID 19223927.
  14. Nicolae DL, Gamazon E, Zhang W, Duan S, Dolan ME, Cox NJ (April 2010). Gibson G (ed.). "Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS". PLOS Genetics. 6 (4): e1000888. doi:10.1371/journal.pgen.1000888. PMC 2848547. PMID 20369019.
  15. Gamazon ER, Huang RS, Cox NJ, Dolan ME (May 2010). "Chemotherapeutic drug susceptibility associated SNPs are enriched in expression quantitative trait loci". Proceedings of the National Academy of Sciences of the United States of America. 107 (20): 9287–92. Bibcode:2010PNAS..107.9287G. doi:10.1073/pnas.1001827107. PMC 2889115. PMID 20442332.
  16. Gamazon ER, Wheeler HE, Shah KP, et al. (September 2015). "A gene-based association method for mapping traits using reference transcriptome data". Nature Genetics. 47 (9): 1091–1098. doi:10.1038/ng.3367. PMC 4552594. PMID 26258848.
  17. Gusev A, Ko A, Shi H, et al. (March 2016). "Integrative approaches for large-scale transcriptome-wide association studies". Nature Genetics. 48 (3): 245–252. doi:10.1038/ng.3506. PMC 4767558. PMID 26854917.
  18. Kulp DC, Jagalur M (2006). "Causal inference of regulator-target pairs by gene mapping of expression phenotypes". BMC Genomics. 7: 125. doi:10.1186/1471-2164-7-125. PMC 1481560. PMID 16719927.
  19. Lee SI, Dudley AM, Drubin D, Silver PA, Krogan NJ, Pe'er D, Koller D (2009). "Learning a prior on regulatory potential from eQTL data". PLOS Genetics. 5 (1): e1000358. doi:10.1371/journal.pgen.1000358. PMC 2627940. PMID 19180192.
  20. van der Wijst, M; de Vries, DH; Groot, HE; Trynka, G; Hon, CC; Bonder, MJ; Stegle, O; Nawijn, MC; Idaghdour, Y; van der Harst, P; Ye, CJ; Powell, J; Theis, FJ; Mahfouz, A; Heinig, M; Franke, L (9 March 2020). "The single-cell eQTLGen consortium". eLife. 9. doi:10.7554/eLife.52155. PMC 7077978. PMID 32149610.
  21. Nathan, A; Asgari, S; Ishigaki, K; Valencia, C; Amariuta, T; Luo, Y; Beynor, JI; Baglaenko, Y; Suliman, S; Price, AL; Lecca, L; Murray, MB; Moody, DB; Raychaudhuri, S (June 2022). "Single-cell eQTL models reveal dynamic T cell state dependence of disease loci". Nature. 606 (7912): 120–128. Bibcode:2022Natur.606..120N. doi:10.1038/s41586-022-04713-1. PMC 9842455. PMID 35545678. S2CID 248730439.