Candidate gene

The candidate gene approach to conducting genetic association studies focuses on associations between genetic variation within pre-specified genes of interest and phenotypes or disease states. This contrasts with genome-wide association studies (GWAS), a hypothesis-free approach that scans the entire genome for associations between common genetic variants (typically SNPs) and traits of interest. Candidate genes are most often selected for study based on a priori knowledge of the gene's biological or functional impact on the trait or disease in question. [1] [2] The rationale behind focusing on allelic variation in specific, biologically relevant regions of the genome is that certain alleles within a gene may directly affect the function of that gene and lead to variation in the phenotype or disease state being investigated. This approach often uses the case-control study design to try to answer the question, "Is one allele of a candidate gene more frequently seen in subjects with the disease than in subjects without the disease?" [1]

Candidate gene associations hypothesized for complex traits have generally not been replicated by subsequent GWASs [3] [4] [5] [6] or by highly powered replication attempts. [7] [8] The failure of candidate gene studies to shed light on the specific genes underlying such traits has been ascribed to insufficient statistical power, the low prior probability that scientists can correctly guess a specific allele within a specific gene that is related to a trait, poor methodological practices, and data dredging. [9] [6] [10]
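The case-control question quoted above ("Is one allele of a candidate gene more frequently seen in subjects with the disease?") can be illustrated with a simple allelic association test. The following is a minimal sketch, not a method taken from the cited sources: it assumes a biallelic variant, uses a Pearson chi-square test with one degree of freedom on a 2×2 table of allele counts, and all counts and the function name `allelic_association` are hypothetical.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds ratio, Pearson chi-square (1 df) and p-value for a 2x2 allele table.

    Arguments are allele counts (two alleles per diploid subject) and are
    assumed to be positive so that no expected count or denominator is zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    # Sum of (observed - expected)^2 / expected over the four cells.
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Odds of carrying the alternate allele in cases relative to controls.
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical counts: 200 cases and 200 controls, i.e. 400 alleles per group.
odds_ratio, chi2, p_value = allelic_association(120, 280, 90, 310)
print(f"OR = {odds_ratio:.3f}, chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

A nominally significant p-value from a single test of this kind is exactly the sort of result that later large-scale studies have often failed to replicate, so it should be read as an illustration of the test statistic rather than as evidence for an association.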

Selection

Suitable candidate genes are generally selected based on known biological, physiological, or functional relevance to the disease in question. This approach is limited by its reliance on existing knowledge about known or theoretical biology of disease. However, molecular tools are allowing insight into disease mechanisms and pinpointing potential regions of interest in the genome. Genome-wide association studies (GWAS) and quantitative trait locus (QTL) mapping examine common variation across the entire genome, and as such can detect a new region of interest that is in or near a potential candidate gene. Microarray data allow researchers to examine differential gene expression between cases and controls, and can help pinpoint new potential genes of interest. [11]

The great variability among organisms can make it difficult to distinguish normal variation in single-nucleotide polymorphisms (SNPs) from disease-associated variation in a candidate gene. [12] When analyzing large amounts of data, several other factors can help identify the most probable variant. These factors include the prioritization of SNPs, the relative risk of functional change in genes, and linkage disequilibrium among SNPs. [13]
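Linkage disequilibrium between two SNPs is commonly summarized by the coefficient D and its normalized forms D′ and r². The sketch below shows the standard definitions for two biallelic loci; it assumes haplotype and allele frequencies are already known (in practice they are estimated from genotype data), and the function name and example frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared, D_prime) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus  (0 < p_a < 1)
    p_b:  frequency of allele B at the second locus (0 < p_b < 1)
    """
    # D measures how far the haplotype frequency departs from independence.
    d = p_ab - p_a * p_b

    # r^2 is the squared correlation between alleles at the two loci.
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

    # D' rescales |D| by the largest value it could take for these frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, r_squared, d_prime


# Hypothetical frequencies: allele B is only ever seen on haplotypes with A.
d, r_squared, d_prime = linkage_disequilibrium(p_ab=0.2, p_a=0.3, p_b=0.2)
print(f"D = {d:.3f}, r^2 = {r_squared:.3f}, D' = {d_prime:.3f}")
```

When two SNPs are in strong linkage disequilibrium, an association signal at one of them does not by itself show which of the two (if either) is the functional variant, which is why this measure matters when prioritizing variants.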

In addition, the availability of genetic information through online databases enables researchers to mine existing data and web-based resources for new candidate gene targets. [14] Many online databases are available to research genes across species.

Prior to the candidate-gene approach

Before the candidate-gene approach was fully developed, various other methods were used to identify genes linked to disease states. These methods relied on genetic linkage analysis and positional cloning, using genetic screens, and were effective at identifying genes with high relative risk in Mendelian diseases. [13] [19] However, these methods are less effective for studying complex diseases, for several reasons: [13]

  1. Complex diseases tend to vary in both age of onset and severity. This can be due to variation in penetrance and expressivity. [20] For most human diseases, variable expressivity of the disease phenotype is the norm. This makes it more difficult to choose one specific age group or phenotypic marker for study. [13]
  2. The origins of complex disease involve many biological pathways, some of which may differ between disease phenotypes. [13]
  3. Most importantly, complex diseases often exhibit genetic heterogeneity – multiple genes can be found that interact and produce one disease state. Often, each gene is only partially responsible for the phenotype produced and for the overall risk of the disorder. [13] [21]

Criticisms

A study of candidate genes seeks to balance the use of data while attempting to minimize the chance of creating false positive or negative results. [13] Because this balance can often be difficult, there are several criticisms of the candidate gene approach that are important to understand before beginning such a study. For instance, the candidate-gene approach has been shown to produce a high rate of false positives, [22] which requires that the findings of single genetic associations be treated with great caution. [23]
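The link between false positives, statistical power, and the prior probability that a chosen candidate is truly associated can be made concrete with a short calculation. The sketch below uses the standard positive predictive value formula, PPV = (power × prior) / (power × prior + α × (1 − prior)); the function name and the input values are hypothetical and chosen only for illustration, not taken from the cited studies.

```python
def positive_predictive_value(prior, power, alpha):
    """Probability that a statistically significant association is real.

    prior: assumed probability that the tested variant is truly associated
    power: probability of detecting a true association (1 - type II error)
    alpha: significance threshold (type I error rate)
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical scenario: a 1-in-100 prior and an underpowered study.
low = positive_predictive_value(prior=0.01, power=0.2, alpha=0.05)

# Hypothetical contrast: an even prior and a well-powered study.
high = positive_predictive_value(prior=0.5, power=0.8, alpha=0.05)

print(f"PPV (low prior, low power):   {low:.3f}")
print(f"PPV (even prior, high power): {high:.3f}")
```

Under these assumed inputs, most significant findings in the first scenario would be false positives, which is consistent with the caution urged for single genetic associations.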

One critique is that findings of association within candidate-gene studies have not been easily replicated in follow-up studies. [24] For instance, a 2019 investigation of 18 well-studied candidate genes for depression (10 publications or more each) failed to identify any significant association with depression, despite using samples orders of magnitude larger than those from the original publications. [25] In addition to statistical issues (e.g. underpowered studies), population stratification has often been blamed for this inconsistency; caution must therefore also be taken with regard to the criteria that define a certain phenotype, as well as other variations in study design. [13]

Additionally, because these studies incorporate a priori knowledge, some critics argue that our knowledge is not sufficient to make valid predictions. Therefore, results gained from these 'hypothesis-driven' approaches are dependent on the ability to select plausible candidates from the genome, rather than use a hypothesis-free approach.

Use in research studies

One of the earliest successes using the candidate gene approach was finding a single-base mutation in the non-coding region of APOC3 (the apolipoprotein C3 gene) that was associated with higher risks of hypertriglyceridemia and atherosclerosis. [26]

In a study by Kim et al., genes linked to the obesity trait in both pigs and humans were discovered using comparative genomics and chromosomal heritability. [27] By using these two methods, the researchers were able to overcome the criticism that candidate gene studies are solely focused on prior knowledge. Comparative genomics was carried out by examining both human and pig quantitative trait loci through a method known as genome-wide complex trait analysis (GCTA), which allowed the researchers to map genetic variance to specific chromosomes. Heritability estimates could then indicate which chromosomal regions carried phenotypic variation, extending the search to candidate markers and genes within these regions. Other studies use computational methods to find candidate genes in a broad, complementary way, such as a study by Tiffin et al. of genes linked to type 2 diabetes. [12]

Many studies have similarly used candidate genes as part of a multi-disciplinary approach to examining a trait or phenotype. One example of manipulating candidate genes is a study by Martin E. Feder on heat-shock proteins and their function in Drosophila melanogaster. [28] Feder designed a holistic approach to study Hsp70, a candidate gene hypothesized to play a role in how an organism adapts to stress. D. melanogaster is a highly useful model organism for studying this trait because it supports a diverse range of genetic approaches to studying a candidate gene. The approaches taken in this study included both genetically modifying the candidate gene (using site-specific homologous recombination and the expression of various proteins) and examining the natural variation of Hsp70. He concluded that the results of these studies gave a multi-faceted view of Hsp70.

The manipulation of candidate genes is also seen in Caspar C. Chater's study of the origin and function of stomata in Physcomitrella patens, a moss. PpSMF1, PpSMF2 and PpSCRM1 were the three candidate genes knocked down by homologous recombination to reveal any changes in the development of stomata. From the knockdown experiment, Chater observed that PpSMF1 and PpSCRM1 were responsible for stomata development in P. patens. [29] By engineering and modifying these candidate genes, the researchers were able to confirm the ways in which these genes were linked to a change in phenotype. Examining the natural genome structure complemented this work by providing the natural and historical context in which these phenotypes operate.

References

  1. Kwon JM, Goate AM (2000). "The candidate gene approach" (PDF). Alcohol Research & Health. 24 (3): 164–168. PMC 6709736. PMID 11199286.
  2. Zhu M, Zhao S (October 2007). "Candidate gene identification approach: progress and challenges". International Journal of Biological Sciences. 3 (7): 420–427. doi:10.7150/ijbs.3.420. PMC 2043166. PMID 17998950.
  3. Johnson EC, Border R, Melroy-Greif WE, de Leeuw CA, Ehringer MA, Keller MC (November 2017). "No Evidence That Schizophrenia Candidate Genes Are More Associated With Schizophrenia Than Noncandidate Genes". Biological Psychiatry. 82 (10): 702–708. doi:10.1016/j.biopsych.2017.06.033. PMC 5643230. PMID 28823710.
  4. Chabris CF, Hebert BM, Benjamin DJ, Beauchamp J, Cesarini D, van der Loos M, et al. (September 2012). "Most reported genetic associations with general intelligence are probably false positives". Psychological Science. 23 (11): 1314–1323. doi:10.1177/0956797611435528. PMC 3498585. PMID 23012269.
  5. Bosker FJ, Hartman CA, Nolte IM, Prins BP, Terpstra P, Posthuma D, et al. (May 2011). "Poor replication of candidate genes for major depressive disorder using genome-wide association data". Molecular Psychiatry. 16 (5): 516–532. doi:10.1038/mp.2010.38. PMID 20351714.
  6. Border R, Johnson EC, Evans LM, Smolen A, Berley N, Sullivan PF, Keller MC (May 2019). "No Support for Historical Candidate Gene or Candidate Gene-by-Interaction Hypotheses for Major Depression Across Multiple Large Samples". The American Journal of Psychiatry. 176 (5): 376–387. doi:10.1176/appi.ajp.2018.18070881. PMC 6548317. PMID 30845820.
  7. Duncan LE, Keller MC (October 2011). "A critical review of the first 10 years of candidate gene-by-environment interaction research in psychiatry". The American Journal of Psychiatry. 168 (10): 1041–1049. doi:10.1176/appi.ajp.2011.11020191. PMC 3222234. PMID 21890791.
  8. Culverhouse RC, Saccone NL, Horton AC, Ma Y, Anstey KJ, Banaschewski T, et al. (January 2018). "Collaborative meta-analysis finds no evidence of a strong interaction between stress and 5-HTTLPR genotype contributing to the development of depression". Molecular Psychiatry. 23 (1): 133–142. doi:10.1038/mp.2017.44. PMC 5628077. PMID 28373689.
  9. Farrell MS, Werge T, Sklar P, Owen MJ, Ophoff RA, O'Donovan MC, et al. (May 2015). "Evaluating historical candidate genes for schizophrenia". Molecular Psychiatry. 20 (5): 555–562. doi:10.1038/mp.2015.16. PMC 4414705. PMID 25754081.
  10. Risch N, Herrell R, Lehner T, Liang KY, Eaves L, Hoh J, et al. (June 2009). "Interaction between the serotonin transporter gene (5-HTTLPR), stressful life events, and risk of depression: a meta-analysis". JAMA. 301 (23): 2462–2471. doi:10.1001/jama.2009.878. PMC 2938776. PMID 19531786.
  11. Wayne ML, McIntyre LM (November 2002). "Combining mapping and arraying: An approach to candidate gene identification". Proceedings of the National Academy of Sciences of the United States of America. 99 (23): 14903–14906. Bibcode:2002PNAS...9914903W. doi:10.1073/pnas.222549199. PMC 137517. PMID 12415114.
  12. Tiffin N, Adie E, Turner F, Brunner HG, van Driel MA, Oti M, et al. (2006). "Computational disease gene identification: a concert of methods prioritizes type 2 diabetes and obesity candidate genes". Nucleic Acids Research. 34 (10): 3067–3081. doi:10.1093/nar/gkl381. PMC 1475747. PMID 16757574.
  13. Tabor HK, Risch NJ, Myers RM (May 2002). "Candidate-gene approaches for studying complex genetic traits: practical considerations". Nature Reviews Genetics. 3 (5): 391–397. doi:10.1038/nrg796. PMID 11988764. S2CID 23314997.
  14. Zhu M, Zhao S (October 2007). "Candidate gene identification approach: progress and challenges". International Journal of Biological Sciences. 3 (7): 420–427. doi:10.7150/ijbs.3.420. PMC 2043166. PMID 17998950.
  15. Chen J, Bardes EE, Aronow BJ, Jegga AG (July 2009). "ToppGene Suite for gene list enrichment analysis and candidate gene prioritization". Nucleic Acids Research. 37 (Web Server issue): W305–W311. doi:10.1093/nar/gkp427. PMC 2703978. PMID 19465376.
  16. Sulakhe D, Balasubramanian S, Xie B, Feng B, Taylor A, Wang S, et al. (January 2014). "Lynx: a database and knowledge extraction engine for integrative medicine". Nucleic Acids Research. 42 (Database issue): D1007–D1012. doi:10.1093/nar/gkt1166. PMC 3965040. PMID 24270788.
  17. Xie B, Agam G, Balasubramanian S, Xu J, Gilliam TC, Maltsev N, Börnigen D (April 2015). "Disease gene prioritization using network and feature". Journal of Computational Biology. 22 (4): 313–323. doi:10.1089/cmb.2015.0001. PMC 4808289. PMID 25844670.
  18. Nitsch D, Gonçalves JP, Ojeda F, de Moor B, Moreau Y (September 2010). "Candidate gene prioritization by network analysis of differential expression using machine learning approaches". BMC Bioinformatics. 11 (1): 460. doi:10.1186/1471-2105-11-460. PMC 2945940. PMID 20840752.
  19. Teixeira LV, Lezirovitz K, Mandelbaum KL, Pereira LV, Perez AB (August 2011). "Candidate gene linkage analysis indicates genetic heterogeneity in Marfan syndrome". Brazilian Journal of Medical and Biological Research. 44 (8): 793–800. doi:10.1590/s0100-879x2011007500095. PMID 21789464.
  20. Lobo I (2008). "Same genetic mutation, different genetic disease phenotype". Nature Education. 1 (1): 64.
  21. Gizer IR, Ficks C, Waldman ID (July 2009). "Candidate gene studies of ADHD: a meta-analytic review". Human Genetics. 126 (1): 51–90. doi:10.1007/s00439-009-0694-x. PMID 19506906. S2CID 166017.
  22. Border R, Johnson EC, Evans LM, Smolen A, Berley N, Sullivan PF, Keller MC (May 2019). "No Support for Historical Candidate Gene or Candidate Gene-by-Interaction Hypotheses for Major Depression Across Multiple Large Samples". The American Journal of Psychiatry. 176 (5): 376–387. doi:10.1176/appi.ajp.2018.18070881. PMC 6548317. PMID 30845820.
  23. Sullivan PF (May 2007). "Spurious genetic associations". Biological Psychiatry. 61 (10): 1121–1126. doi:10.1016/j.biopsych.2006.11.010. PMID 17346679. S2CID 35033987.
  24. Hutchison KE, Stallings M, McGeary J, Bryan A (January 2004). "Population stratification in the candidate gene study: fatal threat or red herring?". Psychological Bulletin. 130 (1): 66–79. doi:10.1037/0033-2909.130.1.66. PMID 14717650.
  25. Border R, Johnson EC, Evans LM, Smolen A, Berley N, Sullivan PF, Keller MC (May 2019). "No Support for Historical Candidate Gene or Candidate Gene-by-Interaction Hypotheses for Major Depression Across Multiple Large Samples". The American Journal of Psychiatry. 176 (5): 376–387. doi:10.1176/appi.ajp.2018.18070881. PMC 6548317. PMID 30845820.
  26. Rees A, Shoulders CC, Stocks J, Galton DJ, Baralle FE (February 1983). "DNA polymorphism adjacent to human apoprotein A-1 gene: relation to hypertriglyceridaemia". Lancet. 1 (8322): 444–446. doi:10.1016/s0140-6736(83)91440-x. PMID 6131168. S2CID 29511911.
  27. Kim J, Lee T, Kim TH, Lee KT, Kim H (December 2012). "An integrated approach of comparative genomics and heritability analysis of pig and human on obesity trait: evidence for candidate genes on human chromosome 2". BMC Genomics. 13: 711. doi:10.1186/1471-2164-13-711. PMC 3562524. PMID 23253381.
  28. Feder ME (July 1999). "Engineering Candidate Genes in Studies of Adaptation: The Heat-Shock Protein Hsp70 in Drosophila melanogaster". The American Naturalist. 154 (S1): S55–S66. doi:10.1086/303283. PMID 29586709. S2CID 4394996.
  29. Chater CC, Caine RS, Tomek M, Wallace S, Kamisugi Y, Cuming AC, et al. (November 2016). "Origin and function of stomata in the moss Physcomitrella patens". Nature Plants. 2 (12): 16179. doi:10.1038/nplants.2016.179. PMC 5131878. PMID 27892923.