Predictive genomics

Predictive genomics lies at the intersection of several disciplines: predictive medicine, personal genomics and translational bioinformatics. Specifically, predictive genomics deals with the prediction of future phenotypic outcomes in areas such as complex multifactorial diseases in humans. [1] To date, the success of predictive genomics has depended on the genetic framework underlying these applications, typically explored in genome-wide association (GWA) studies. [2] The identification of associated single-nucleotide polymorphisms (single-base variations of a DNA sequence within a population) underpins GWA studies of complex diseases, including type 2 diabetes (T2D), age-related macular degeneration (AMD) and Crohn's disease.

Although the Human Genome Project has progressively improved the fidelity of sequence determination, the sheer complexity of the genome hinders the identification of associated, and ultimately causal, variants. [3] In particular, a large number of genetic loci are likely to be implicated, each exhibiting a small marginal effect. [4]

Objectives

Predictive genomics has a number of short- and long-term goals. The identification of associated variants underpins all downstream endeavors that aim to turn data into knowledge. In particular, outcomes that facilitate clinical improvement and individualised healthcare lead in turn to actionable measures in diagnosis, prognosis and prevention.

Objectives as a hierarchy for predictive genomics.

Identify variants associated with disease

Whilst the single-gene, single-disease hypothesis holds for Mendelian disorders such as Huntington's disease and cystic fibrosis, complex diseases and traits are affected by many gene loci and genetic variants, each conferring a different degree of risk. [1] Developing preventative, prognostic and diagnostic tools for these diseases first requires mapping the genetic loci involved in disease etiology and discovering causal mutations. [5] Creating a ‘genomic profile’ of an individual from variants measured at the genome-wide level not only facilitates the prediction of disease prior to onset, but also serves as a starting point for increasing knowledge of causal variants. [6]

The foremost difficulty in achieving this goal is understanding the function of these variants in areas of physiological and molecular importance, and how that function relates to phenotype. [4] If associated variants map to sequences of unknown function, the ability to target specific areas of interest is restricted. The success of predictive genomics therefore also depends on related efforts such as the functional annotation of the genome (ENCODE). [7]

Translation: research to clinical

The identification of causal variants, genes and pathways creates opportunities to bridge the divide between research and clinical usage. If successful, the subsequent discovery of therapeutic targets within implicated biological pathways has consequences for both treatment and prevention. [8] Furthermore, identifying disease-relevant biomarkers allows for improved monitoring of disease progression and response to treatment, and implementing these results in clinical decision support systems (CDSS) facilitates personalised medicine and outcomes. [9] Even when their effects are only marginal, repeatedly replicated associated variants can offer significant translational value. [7]

Individualising healthcare

The significance of translation from research to clinical usage lies in using the complete knowledge of an individual to develop personalised approaches to disease management. The caveat is that both prediction and inference have proved difficult for complex diseases. [10] Therefore, unless individuals carry an overwhelmingly high or low number of risk alleles, there is a limit to the predictive accuracy of their ‘genomic profiles’. However, preliminary examples of predictive genomics for personalising healthcare include using an individual's gene expression data to monitor response to treatment, [11] and using the genomic profile of an individual's P450 drug-metabolising system to assist drug dosage and selection. [12]
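
As a rough illustration of how a ‘genomic profile’ might be summarised, the sketch below counts risk alleles and computes a weighted score as the sum of risk-allele counts multiplied by log odds ratios. The SNP names, odds ratios and genotypes are hypothetical placeholders rather than values from the cited studies, and the weighted log-odds sum is one common choice, not necessarily the method used in any particular reference.

```python
import math

# Hypothetical per-allele odds ratios for three illustrative SNPs
# (placeholder values, not published estimates).
ODDS_RATIOS = {"snp_a": 1.37, "snp_b": 1.15, "snp_c": 1.08}


def risk_allele_count(genotype):
    """Total number of risk alleles carried (0, 1 or 2 per SNP)."""
    return sum(genotype.values())


def weighted_risk_score(genotype, odds_ratios):
    """Sum of risk-allele counts, each weighted by the log odds ratio of its SNP."""
    return sum(count * math.log(odds_ratios[snp]) for snp, count in genotype.items())


# Hypothetical individual: number of risk alleles at each SNP.
profile = {"snp_a": 2, "snp_b": 0, "snp_c": 1}
count = risk_allele_count(profile)
score = weighted_risk_score(profile, ODDS_RATIOS)
print(count, round(score, 3))
```

With small odds ratios, most individuals end up with similar scores, which is one way to see why only those with very many or very few risk alleles stand out.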

Applications in complex human diseases

The table below compares predictive performance for diseases selected on disease frequency and known heritability estimates, using single-nucleotide polymorphism (SNP) based models that reflect known genetic factors for a European population (subject to change as more associations are discovered). [13] For each disease, the table gives, in order, the lifetime morbid risk, the heritability of liability and the area under the ROC curve (AUC).

Disease | Lifetime morbid risk | Heritability of liability | AUC
Coronary artery disease | 0.402 | 0.49 | 0.584
Type 2 diabetes | 0.339 | 0.30 | 0.592
Atrial fibrillation | 0.245 | 0.62 | 0.593
Stroke | 0.190 | 0.17 | 0.528
Prostate cancer | 0.165 | 0.42 | 0.614
Alzheimer disease | 0.132 | 0.79 | 0.648
Breast cancer | 0.123 | 0.25 | 0.586
Lung cancer | 0.069 | 0.08 | 0.525
Bipolar disorder | 0.051 | 0.60 | 0.550
Colorectal cancer | 0.051 | 0.13 | 0.564
Age-related macular degeneration | 0.047 | 0.71 | 0.758
Bladder cancer | 0.024 | 0.08 | 0.577
Multiple sclerosis | 0.020 | 0.51 | 0.622
Melanoma | 0.020 | 0.21 | 0.640
Type 1 diabetes | 0.018 | 0.87 | 0.638
Parkinson disease | 0.016 | 0.27 | 0.592
Pancreatic cancer | 0.015 | 0.36 | 0.557
Ovarian cancer | 0.014 | 0.22 | 0.548
Thyroid cancer | 0.010 | 0.53 | 0.614
Ulcerative colitis | 0.009 | 0.53 | 0.666
Schizophrenia | 0.007 | 0.66 | 0.540
Celiac disease | 0.007 | 0.75 | 0.733
Crohn's disease | 0.005 | 0.56 | 0.717
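
To make the AUC values concrete, the sketch below computes AUC directly from its rank interpretation: the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The scores are made-up illustrative numbers, not data from the cited study.

```python
def auc(case_scores, control_scores):
    """Fraction of case/control pairs in which the case scores higher (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores for four cases and four controls.
cases = [0.9, 0.7, 0.6, 0.4]
controls = [0.8, 0.5, 0.3, 0.2]
example_auc = auc(cases, controls)
print(example_auc)  # 0.75: the case outranks the control in 12 of 16 pairs
```

An AUC of 0.5 corresponds to a score that ranks cases and controls no better than chance, and 1.0 to perfect separation.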

The complex diseases in the applications below lack reliable diagnostics. Given their medical consequences, their economic impact is also significant. However, none of the use cases below has been translated into the clinic.

Age-related macular degeneration

Age-related macular degeneration (AMD) is one of the flagship complex diseases of the genomic revolution, with over 19 associated genetic loci replicated in GWA studies. [14] In particular, the first significant genetic risk variant was identified in the complement factor H (CFH) gene in 2005, motivating the search for more genetic variants in the disease. Over the past decade, a number of models have been proposed to assess individual risk of AMD. [15] Estimates of the genetic contribution to AMD risk range from 45% to 71%, and large odds ratios (OR) have been reported (greater than 2.0 per allele in some cases). [14] In 2013, a comprehensive case-control GWA study with approximately 77,000 observations, involving 18 international research groups from the International AMD Genetics Consortium, implicated 19 gene loci and 9 biological pathways, including the regulation of complement, lipid metabolism and angiogenic activity. The full model including all 19 loci achieved an AUC of 0.74; for comparison, Jakobsdottir et al. suggest that an AUC of 0.75 is sufficient to distinguish between extreme cases and controls. [16] Of the 19 associated gene loci, 7 were newly discovered, and the authors point to these as additional entry points into AMD etiology and as potential drug targets. [14]

Type 2 diabetes

Type 2 diabetes (T2D), an extremely common metabolic disorder, arises from the interplay of many environmental and genetic risk factors. [17] A number of risk assessment models incorporating demographic, environmental and clinical risk factors have already been shown to give reasonable discrimination in case-control studies. It has been proposed that genetic variants contributing to T2D, used either for standalone prediction or in conjunction with current risk models, can improve prediction of T2D risk if current models do not capture the full effect of an individual's genotype. [18] Approximately 20 associated SNPs have been replicated in T2D; however, their effect sizes do not seem to be substantial: SNPs in the TCF7L2 gene, purported to confer the highest genetic risk, have an OR of 1.37. [19]

In 2009, a study was conducted on data from the Wellcome Trust Case Control Consortium (WTCCC), a GWA study involving 7 cohorts with 7 diseases, including bipolar disorder, Crohn's disease, hypertension, rheumatoid arthritis, type 1 diabetes (T1D) and type 2 diabetes (T2D). [20] With particular attention to T2D, Evans et al. observed a marginal increase in AUC (+0.04) when using genome-wide information rather than known susceptibility variants alone. [20] However, non-genetic tests such as the Cambridge and Framingham offspring risk scores have been reported to perform better than genetic risk models with 20 loci. [18] Moreover, adding genetic risk to these phenotypic models did not produce a statistically significant improvement in AUC.

Celiac disease

Celiac disease (CD) is a complex immune disorder with a strong genetic component. In particular, human leukocyte antigen (HLA) genes are strongly implicated in CD development, and HLA testing is undertaken in clinical practice. However, although serological and histological tests are available for CD, these clinical screenings have been found to generate false positives. [21] In 2014, Abraham et al. developed a genomic risk score (GRS) that achieved an AUC of 0.86 to 0.90 across 6 cohorts. [21]

Limitations

For predictive genomics to meet its objectives, the accuracy of prediction must improve, through new methods or refinements of current techniques, and there must be a demonstrated bona fide improvement in patient outcomes. Although the area under the ROC curve (AUC) is currently the de facto metric for comparing and evaluating predictive models, there is no consensus on what score is sufficient for clinical use. Jakobsdottir et al. state that an AUC of 0.75 is sufficient for discriminating between clear cases and controls; however, this threshold is still arbitrary. [16] The positive predictive value (PPV) must also be high enough that positive results are not dominated by false positives.
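
The dependence of PPV on disease frequency follows from Bayes' theorem. The sketch below computes PPV from sensitivity, specificity and prevalence; the 90% sensitivity and specificity and the two prevalence values are hypothetical, chosen only to show how the same test behaves for a rare and a common disease.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test), by Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Same hypothetical test applied to a rare (1%) and a common (30%) disease.
ppv_rare = positive_predictive_value(0.90, 0.90, 0.01)
ppv_common = positive_predictive_value(0.90, 0.90, 0.30)
print(round(ppv_rare, 3), round(ppv_common, 3))
```

Under these assumed numbers, roughly 1 in 12 positive results is a true positive for the rare disease, against about 4 in 5 for the common one, which is why a good AUC alone does not guarantee clinical usefulness.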

Variants in prediction: SNPs and alternatives

SNPs identified in GWA studies are often assumed to give better predictive performance if they have large effect sizes, measured as odds ratios (OR). A case study by Jakobsdottir et al. involving 5 use cases of genomic prediction demonstrates that SNPs with extremely small p-values, and by implication extreme ORs, do not give extreme differences in discrimination. [16] The authors point out that using significantly associated genetic variants does not necessarily lead to better classification. Alternatively, copy number variants (CNVs) have been proposed as better candidates for prediction than SNPs; a study of BMI stratified across different ethnicities showed a marginal improvement of CNVs over SNPs. [22] Furthermore, a comparison of family history and SNPs for prediction across more than 10 complex disorders did not suggest better discrimination with SNPs. [13]
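
The gap between association and classification can be shown with a single binary marker. The sketch below is a simplification that treats the marker as carrier versus non-carrier: it derives the carrier frequency in cases from the odds ratio and the carrier frequency in controls, then computes the AUC of using carrier status alone as the classifier. The OR of 1.37 is the TCF7L2 figure quoted in this article; the OR of 3.0 and the 30% control carrier frequency are hypothetical.

```python
def case_frequency(odds_ratio, control_frequency):
    """Carrier frequency in cases implied by the OR and the control carrier frequency."""
    case_odds = odds_ratio * control_frequency / (1.0 - control_frequency)
    return case_odds / (1.0 + case_odds)


def single_marker_auc(odds_ratio, control_frequency):
    """AUC of a one-marker classifier: (1 + sensitivity - false-positive rate) / 2."""
    sensitivity = case_frequency(odds_ratio, control_frequency)
    false_positive_rate = control_frequency
    return 0.5 + (sensitivity - false_positive_rate) / 2.0


# Assumed control carrier frequency of 30% (hypothetical).
for odds_ratio in (1.37, 3.0):
    print(odds_ratio, round(single_marker_auc(odds_ratio, 0.30), 3))
```

Under these assumptions, even an odds ratio of 3.0 gives an AUC of only about 0.63, consistent with the observation that strongly associated markers can still be poor classifiers.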

Interacting variants: higher order analysis

Currently, the prevailing risk models focus on univariate analysis rather than on higher-order interactions. Therefore, although typical GWA studies are able to detect a number of statistically significant loci, these loci have not been sufficient to fully explain theoretical estimates of genetic heritability. [10] Goudey et al. demonstrated that both 2-way and 3-way interactions between SNPs can explain more trait variance than single SNPs. [23] Goudey et al. also state that the study of higher-order interactions has been limited by the intractability of exhaustive search techniques (see NP-complete).
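
The scale of an exhaustive search can be illustrated by counting the candidate SNP combinations at each interaction order. The figure of 500,000 SNPs below is an assumed array size used for illustration, not a number taken from the cited study.

```python
import math


def interaction_tests(num_snps, order):
    """Number of distinct SNP combinations of the given order (n choose k)."""
    return math.comb(num_snps, order)


# Assumed genotyping array of 500,000 SNPs (hypothetical size).
NUM_SNPS = 500_000
for order in (1, 2, 3):
    print(order, interaction_tests(NUM_SNPS, order))
```

Moving from single SNPs to pairs multiplies the number of tests by roughly a quarter of a million, and moving to triples multiplies it again by a similar factor.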

Population: size and scope

The issues surrounding sample size and number of variants are exacerbated when GWA studies consider variants numbering in the millions. Because of the curse of dimensionality, prior screening methods that reduce the number of loci to below the number of observations may be used before modelling disease risk. [24] Hayes et al. state that the population size must be greater than 100,000 to achieve high accuracy under their model assumptions; the exception is when the effective population size is small. [25] Furthermore, ethnicity-specific GWA studies show that the detectability of variants differs between groups in terms of allele frequency, linkage disequilibrium (the co-inheritance of SNPs through generations) and the actual loci themselves. [26]
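
A minimal sketch of one possible prior screening step is given below: SNPs are ranked by the absolute difference in risk-allele frequency between cases and controls, and only the top few are kept. This simple filter is an assumption made for illustration and is not the method of the cited reference; the SNP names and genotypes are made up.

```python
def allele_frequency(genotypes):
    """Risk-allele frequency from per-person allele counts (0, 1 or 2)."""
    return sum(genotypes) / (2.0 * len(genotypes))


def screen_snps(case_genotypes, control_genotypes, keep):
    """Keep the `keep` SNPs with the largest case/control allele-frequency difference."""
    differences = {
        snp: abs(allele_frequency(case_genotypes[snp])
                 - allele_frequency(control_genotypes[snp]))
        for snp in case_genotypes
    }
    ranked = sorted(differences, key=differences.get, reverse=True)
    return ranked[:keep]


# Hypothetical genotypes: risk-allele counts for four cases and four controls.
cases = {"snp_a": [2, 1, 2, 1], "snp_b": [0, 1, 0, 1], "snp_c": [1, 1, 2, 0]}
controls = {"snp_a": [0, 1, 0, 1], "snp_b": [0, 1, 1, 0], "snp_c": [1, 0, 1, 1]}
selected = screen_snps(cases, controls, keep=2)
print(selected)  # ['snp_a', 'snp_c']
```

In practice such a filter would normally be fitted on training data only, since screening on the full data set before evaluation tends to give optimistic estimates of predictive accuracy.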

Other applications

Predictive genomics is not confined to the prediction of complex diseases. For instance, Hayes et al. use genomic prediction for the selection of livestock, crop and forage species, where predicted results are currently in use. [25] Furthermore, Kambouris et al. discuss the use of ‘genomic profiles’ for the performance of elite athletes, noting individualised training regimens covering both dietary and physical aspects. [27] Additionally, Kayser et al. point to DNA profiling in forensics as a beneficiary of the genomic revolution. [28] Functionally validating novel genetic findings is crucial in rare disease; however, the analysis of individual genetic variants often requires several years of work. The variants that are most likely to occur and to present as disease-causing can be predicted, an approach that is distinct from, and supplementary to, pathogenicity prediction. This application guides researchers to test the effects of top candidate variants in advance of novel disease cases. [29]

References

  1. Wray, N. R.; Goddard, M. E.; Visscher, P. M. (2007). "Prediction of individual genetic risk to disease from genome-wide association studies". Genome Research. 17 (10): 1520–8. doi:10.1101/gr.6665407. PMC 1987352. PMID 17785532.
  2. Hirschhorn, J. N.; Daly, M. J. (2005). "Genome-wide association studies for common diseases and complex traits". Nature Reviews Genetics. 6 (2): 95–108. doi:10.1038/nrg1521. PMID 15716906. S2CID 2813666.
  3. Janssens, A. C. J. W.; Van Duijn, C. M. (2008). "Genome-based prediction of common diseases: Advances and prospects". Human Molecular Genetics. 17 (R2): R166–73. doi:10.1093/hmg/ddn250. PMID 18852206.
  4. Hindorff, L. A.; Sethupathy, P.; Junkins, H. A.; Ramos, E. M.; Mehta, J. P.; Collins, F. S.; Manolio, T. A. (2009). "Potential etiologic and functional implications of genome-wide association loci for human diseases and traits". Proceedings of the National Academy of Sciences. 106 (23): 9362–7. Bibcode:2009PNAS..106.9362H. doi:10.1073/pnas.0903103106. PMC 2687147. PMID 19474294.
  5. Wray, N. R.; Goddard, M. E.; Visscher, P. M. (2008). "Prediction of individual genetic risk of complex disease". Current Opinion in Genetics & Development. 18 (3): 257–263. doi:10.1016/j.gde.2008.07.006. PMID 18682292.
  6. Ginsburg, G. S.; Willard, H. F. (2009). "Genomic and personalized medicine: Foundations and applications". Translational Research. 154 (6): 277–87. doi:10.1016/j.trsl.2009.09.005. PMID 19931193.
  7. McCarthy, M. I.; Abecasis, G. A. R.; Cardon, L. R.; Goldstein, D. B.; Little, J.; Ioannidis, J. P. A.; Hirschhorn, J. N. (2008). "Genome-wide association studies for complex traits: Consensus, uncertainty and challenges". Nature Reviews Genetics. 9 (5): 356–69. doi:10.1038/nrg2344. PMID 18398418. S2CID 15032294.
  8. Swanton, C.; Larkin, J. M.; Gerlinger, M.; Eklund, A. C.; Howell, M.; Stamp, G.; Downward, J.; Gore, M.; Futreal, P. A.; Escudier, B.; Andre, F.; Albiges, L.; Beuselinck, B.; Oudard, S.; Hoffmann, J.; Gyorffy, B. Z.; Torrance, C. J.; Boehme, K. A.; Volkmer, H.; Toschi, L.; Nicke, B.; Beck, M.; Szallasi, Z. (2010). "Predictive biomarker discovery through the parallel integration of clinical trial and functional genomics datasets". Genome Medicine. 2 (8): 53. doi:10.1186/gm174. PMC 2945010. PMID 20701793.
  9. Kawamoto, K.; Lobach, D. F.; Willard, H. F.; Ginsburg, G. S. (2009). "A national clinical decision support infrastructure to enable the widespread and consistent practice of genomic and personalized medicine". BMC Medical Informatics and Decision Making. 9: 17. doi:10.1186/1472-6947-9-17. PMC 2666673. PMID 19309514.
  10. Manolio, T. A.; Collins, F. S.; Cox, N. J.; Goldstein, D. B.; Hindorff, L. A.; Hunter, D. J.; McCarthy, M. I.; Ramos, E. M.; Cardon, L. R.; Chakravarti, A.; Cho, J. H.; Guttmacher, A. E.; Kong, A.; Kruglyak, L.; Mardis, E.; Rotimi, C. N.; Slatkin, M.; Valle, D.; Whittemore, A. S.; Boehnke, M.; Clark, A. G.; Eichler, E. E.; Gibson, G.; Haines, J. L.; MacKay, T. F. C.; McCarroll, S. A.; Visscher, P. M. (2009). "Finding the missing heritability of complex diseases". Nature. 461 (7265): 747–753. Bibcode:2009Natur.461..747M. doi:10.1038/nature08494. PMC 2831613. PMID 19812666.
  11. Gunther, E. C.; Stone, D. J.; Gerwien, R. W.; Bento, P.; Heyes, M. P. (2003). "Prediction of clinical drug efficacy by classification of drug-induced genomic expression profiles in vitro". Proceedings of the National Academy of Sciences. 100 (16): 9608–13. Bibcode:2003PNAS..100.9608G. doi:10.1073/pnas.1632587100. PMC 170965. PMID 12869696.
  12. De Leon, J.; Susce, M. T.; Murray-Carmichael, E. (2006). "The AmpliChip CYP450 genotyping test: Integrating a new clinical tool". Molecular Diagnosis & Therapy. 10 (3): 135–51. doi:10.1007/bf03256453. PMID 16771600. S2CID 27626247.
  13. Do, C. B.; Hinds, D. A.; Francke, U.; Eriksson, N. (2012). "Comparison of Family History and SNPs for Predicting Risk of Complex Disease". PLOS Genetics. 8 (10): e1002973. doi:10.1371/journal.pgen.1002973. PMC 3469463. PMID 23071447.
  14. Fritsche, L. G.; Chen, W.; Schu, M.; Yaspan, B. L.; Yu, Y.; Thorleifsson, G.; Zack, D. J.; Arakawa, S.; Cipriani, V.; Ripke, S.; Igo, R. P.; Buitendijk, G. L. H. S.; Sim, X.; Weeks, D. E.; Guymer, R. H.; Merriam, J. E.; Francis, P. J.; Hannum, G.; Agarwal, A.; Armbrecht, A. M.; Audo, I.; Aung, T.; Barile, G. R.; Benchaboune, M.; Bird, A. C.; Bishop, P. N.; Branham, K. E.; Brooks, M.; Brucker, A. J.; et al. (2013). "Seven new loci associated with age-related macular degeneration". Nature Genetics. 45 (4): 433–9, 439e1–2. doi:10.1038/ng.2578. PMC 3739472. PMID 23455636.
  15. Grassmann, F.; Heid, I. M.; Weber, B. H. F. (2014). "Genetic Risk Models in Age-Related Macular Degeneration". Retinal Degenerative Diseases. Advances in Experimental Medicine and Biology. Vol. 801. pp. 291–300. doi:10.1007/978-1-4614-3209-8_37. ISBN 978-1-4614-3208-1. PMID 24664710.
  16. Jakobsdottir, J.; Gorin, M. B.; Conley, Y. P.; Ferrell, R. E.; Weeks, D. E. (2009). "Interpretation of Genetic Association Studies: Markers with Replicated Highly Significant Odds Ratios May Be Poor Classifiers". PLOS Genetics. 5 (2): e1000337. doi:10.1371/journal.pgen.1000337. PMC 2629574. PMID 19197355.
  17. Lyssenko, V.; Almgren, P.; Anevski, D.; Orho-Melander, M.; Sjögren, M.; Saloranta, C.; Tuomi, T.; Groop, L. (2005). "Genetic Prediction of Future Type 2 Diabetes". PLOS Medicine. 2 (12): e345. doi:10.1371/journal.pmed.0020345. PMC 1274281. PMID 17570749.
  18. Talmud, P. J.; Hingorani, A. D.; Cooper, J. A.; Marmot, M. G.; Brunner, E. J.; Kumari, M.; Kivimaki, M.; Humphries, S. E. (2010). "Utility of genetic and non-genetic risk factors in prediction of type 2 diabetes: Whitehall II prospective cohort study". BMJ. 340: b4838. doi:10.1136/bmj.b4838. PMC 2806945. PMID 20075150.
  19. Meigs, J. B.; Shrader, P.; Sullivan, L. M.; McAteer, J. B.; Fox, C. S.; Dupuis, J. E.; Manning, A. K.; Florez, J. C.; Wilson, P. W. F.; d'Agostino, R. B.; Cupples, L. A. (2008). "Genotype Score in Addition to Common Risk Factors for Prediction of Type 2 Diabetes". New England Journal of Medicine. 359 (21): 2208–19. doi:10.1056/NEJMoa0804742. PMC 2746946. PMID 19020323.
  20. Evans, D. M.; Visscher, P. M.; Wray, N. R. (2009). "Harnessing the information contained within genome-wide association studies to improve individual prediction of complex disease risk". Human Molecular Genetics. 18 (18): 3525–31. doi:10.1093/hmg/ddp295. PMID 19553258.
  21. Abraham, G.; Tye-Din, J. A.; Bhalala, O. G.; Kowalczyk, A.; Zobel, J.; Inouye, M. (2014). "Accurate and Robust Genomic Prediction of Celiac Disease Using Statistical Learning". PLOS Genetics. 10 (2): e1004137. doi:10.1371/journal.pgen.1004137. PMC 3923679. PMID 24550740.
  22. Peterson, R. E.; Maes, H. H.; Lin, P.; Kramer, J. R.; Hesselbrock, V. M.; Bauer, L. O.; Nurnberger, J. I.; Edenberg, H. J.; Dick, D. M.; Webb, B. T. (2014). "On the association of common and rare genetic variation influencing body mass index: A combined SNP and CNV analysis". BMC Genomics. 15 (1): 368. doi:10.1186/1471-2164-15-368. PMC 4035084. PMID 24884913.
  23. Goudey, B.; Rawlinson, D.; Wang, Q.; Shi, F.; Ferra, H.; Campbell, R. M.; Stern, L.; Inouye, M. T.; Ong, C. S.; Kowalczyk, A. (2013). "GWIS - model-free, fast and exhaustive search for epistatic interactions in case-control GWAS". BMC Genomics. 14 (Suppl 3): S10. doi:10.1186/1471-2164-14-S3-S10. PMC 3665501. PMID 23819779.
  24. Thornton-Wells, T. A.; Moore, J. H.; Haines, J. L. (2004). "Genetics, statistics and human disease: Analytical retooling for complexity". Trends in Genetics. 20 (12): 640–7. CiteSeerX 10.1.1.325.3919. doi:10.1016/j.tig.2004.09.007. PMID 15522460.
  25. Hayes, B. J.; Pryce, J.; Chamberlain, A. J.; Bowman, P. J.; Goddard, M. E. (2010). "Genetic Architecture of Complex Traits and Accuracy of Genomic Prediction: Coat Colour, Milk-Fat Percentage, and Type in Holstein Cattle as Contrasting Model Traits". PLOS Genetics. 6 (9): e1001139. doi:10.1371/journal.pgen.1001139. PMC 2944788. PMID 20927186.
  26. Goddard, M. (2008). "Genomic selection: Prediction of accuracy and maximisation of long term response". Genetica. 136 (2): 245–57. doi:10.1007/s10709-008-9308-0. PMID 18704696. S2CID 1780250.
  27. Kambouris, M.; Ntalouka, F.; Ziogas, G.; Maffulli, N. (2012). "Predictive genomics DNA profiling for athletic performance". Recent Patents on DNA & Gene Sequences. 6 (3): 229–39. doi:10.2174/187221512802717321. PMID 22827597.
  28. Kayser, M.; De Knijff, P. (2011). "Improving human forensics through advances in genetics, genomics and molecular biology". Nature Reviews Genetics. 12 (3): 179–92. doi:10.1038/nrg2952. PMID 21331090. S2CID 6448781.
  29. Lawless, D.; Lango Allen, H.; Thaventhiran, J.; NIHR BioResource–Rare Diseases Consortium; Hodel, F.; Anwar, R.; Fellay, J.; Walter, J.; Savic, S. (2019). "Predicting the Occurrence of Variants in RAG1 and RAG2". Journal of Clinical Immunology. 39 (7): 688–701. doi:10.1007/s10875-019-00670-z. PMC 6754361. PMID 31388879.