Linkage disequilibrium score regression

In statistical genetics, linkage disequilibrium score regression (LDSR [1] or LDSC [2]) is a technique that aims to quantify the separate contributions of polygenic effects and various confounding factors, such as population stratification, based on summary statistics from genome-wide association studies (GWASs). The approach involves using regression analysis to examine the relationship between linkage disequilibrium scores and the test statistics of the single-nucleotide polymorphisms (SNPs) from the GWAS. Here, the "linkage disequilibrium score" for a SNP "is the sum of LD r² measured with all other SNPs". [3]
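The quoted definition of an LD score can be illustrated with a minimal Python sketch. It assumes a small genotype matrix of allele counts is available and sums squared Pearson correlations across all SNPs. The function name `ld_scores` and the `include_self` flag are hypothetical, and practical implementations typically restrict the sum to a window around each SNP and adjust r² for sampling bias, neither of which is done here.

```python
import numpy as np

def ld_scores(genotypes, include_self=False):
    """Sum of squared correlations (r^2) of each SNP with the other SNPs.

    genotypes: array of shape (n_individuals, n_snps) holding allele counts.
    include_self: if True, each SNP's r^2 with itself (1.0) is also counted.
    """
    r = np.corrcoef(genotypes, rowvar=False)   # SNP-by-SNP correlation matrix
    r2 = r ** 2
    scores = r2.sum(axis=1)
    if not include_self:
        scores = scores - np.diag(r2)          # drop the self term
    return scores

# Toy data (an assumption for illustration): SNPs 0 and 1 are identical,
# so they are in perfect LD; SNP 2 is drawn independently.
rng = np.random.default_rng(0)
base = rng.integers(0, 3, size=200)
independent = rng.integers(0, 3, size=200)
g = np.column_stack([base, base, independent])

scores = ld_scores(g)
print(scores)
```

In this toy example the two SNPs in perfect LD receive a score slightly above 1, while the independent SNP receives a score close to 0.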

LDSC can be used to produce SNP-based heritability estimates, to partition this heritability into separate categories, and to calculate genetic correlations between separate phenotypes. Because the LDSC approach relies only on summary statistics from an entire GWAS, it can be used efficiently even with very large sample sizes. [4] In LDSC, genetic correlations are calculated based on the deviation between chi-square statistics and what would be expected assuming the null hypothesis. [1]
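As a rough illustration of how a heritability estimate falls out of the regression, the sketch below assumes the model given in reference [3], E[chi2_j] = N * h2 * l_j / M + N * a + 1, where l_j is the LD score of SNP j, N the sample size, M the number of SNPs, and a the contribution of confounding. The function name `ldsc_univariate` is hypothetical, and the sketch uses ordinary least squares on noise-free synthetic data; it is not the weighted regression with resampling-based standard errors used in practice.

```python
import numpy as np

def ldsc_univariate(chi2, ld, n, m):
    """Regress chi-square statistics on LD scores (unweighted sketch).

    Returns (intercept, h2), where the intercept estimates N*a + 1 and
    h2 is recovered from the slope as slope * M / N.
    """
    ld = np.asarray(ld, dtype=float)
    chi2 = np.asarray(chi2, dtype=float)
    X = np.column_stack([np.ones_like(ld), ld])      # intercept + LD score
    (intercept, slope), *_ = np.linalg.lstsq(X, chi2, rcond=None)
    return intercept, slope * m / n

# Synthetic, noise-free example (all values are assumptions).
rng = np.random.default_rng(1)
m, n = 1000, 50_000
h2_true, intercept_true = 0.4, 1.05                  # 1.05 = N*a + 1
ld = rng.uniform(1.0, 200.0, size=m)
chi2 = intercept_true + n * h2_true / m * ld         # expected chi-square

intercept_hat, h2_hat = ldsc_univariate(chi2, ld, n, m)
print(intercept_hat, h2_hat)
```

An intercept above 1 points to confounding such as population stratification, whereas the slope reflects the polygenic signal that scales with LD.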

Extensions

LDSC can also be applied across pairs of traits to estimate genetic correlations. This extension, known as cross-trait LD score regression, has the advantage of not being biased by sample overlap between the two GWASs. [5] Another extension, stratified LD score regression (abbreviated SLDSR), [6] aims to partition heritability by functional annotation while taking into account linkage disequilibrium between markers. [7] [8]
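Both extensions can be sketched with the same regression machinery. The sketch assumes the models described in references [5] and [7]: for two traits, E[z1_j * z2_j] = sqrt(N1 * N2) * rho_g * l_j / M + intercept, where the intercept absorbs sample overlap; for stratified regression, E[chi2_j] = N * sum_C tau_C * l(j, C) + N * a + 1, where l(j, C) is the LD score of SNP j restricted to category C. Function names are hypothetical, the fits are unweighted least squares, and the data are noise-free and synthetic.

```python
import numpy as np

def _ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def cross_trait_ldsc(z1, z2, ld, n1, n2, m, h2_1, h2_2):
    """Return (intercept, genetic covariance, genetic correlation)."""
    X = np.column_stack([np.ones(len(ld)), ld])
    intercept, slope = _ols(X, z1 * z2)
    rho_g = slope * m / np.sqrt(n1 * n2)
    return intercept, rho_g, rho_g / np.sqrt(h2_1 * h2_2)

def stratified_ldsc(chi2, ld_by_category, n):
    """Return (intercept, tau), with one tau coefficient per category."""
    X = np.column_stack([np.ones(len(chi2)), ld_by_category])
    coef = _ols(X, chi2)
    return coef[0], coef[1:] / n

rng = np.random.default_rng(2)
m, n1, n2 = 1000, 40_000, 60_000
ld = rng.uniform(1.0, 200.0, size=m)

# Cross-trait example: z2 is constructed so that z1 * z2 equals its
# expectation exactly (an artificial, noise-free assumption).
h2_1, h2_2, rho_g_true, overlap_intercept = 0.5, 0.3, 0.2, 0.1
expected_product = overlap_intercept + np.sqrt(n1 * n2) * rho_g_true * ld / m
z1 = np.sqrt(1.0 + n1 * h2_1 * ld / m)
z2 = expected_product / z1
intercept_ct, rho_g_hat, r_g_hat = cross_trait_ldsc(
    z1, z2, ld, n1, n2, m, h2_1, h2_2)

# Stratified example with two annotation categories.
tau_true = np.array([2e-4, 5e-5])
ld_cat = rng.uniform(1.0, 100.0, size=(m, 2))
chi2 = 1.0 + n1 * ld_cat @ tau_true
intercept_s, tau_hat = stratified_ldsc(chi2, ld_cat, n1)

print(r_g_hat, tau_hat)
```

Because sample overlap only shifts the intercept of the cross-trait regression, the slope, and therefore the genetic correlation, is unaffected by it. In the stratified case, if the categories do not overlap, the heritability assigned to a category is its tau coefficient multiplied by the number of SNPs in that category.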

Related Research Articles

Heritability Estimation of effect of genetic variation on phenotypic variation of a trait

Heritability is a statistic used in the fields of breeding and genetics that estimates the degree of variation in a phenotypic trait in a population that is due to genetic variation between individuals in that population. It measures how much of the variation of a trait can be attributed to variation of genetic factors, as opposed to variation of environmental factors. The concept of heritability can be expressed in the form of the following question: "What is the proportion of the variation in a given trait within a population that is not explained by the environment or random chance?"

A quantitative trait locus (QTL) is a locus that correlates with variation of a quantitative trait in the phenotype of a population of organisms. QTLs are mapped by identifying which molecular markers correlate with an observed trait. This is often an early step in identifying and sequencing the actual genes that cause the trait variation.

The candidate gene approach to conducting genetic association studies focuses on associations between genetic variation within pre-specified genes of interest, and phenotypes or disease states. This is in contrast to genome-wide association studies (GWAS), which scan the entire genome for common genetic variation. Candidate genes are most often selected for study based on a priori knowledge of the gene's biological functional impact on the trait or disease in question. The rationale behind focusing on allelic variation in specific, biologically relevant regions of the genome is that certain mutations will directly impact the function of the gene in question, and lead to the phenotype or disease state being investigated. This approach usually uses the case-control study design to try to answer the question, "Is one allele of a candidate gene more frequently seen in subjects with the disease than in subjects without the disease?" Candidate genes hypothesized to be associated with complex traits have generally not been replicated by subsequent GWASs. The failure of candidate gene studies to shed light on the specific genes underlying such traits has been ascribed to insufficient statistical power.

A tag SNP is a representative single nucleotide polymorphism (SNP) in a region of the genome with high linkage disequilibrium that represents a group of SNPs called a haplotype. It is possible to identify genetic variation and association to phenotypes without genotyping every SNP in a chromosomal region. This reduces the expense and time of mapping genome areas associated with disease, since it eliminates the need to study every individual SNP. Tag SNPs are useful in whole-genome SNP association studies in which hundreds of thousands of SNPs across the entire genome are genotyped.

Genome-wide association study Study to research genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait.

In genetics, a genome-wide association study, also known as whole genome association study, is an observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. GWA studies typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major human diseases, but can equally be applied to any other genetic variants and any other organisms.

In multivariate quantitative genetics, a genetic correlation is the proportion of variance that two traits share due to genetic causes, the correlation between the genetic influences on a trait and the genetic influences on a different trait estimating the degree of pleiotropy or causal overlap. A genetic correlation of 0 implies that the genetic effects on one trait are independent of the other, while a correlation of 1 implies that all of the genetic influences on the two traits are identical. The bivariate genetic correlation can be generalized to inferring genetic latent variable factors across > 2 traits using factor analysis. Genetic correlation models were introduced into behavioral genetics in the 1970s–1980s.

Behavioural genetics, also referred to as behaviour genetics, is a field of scientific research that uses genetic methods to investigate the nature and origins of individual differences in behaviour. While the name "behavioural genetics" connotes a focus on genetic influences, the field broadly investigates genetic and environmental influences, using research designs that allow removal of the confounding of genes and environment. Behavioural genetics was founded as a scientific discipline by Francis Galton in the late 19th century, only to be discredited through association with eugenics movements before and during World War II. In the latter half of the 20th century, the field saw renewed prominence with research on inheritance of behaviour and mental illness in humans, as well as research on genetically informative model organisms through selective breeding and crosses. In the late 20th and early 21st centuries, technological advances in molecular genetics made it possible to measure and modify the genome directly. This led to major advances in model organism research and in human studies, leading to new scientific discoveries.

WGAViewer is a bioinformatics software tool which is designed to visualize, annotate, and help interpret the results generated from a genome wide association study (GWAS). Alongside the P values of association, WGAViewer allows a researcher to visualize and consider other supporting evidence, such as the genomic context of the SNP, linkage disequilibrium (LD) with ungenotyped SNPs, gene expression database, and the evidence from other GWAS projects, when determining the potential importance of an individual SNP.

In genetics, association mapping, also known as "linkage disequilibrium mapping", is a method of mapping quantitative trait loci (QTLs) that takes advantage of historic linkage disequilibrium to link phenotypes to genotypes, uncovering genetic associations.

The "missing heritability" problem is the fact that single genetic variations cannot account for much of the heritability of diseases, behaviors, and other phenotypes. This is a problem that has significant implications for medicine, since a person's susceptibility to disease may depend more on 'the combined effect of all the genes in the background than on the disease genes in the foreground', or the role of genes may have been severely overestimated.

Predictive genomics is at the intersection of multiple disciplines: predictive medicine, personal genomics and translational bioinformatics. Specifically, predictive genomics deals with predicting future phenotypic outcomes in areas such as complex multifactorial diseases in humans. To date, the success of predictive genomics has been dependent on the genetic framework underlying these applications, typically explored in genome-wide association (GWA) studies. The identification of associated single-nucleotide polymorphisms underpins GWA studies in complex diseases, including type 2 diabetes (T2D), age-related macular degeneration (AMD) and Crohn's disease.

Genome-wide complex trait analysis (GCTA), also referred to as genome-based restricted maximum likelihood (GREML), is a statistical method for variance component estimation in genetics which quantifies the total narrow-sense (additive) contribution to a trait's heritability of a particular subset of genetic variants. This is done by directly quantifying the chance genetic similarity of unrelated individuals and comparing it to their measured similarity on a trait; if two unrelated individuals are relatively similar genetically and also have similar trait measurements, then the measured genetics are likely to causally influence that trait, and the correlation can to some degree tell how much. This can be illustrated by plotting the squared pairwise trait differences between individuals against their estimated degree of relatedness. The GCTA framework can be applied in a variety of settings. For example, it can be used to examine changes in heritability over aging and development. It can also be extended to analyse bivariate genetic correlations between traits. There is an ongoing debate about whether GCTA generates reliable or stable estimates of heritability when used on current SNP data. The method is based on the outdated and false dichotomy of genes versus the environment. It also suffers from serious methodological weaknesses, such as susceptibility to population stratification.

Polygenic score Numerical score aimed at predicting a trait based on variation in multiple genetic loci

In genetics, a polygenic score, also called a polygenic risk score (PRS), genetic risk score, or genome-wide score, is a number that summarises the estimated effect of many genetic variants on an individual's phenotype, typically calculated as a weighted sum of trait-associated alleles. It reflects an individual's estimated genetic predisposition for a given trait and can be used as a predictor for that trait. Polygenic scores are widely used in animal breeding and plant breeding due to their efficacy in improving livestock breeding and crops. They are also increasingly being used for risk prediction in humans for complex diseases which are typically affected by many genetic variants that each confer a small effect on overall risk.

Polygenic adaptation describes a process in which a population adapts through small changes in allele frequencies at hundreds or thousands of loci.

Complex traits

Complex traits, also known as quantitative traits, are traits that do not behave according to simple Mendelian inheritance laws. More specifically, their inheritance cannot be explained by the genetic segregation of a single gene. Such traits show a continuous range of variation and are influenced by both environmental and genetic factors. Compared to strictly Mendelian traits, complex traits are far more common, and because they can be hugely polygenic, they are studied using statistical techniques such as QTL mapping rather than classical genetics methods. Examples of complex traits include height, circadian rhythms, enzyme kinetics, and many diseases including diabetes and Parkinson's disease. One major goal of genetic research today is to better understand the molecular mechanisms through which genetic variants act to influence complex traits.

Benjamin Michael Neale is a statistical geneticist with a specialty in psychiatric genetics. He is an institute member at the Broad Institute as well as an associate professor at both Harvard Medical School and the Analytic and Translational Genetics Unit at Massachusetts General Hospital. Neale specializes in genome-wide association studies (GWAS). He was responsible for the data analysis of the first GWAS on attention-deficit/hyperactivity-disorder, and he developed new analysis software such as PLINK, which allows for whole-genome data to be analyzed for specific gene markers. Related to his work on GWAS, Neale is the lead of the ADHD psychiatric genetics and also a member of the Psychiatric GWAS Consortium analysis committee.

In population genetics, cryptic relatedness occurs when individuals in a genetic association study are more closely related to one another than assumed by the investigators. This can act as a confounding factor in both case-control and genome-wide association studies, as well as in studies of genetic diversity. Along with population stratification, it is one of the most prominent confounding factors that can lead to inflated false positive rates in gene-association studies. It is often corrected for by including a polygenic component in the statistical model being used to detect genetic associations. Other approaches that have been developed to attempt to control for cryptic relatedness are the genomic control method and the use of extended likelihood ratio tests.

Personality traits are patterns of thoughts, feelings and behaviors that reflect the tendency to respond in certain ways under certain circumstances.

In statistical genetics, Haseman–Elston (HE) regression is a form of statistical regression originally proposed for linkage analysis of quantitative traits for sibling pairs. It was first developed by Joseph K. Haseman and Robert C. Elston in 1972. A much earlier form of sib-pair linkage analysis was proposed in 1935 and 1938 by Lionel S. Penrose, the father of Nobel laureate theoretical physicist Roger Penrose. In 2000, Elston et al. proposed a "revisited", extended form of Haseman–Elston regression. Since then, further extensions to the "revisited" form of HE regression have been proposed. Although HE regression "...seems a rusty weapon in the genomics analysis armory of the GWAS era. This is because the HE regression relies on relatedness measured on IBD but not identity by state (IBS)...", HE has been adapted for association analysis in unrelated samples, whose relatedness is measured in IBS.

Hilary Kiyo Finucane is an American computational biologist who is Co-Director of the Program in Medical and Population Genetics at the Broad Institute. Her group combines genetic data with molecular data to understand the origins and mechanisms of disease.

References

  1. Levinson, Douglas F.; Noordsy, Douglas L.; Hardy, Kate V.; Ballon, Jacob S.; Shen, Hanyang; Duncan, Laramie E. (2018-10-17). "Genetic Correlation Profile of Schizophrenia Mirrors Epidemiological Results and Suggests Link Between Polygenic and Rare Variant (22q11.2) Cases of Schizophrenia". Schizophrenia Bulletin. 44 (6): 1350–1361. doi:10.1093/schbul/sbx174. ISSN 0586-7614. PMC 6192473. PMID 29294133.
  2. Ni, Guiyan; Moser, Gerhard; Wray, Naomi R.; Lee, S. Hong; Ripke, Stephan; Neale, Benjamin M.; Corvin, Aiden; Walters, James T.R.; Farh, Kai-How (June 2018). "Estimation of Genetic Correlation via Linkage Disequilibrium Score Regression and Genomic Restricted Maximum Likelihood". The American Journal of Human Genetics. 102 (6): 1185–1194. doi:10.1016/j.ajhg.2018.03.021. ISSN 0002-9297. PMC 5993419. PMID 29754766.
  3. Neale, Benjamin M.; Price, Alkes L.; Daly, Mark J.; Patterson, Nick; Consortium, Schizophrenia Working Group of the Psychiatric Genomics; Yang, Jian; Ripke, Stephan; Finucane, Hilary K.; Loh, Po-Ru (March 2015). "LD Score regression distinguishes confounding from polygenicity in genome-wide association studies". Nature Genetics. 47 (3): 291–295. doi:10.1038/ng.3211. ISSN 1546-1718. PMC 4495769. PMID 25642630.
  4. Neale, Benjamin M.; Evans, David M.; Gaunt, Tom R.; Paternoster, Lavinia; Anttila, Verneri; Bulik-Sullivan, Brendan K.; Price, Alkes L.; Finucane, Hilary K.; Warrington, Nicole M. (2017-01-15). "LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis". Bioinformatics. 33 (2): 272–279. doi:10.1093/bioinformatics/btw613. hdl:2381/38771. ISSN 1367-4803. PMC 5542030. PMID 27663502.
  5. Neale, Benjamin M.; Price, Alkes L.; Daly, Mark J.; Robinson, Elise B.; Patterson, Nick; Perry, John R. B.; Duncan, Laramie; Consortium 3, Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control; Consortium, Psychiatric Genomics (November 2015). "An atlas of genetic correlations across human diseases and traits". Nature Genetics. 47 (11): 1236–1241. doi:10.1038/ng.3406. ISSN 1546-1718. PMC 4797329. PMID 26414676.
  6. Nivard, Michel G.; Boomsma, Dorret I.; Consortium, UK Brain Expression; Bartels, Meike; Abdellaoui, Abdel; Jansen, Rick; Ip, Hill F. (2018-09-01). "Characterizing the Relation Between Expression QTLs and Complex Traits: Exploring the Role of Tissue Specificity". Behavior Genetics. 48 (5): 374–385. doi:10.1007/s10519-018-9914-2. ISSN 1573-3297. PMC 6097736. PMID 30030655.
  7. Price, Alkes L.; Neale, Benjamin M.; Patterson, Nick; Daly, Mark J.; Raychaudhuri, Soumya; Okada, Yukinori; Perry, John R. B.; Lindstrom, Sara; Stahl, Eli (November 2015). "Partitioning heritability by functional annotation using genome-wide association summary statistics". Nature Genetics. 47 (11): 1228–1235. doi:10.1038/ng.3404. ISSN 1546-1718. PMC 4626285. PMID 26414678.
  8. Smoller, Jordan W.; Sabuncu, Mert R.; Neale, Benjamin M.; Chen, Chia-Yen; Ge, Tian (2017-04-07). "Phenome-wide heritability analysis of the UK Biobank". PLOS Genetics. 13 (4): e1006711. doi:10.1371/journal.pgen.1006711. ISSN 1553-7404. PMC 5400281. PMID 28388634.