Genome-wide complex trait analysis

Genome-wide complex trait analysis (GCTA) genome-based restricted maximum likelihood (GREML) is a statistical method for heritability estimation in genetics that quantifies the total additive contribution of a set of genetic variants to a trait. GCTA is typically applied to common single nucleotide polymorphisms (SNPs) on a genotyping array (or "chip"), so the resulting estimate is termed "chip" or "SNP" heritability.
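As usually written in the GCTA literature, the model behind GREML can be summarized as below. This is a sketch of the standard formulation, not a full specification: the symbols are notation introduced here (y is the phenotype vector, X and β are fixed-effect covariates such as principal components, g is each individual's total additive SNP effect, A is the genetic relationship matrix built from M SNPs, x_ij ∈ {0, 1, 2} counts the reference alleles of SNP i carried by individual j, and p_i is that allele's frequency). The relatedness formula shown is the usual off-diagonal definition for standardized genotypes; implementations differ in details such as how the diagonal is computed. The variance components σ²_g and σ²_ε are then estimated by restricted maximum likelihood.

```latex
\begin{aligned}
\mathbf{y} &= X\boldsymbol{\beta} + \mathbf{g} + \boldsymbol{\varepsilon},
\qquad
\mathbf{g} \sim \mathcal{N}\!\left(\mathbf{0},\, A\,\sigma^2_g\right),
\qquad
\boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\, I\,\sigma^2_\varepsilon\right), \\[4pt]
A_{jk} &= \frac{1}{M}\sum_{i=1}^{M}
  \frac{(x_{ij} - 2p_i)(x_{ik} - 2p_i)}{2p_i(1 - p_i)},
\qquad
h^2_{\mathrm{SNP}} = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_\varepsilon}.
\end{aligned}
```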

GCTA operates by directly quantifying the chance genetic similarity of unrelated individuals and comparing it to their measured similarity on a trait: if two unrelated individuals are relatively similar genetically and also have similar trait measurements, then the measured genetic variants are likely to causally influence that trait, and the strength of the association indicates, to some degree, how much. This can be illustrated by plotting the squared pairwise trait differences between individuals against their estimated degree of relatedness. [1] GCTA makes a number of modeling assumptions, and whether and when these assumptions are satisfied continues to be debated.
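The pairwise comparison described above can be sketched in Python as a Haseman-Elston-style regression: simulate unrelated individuals, build a genetic relationship matrix from standardized genotypes, and regress squared pairwise trait differences on relatedness. This is a method-of-moments illustration of the idea, not the REML fit that GCTA itself performs; the simulation settings (2,000 people, 1,000 SNPs, a true SNP heritability of 0.5) and the function names are hypothetical. For a standardized trait, E[(y_j - y_k)^2] = 2(1 - h² A_jk), so the fitted slope is about -2h².

```python
import numpy as np


def simulate(n=2000, m=1000, h2=0.5, seed=1):
    """Hypothetical data: n unrelated people, trait driven additively by all m SNPs."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.5, size=m)                      # allele frequencies
    geno = rng.binomial(2, freqs, size=(n, m)).astype(float)    # 0/1/2 allele counts
    p = geno.mean(axis=0) / 2                                   # sample allele frequencies
    z = (geno - 2 * p) / np.sqrt(2 * p * (1 - p))               # standardized genotypes
    beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)             # many small additive effects
    y = z @ beta + rng.normal(0.0, np.sqrt(1 - h2), size=n)     # genetic + residual parts
    y = (y - y.mean()) / y.std()                                # standardize the trait
    grm = z @ z.T / m                                           # genetic relationship matrix
    return grm, y


def he_squared_differences(grm, y):
    """Regress squared pairwise trait differences on pairwise relatedness."""
    i, j = np.triu_indices(len(y), k=1)       # every unordered pair once
    rel = grm[i, j]                           # off-diagonal relatedness values
    sqdiff = (y[i] - y[j]) ** 2               # squared trait differences
    slope, _intercept = np.polyfit(rel, sqdiff, 1)
    # For a standardized trait E[(y_i - y_j)^2] = 2 * (1 - h2 * A_ij), so slope = -2 * h2.
    return -slope / 2
```

Running `grm, y = simulate()` and then `he_squared_differences(grm, y)` should give a value in the vicinity of the simulated 0.5; plotting `sqdiff` against `rel` would show the weak downward-sloping cloud that the plot described above refers to.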

The GCTA framework has also been extended in a number of ways: quantifying the contribution of multiple SNP categories (e.g. functional partitioning); quantifying the contribution of gene-environment interactions; quantifying the contribution of non-additive or non-linear effects of SNPs; and bivariate analyses of multiple phenotypes to quantify their genetic covariance (co-heritability or genetic correlation).

GCTA estimates have implications for the potential for discovery from genome-wide association studies (GWAS), as well as for the design and accuracy of polygenic scores. GCTA estimates from common variants are typically substantially lower than other estimates of total or narrow-sense heritability (such as those from twin or kinship studies), which has contributed to the debate over the missing heritability problem.

History

In biology and animal breeding, estimating variance components such as heritability, shared environment, and maternal effects with standard ANOVA/REML methods typically requires individuals of known relatedness, such as parents and children. Such pedigree data are often unavailable or unreliable, which either prevents the methods from being applied or requires strict laboratory control of all breeding (threatening the external validity of the estimates). Several authors noted that relatedness could instead be measured directly from genetic markers (and that, if individuals were reasonably closely related, relatively few markers would be needed for adequate statistical power, keeping costs down). This led Kermit Ritland to propose in 1996 that directly measured pairwise relatedness could be compared to pairwise phenotype measurements (Ritland 1996, "A Marker-based Method for Inferences About Quantitative Inheritance in Natural Populations" [2] ).

As genome sequencing costs dropped steeply over the 2000s, it became possible to acquire enough markers on enough subjects to obtain reliable estimates from very distantly related individuals. An early application of the method to humans came with Visscher et al. 2006 [3] /2007, [4] which used SNP markers to estimate the actual relatedness of siblings and estimated heritability directly from their genetics. In humans, unlike in the original animal and plant applications, relatedness is usually known with high confidence in the 'wild population', so the benefit of GCTA lies more in avoiding the assumptions of classic behavioral genetics designs, verifying their results, and partitioning heritability by SNP class and chromosome. The first use of GCTA proper in humans was published in 2010, finding that the included SNPs explain 45% of the variance in human height. [5] [6] (Large GWASes of height have since confirmed the estimate. [7] ) The GCTA algorithm and a software implementation were then published in 2011. [8] GCTA has since been used to study a wide variety of biological, medical, psychiatric, and psychological traits in humans, and has inspired many variant approaches.

Benefits

Robust heritability

Twin and family studies have long been used to estimate the variance explained by particular categories of genetic and environmental causes. Across a wide variety of human traits, they typically find minimal shared-environment influence, considerable non-shared-environment influence, and a large genetic component (mostly additive), which averages ~50% and is much higher for some traits, such as height or intelligence. [9] However, twin and family studies have been criticized for relying on a number of assumptions that are difficult or impossible to verify, such as the equal environments assumption (that the environments of monozygotic and dizygotic twins are equally similar), that there is no misclassification of zygosity (mistaking identical twins for fraternal twins and vice versa), that twins are representative of the general population, and that there is no assortative mating. Violations of these assumptions can bias the parameter estimates both upwards and downwards. [10] (This debate and criticism have focused particularly on the heritability of IQ.)

Using SNP or whole-genome data from unrelated participants bypasses many criticisms of heritability estimates. Participants who are too closely related (typically those with a genetic relatedness above 0.025, roughly the level of fourth cousins) are removed, and several principal components are included in the regression to control for population stratification. Twins are often not involved at all, questions of equal treatment do not arise, relatedness is estimated precisely, and the samples are drawn from a broad variety of subjects.
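The two preprocessing steps mentioned above, dropping close relatives and computing principal components to use as covariates, can be sketched with NumPy as follows. The greedy rule (keep the first individual of each over-threshold pair, drop the second) and the function names are assumptions for illustration; dedicated tools may choose which individual to remove differently so as to retain more of the sample, and principal components are often computed from an LD-pruned SNP set rather than taken directly from the relationship matrix.

```python
import numpy as np


def filter_related(grm, cutoff=0.025):
    """Greedily drop one member of every pair whose relatedness exceeds `cutoff`.

    Returns the sorted indices of the individuals that are kept.
    """
    n = grm.shape[0]
    keep = np.ones(n, dtype=bool)
    rows, cols = np.nonzero(np.triu(grm, k=1) > cutoff)  # each close pair listed once
    for i, j in zip(rows, cols):
        if keep[i] and keep[j]:
            keep[j] = False                               # drop the second member of the pair
    return np.flatnonzero(keep)


def top_pcs(grm, k=10):
    """Leading k eigenvectors of the relationship matrix, for use as covariates."""
    eigvals, eigvecs = np.linalg.eigh(grm)               # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]                # indices of the k largest
    return eigvecs[:, order]
```

In practice the matrix would be subset to the kept individuals (`grm[np.ix_(kept, kept)]`) before computing principal components and fitting the model.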

In addition to being more robust to violations of the twin study assumptions, SNP data can be easier to collect, since they do not require twins, who are relatively rare; heritability can therefore also be estimated for rare traits (with due correction for ascertainment bias).

GWAS power

GCTA estimates can help address the missing heritability problem and guide the design of GWASes that will yield genome-wide statistically significant hits. This is done by comparing the GCTA estimate with the results of smaller GWASes. If a GWAS of n = 10,000 using SNP data fails to turn up any hits, but GCTA indicates a high heritability accounted for by SNPs, then a large number of variants are likely involved (polygenicity), and much larger GWASes will be required to estimate each SNP's effect accurately and so directly account for a fraction of the GCTA heritability.
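A back-of-the-envelope version of this reasoning, using only the Python standard library, asks how large a GWAS must be for a single causal variant to reach genome-wide significance (assumed here to be the conventional p < 5 × 10⁻⁸) with 80% power. It assumes, hypothetically and unrealistically, that the SNP heritability is spread evenly over a given number of independent causal variants, and it uses the usual normal approximation to a 1-degree-of-freedom association test; the function name is illustrative.

```python
from statistics import NormalDist


def gwas_n_required(h2_snp, n_causal, alpha=5e-8, power=0.8):
    """Approximate sample size for one causal SNP to reach significance.

    Assumes the SNP heritability is split evenly over `n_causal` independent
    variants, so each explains q2 = h2_snp / n_causal of the trait variance.
    The test statistic's non-centrality is then roughly N * q2 / (1 - q2),
    and a normal approximation gives the N needed for the requested power.
    """
    q2 = h2_snp / n_causal
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)           # quantile for the desired power
    return (z_alpha + z_power) ** 2 * (1 - q2) / q2
```

For example, a SNP heritability of 0.5 spread over 10,000 causal variants gives a requirement on the order of 800,000 individuals, far beyond n = 10,000, whereas concentrating the same heritability in 100 variants brings it below 10,000. Real architectures have a spread of effect sizes, so this only conveys the scaling.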

Disadvantages

  1. Limited inference: GCTA estimates are inherently limited in that they cannot estimate broad-sense heritability as twin and family studies can, since they estimate only the heritability due to the genotyped SNPs. Hence, while they serve as a critical check on the unbiasedness of twin and family studies, GCTA estimates cannot replace them for estimating total genetic contributions to a trait.
  2. Substantial data requirements: the number of SNPs genotyped per person should be in the thousands, and ideally in the hundreds of thousands, for reasonable estimates of genetic similarity (although this is no longer much of an issue, since current commercial chips default to hundreds of thousands or millions of markers); and the number of persons should be at least n > 1,000, and ideally n > 10,000, for somewhat stable estimates of plausible SNP heritability. [11] In contrast, twin studies can offer precise estimates with a fraction of the sample size.
  3. Computational inefficiency: the original GCTA implementation scales poorly with increasing data size, so even if enough data are available for precise GCTA estimates, the computational burden may be infeasible. GCTA results can be combined with a standard precision-weighted fixed-effect meta-analysis, [12] so research groups sometimes analyze cohorts or subsets separately and then pool the estimates meta-analytically (at the cost of additional complexity and some loss of precision). This has motivated the creation of faster implementations and variant algorithms that make different assumptions, such as using moment matching. [13]
  4. Need for raw data: GCTA requires genetic similarity of all subjects and thus their raw genetic information; due to privacy concerns, individual patient data is rarely shared. GCTA cannot be run on the summary statistics reported publicly by many GWAS projects, and if pooling multiple GCTA estimates, a meta-analysis must be performed.
    In contrast, there are alternative techniques that operate on the summary statistics reported by GWASes without requiring the raw data. [14] For example, LD score regression [15] contrasts linkage disequilibrium statistics (available from public datasets such as 1000 Genomes) with the public summary effect sizes to infer heritability and to estimate genetic correlations or overlaps between multiple traits. The Broad Institute runs LD Hub, which provides a public web interface to LD score regression for at least 177 traits. [16] Another method using summary data is HESS. [17]
  5. Unreliable confidence intervals: because they rely on asymptotic approximations, confidence intervals may be incorrect, may fall outside the 0–1 range of heritability, and can be highly imprecise. [18]
  6. Underestimation of SNP heritability: GCTA implicitly assumes all classes of SNPs, rarer or commoner, newer or older, more or less in linkage disequilibrium, have the same effects on average; in humans, rarer and newer variants tend to have larger and more negative effects [19] as they represent mutation load being purged by negative selection. As with measurement error, this will bias GCTA estimates towards underestimating heritability.
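The sample-size requirement in item 2 and the meta-analytic pooling in item 3 can be made concrete with a short plain-Python sketch. A widely used large-sample approximation gives the standard error of a SNP-heritability estimate from n unrelated individuals as about sqrt(2 / (n² · var(A_jk))), where var(A_jk) is the variance of the off-diagonal relatedness values. The value assumed here (2 × 10⁻⁵, a rough figure for unrelated people genotyped on dense human arrays) is an assumption; with it the formula reduces to about 316/n, i.e. roughly 0.32 at n = 1,000 and 0.03 at n = 10,000. Cohort-level estimates are then pooled by inverse-variance weighting; the function names are hypothetical.

```python
import math


def approx_greml_se(n, var_offdiag=2e-5):
    """Large-sample approximation to the SE of a SNP-heritability estimate.

    SE ~ sqrt(2 / (n^2 * var(A_jk))) for unrelated individuals, where
    var_offdiag is the (assumed) variance of off-diagonal relatedness values.
    """
    return math.sqrt(2.0 / (n ** 2 * var_offdiag))


def fixed_effect_meta(estimates, ses):
    """Inverse-variance (precision-weighted) fixed-effect meta-analysis."""
    weights = [1.0 / se ** 2 for se in ses]                   # precision of each cohort
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))                 # SE of the pooled estimate
    return pooled, pooled_se
```

Note that splitting one sample of size n into k equal cohorts and pooling gives a standard error about sqrt(k) times larger than analyzing it jointly under this approximation, because cross-cohort pairs are no longer used; this is the "loss of precision" mentioned in item 3.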

Interpretation

When its model assumptions are met, GCTA provides an unbiased estimate of the total variance in phenotype explained by all variants included in the relatedness matrix (and any variation correlated with those SNPs). This estimate can also be interpreted as the maximum prediction accuracy (R^2) that could be achieved by a linear predictor using all SNPs in the relatedness matrix. The latter interpretation is particularly relevant to the development of polygenic risk scores, as it defines their maximum accuracy. GCTA estimates are sometimes misinterpreted as estimates of total (or narrow-sense, i.e. additive) heritability, but the method does not guarantee this. GCTA estimates are likewise sometimes misinterpreted as "lower bounds" on the narrow-sense heritability, but this is also incorrect: first, because GCTA estimates can be biased (including upwards) if the model assumptions are violated; and second, because by definition (and when the model assumptions are met) GCTA provides an unbiased estimate of the narrow-sense heritability itself if all causal variants are included in the relatedness matrix. The interpretation of the GCTA estimate in relation to the narrow-sense heritability thus depends on the variants used to construct the relatedness matrix.
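To make this interpretation concrete, here is a minimal single-component REML sketch in NumPy. It assumes a trait simulated from the same SNPs that build the relationship matrix, which is the situation in which the estimate targets all of the additive variance. It eigendecomposes the matrix once, profiles out the residual scale, and searches a grid of h² values, in the spirit of the fast mixed-model implementations; GCTA's own software instead uses an iterative average-information REML algorithm, and the function name and grid here are hypothetical.

```python
import numpy as np


def reml_h2(grm, y, covariates=None, grid=None):
    """Single-variance-component REML estimate of SNP heritability (grid search).

    Model: y = X b + g + e, with Var(y) = s2 * (h2 * A + (1 - h2) * I),
    where A is the genetic relationship matrix and X holds an intercept
    plus any covariates (e.g. principal components).
    """
    n = len(y)
    X = np.ones((n, 1)) if covariates is None else np.column_stack([np.ones(n), covariates])
    p = X.shape[1]
    grid = np.linspace(0.0, 0.99, 199) if grid is None else grid
    s, U = np.linalg.eigh(grm)            # A = U diag(s) U^T, computed once
    y_rot, X_rot = U.T @ y, U.T @ X        # rotated data have diagonal covariance

    def neg_reml(h2):
        d = h2 * s + (1.0 - h2)            # eigenvalues of Var(y) / s2
        w = 1.0 / d
        xtwx = X_rot.T @ (X_rot * w[:, None])
        b = np.linalg.solve(xtwx, X_rot.T @ (w * y_rot))   # GLS fixed effects
        r = y_rot - X_rot @ b
        s2 = np.sum(w * r * r) / (n - p)  # residual scale, profiled out
        _, logdet_xtwx = np.linalg.slogdet(xtwx)
        # Negative restricted log-likelihood, up to additive constants.
        return 0.5 * ((n - p) * np.log(s2) + np.sum(np.log(d)) + logdet_xtwx)

    return float(min(grid, key=neg_reml))
```

With the trait simulated from the SNPs in the matrix, the estimate should fall near the simulated value; simulating the trait from SNPs left out of the matrix would instead illustrate the first item in the list below.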

Most frequently, GCTA is run with a single relatedness matrix constructed from common SNPs and will not capture (or not fully capture) the contribution of the following factors:

  1. Any rare or low-frequency variants that are not directly genotyped/imputed.
  2. Any non-linear, dominance, or epistatic genetic effects. Note that GCTA can be extended to estimate the contribution of these effects through more complex relatedness matrices.
  3. The effects of Gene-Environment interactions. Note that GCTA can be extended to estimate the contribution of GxE interactions when the E is known, by including additional variance components.
  4. Structural variants, which are typically not genotyped or imputed.
  5. Measurement error: GCTA does not model any uncertainty or error on the measured trait.

GCTA makes several model assumptions and may produce biased estimates under the following conditions:

  1. The distribution of causal variants is systematically different from the distribution of variants included in the relatedness matrix (even if all causal variants are included in the relatedness matrix). For example, causal variants may be systematically at higher or lower frequency, or in higher or lower linkage disequilibrium, than the genotyped variants as a whole. This can produce either an upwards or a downwards bias, depending on the relationship between the causal variants and the variants used. Various extensions to GCTA have been proposed (for example, GREML-LDMS) to account for these distributional shifts.
  2. Population stratification is not fully accounted for by covariates. GCTA (specifically GREML) accounts for stratification through the inclusion of fixed-effect covariates, typically principal components. If these covariates do not fully capture the stratification, the GCTA estimate will be biased, generally upwards. Accounting for recent population structure is particularly challenging for studies of rare variants.
  3. Residual genetic or environmental relatedness present in the data. GCTA assumes a homogeneous population with an independent and identically distributed environmental term. This assumption is violated if related individuals and/or individuals with substantially shared environments are included in the data. In this case, the GCTA estimate will additionally capture the contribution of any genetic variation correlated with the genetic relationship: either direct genetic effects or correlated environment.
  4. The presence of "indirect" genetic effects. When genetic variants present in the relatedness matrix are correlated with variants present in other individuals that influence the participant's environment, those effects will also be captured in the GCTA estimate. For example, if variants inherited by a participant from their mother influenced their phenotype through their maternal environment, then the effect of those variants will be included in the GCTA estimate even though it is "indirect" (i.e. mediated by parental genetics). This may be interpreted as an upward bias as such "indirect" effects are not strictly causal (altering them in the participant would not lead to a change in phenotype in expectation).
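Stratified extensions such as GREML-LDMS, mentioned in the first item above, replace the single relationship matrix with several, one per group of SNPs (for GREML-LDMS, groups defined by minor allele frequency and local linkage disequilibrium), and fit them jointly as separate variance components. The matrix-building step can be sketched with NumPy as below; how SNPs are assigned to groups is left to the caller, and the function name is hypothetical. Because each group's matrix averages over its own SNPs, weighting each by its SNP count and dividing by the total recovers the single genome-wide matrix, which is a useful sanity check.

```python
import numpy as np


def stratified_grms(z, groups):
    """Build one relationship matrix per SNP group (e.g. MAF x LD bins).

    z      : (n, m) array of standardized genotypes
    groups : length-m sequence of group labels, one per SNP
    Returns {label: (grm, n_snps)}; each grm would be fitted as its own
    variance component in a joint multi-component model.
    """
    groups = np.asarray(groups)
    out = {}
    for label in np.unique(groups):
        cols = z[:, groups == label]                     # SNPs belonging to this group
        out[label] = (cols @ cols.T / cols.shape[1], cols.shape[1])
    return out
```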

Implementations

Other implementations and variant algorithms include:

References

  1. Figure 3 of Yang et al 2010, or Figure 3 of Ritland & Ritland 1996
  2. See also Ritland 1996b, "Estimators for pairwise relatedness and individual inbreeding coefficients"; Ritland & Ritland 1996, "Inferences about quantitative inheritance based on natural population structure in the yellow monkeyflower, Mimulus guttatus"; Lynch & Ritland 1999, "Estimation of Pairwise Relatedness With Molecular Markers"; Ritland 2000, "Marker-inferred relatedness as a tool for detecting heritability in nature"; Thomas 2005, "The estimation of genetic relationships using molecular markers and their efficiency in estimating heritability in natural populations"
  3. Visscher et al 2006, "Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings"
  4. Visscher et al 2007, "Genome partitioning of genetic variation for height from 11,214 sibling pairs"
  5. "Common SNPs explain a large proportion of heritability for human height", Yang et al 2010
  6. "A Commentary on ‘Common SNPs Explain a Large Proportion of the Heritability for Human Height’ by Yang et al. (2010)", Visscher et al 2010
  7. "Defining the role of common variation in the genomic and biological architecture of adult human height", Wood et al 2014
  8. "GCTA: A Tool for Genome-wide Complex Trait Analysis", Yang et al 2011
  9. "Meta-analysis of the heritability of human traits based on fifty years of twin studies", Polderman et al 2015
  10. Barnes, J. C.; Wright, John Paul; Boutwell, Brian B.; Schwartz, Joseph A.; Connolly, Eric J.; Nedelec, Joseph L.; Beaver, Kevin M. (2014-11-01). "Demonstrating the Validity of Twin Research in Criminology". Criminology. 52 (4): 588–626. doi:10.1111/1745-9125.12049. ISSN 1745-9125.
  11. "GCTA will eventually provide direct DNA tests of quantitative genetic results based on twin and adoption studies. One problem is that many thousands of individuals are required to provide reliable estimates. Another problem is that more SNPs are needed than even the million SNPs genotyped on current SNP microarrays because there is much DNA variation not captured by these SNPs. As a result, GCTA cannot estimate all heritability, perhaps only about half of the heritability. The first reports of GCTA analyses estimate heritability to be about half the heritability estimates from twin and adoption studies for height (Lee, Wray, Goddard, & Visscher, 2011; Yang et al., 2010; Yang, Manolio, et al., 2011), and intelligence (Davies et al., 2011)." p. 110, Behavioral Genetics, Plomin et al 2012
  12. "Meta-analysis of GREML results from multiple cohorts", Yang 2015
  13. Ge, Tian; Chen, Chia-Yen; Neale, Benjamin M.; Sabuncu, Mert R.; Smoller, Jordan W. (2016). "Phenome-wide Heritability Analysis of the UK Biobank". bioRxiv 10.1101/070177.
  14. Pasaniuc & Price 2016, "Dissecting the genetics of complex traits using summary association statistics"
  15. Bulik-Sullivan, B. K.; Loh, P. R.; Finucane, H.; Ripke, S.; Yang, J.; Schizophrenia Working Group of the Psychiatric Genomics Consortium; Patterson, N.; Daly, M. J.; Price, A. L.; Neale, B. M. (2015). "LD Score Regression Distinguishes Confounding from Polygenicity in Genome-Wide Association Studies". Nature Genetics. 47 (3): 291–295. doi:10.1038/ng.3211. PMC 4495769. PMID 25642630.
  16. "LD Hub: a centralized database and web interface to LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis", Zheng et al 2016
  17. "Contrasting the genetic architecture of 30 complex traits from summary association data", Shi et al 2016
  18. Schweiger, Regev; Kaufman, Shachar; Laaksonen, Reijo; Kleber, Marcus E.; März, Winfried; Eskin, Eleazar; Rosset, Saharon; Halperin, Eran (2 June 2016). "Fast and Accurate Construction of Confidence Intervals for Heritability". The American Journal of Human Genetics. 98 (6): 1181–1192. doi:10.1016/j.ajhg.2016.04.016. PMC 4908190. PMID 27259052.
  19. "Linkage disequilibrium–dependent architecture of human complex traits shows action of negative selection", Gazal et al 2017
  20. "GCTA document". cnsgenomics.com. Retrieved 2021-04-08.
  21. "Fast linear mixed models for genome-wide association studies", Lippert 2011
  22. "Improved linear mixed models for genome-wide association studies", Listgarten et al 2012
  23. "Advantages and pitfalls in the application of mixed-model association methods", Yang et al 2014
  24. "A lasso multi-marker mixed model for association mapping with population structure correction", Rakitsch et al 2012
  25. "Genome-wide efficient mixed-model analysis for association studies", Zhou & Stephens 2012
  26. "Variance component model to account for sample structure in genome-wide association studies", Kang et al 2012
  27. "Advanced Complex Trait Analysis", Gray et al 2012
  28. "Regional Heritability Advanced Complex Trait Analysis for GPU and Traditional Parallel Architecture", Cebamanos et al 2012
  29. "Efficient Bayesian mixed model analysis increases association power in large cohorts", Loh et al 2012
  30. "Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis", Loh et al 2015; see also "Contrasting regional architectures of schizophrenia and other complex diseases using fast variance components analysis", Loh et al 2015
  31. "Mixed Models for Meta-Analysis and Sequencing", Bulik-Sullivan 2015
  32. "Massively expedited genome-wide heritability analysis (MEGHA)", Ge et al 2015
  33. Speed et al 2016, "Re-evaluation of SNP heritability in complex human traits"
  34. Evans et al 2017, "Narrow-sense heritability estimation of complex traits using identity-by-descent information."

GCTA
Original author(s): Jian Yang
Initial release: August 30, 2010 [20]
Stable release(s): 1.26.0 / June 22, 2016 [20]