The missing heritability problem [1] [2] [3] [4] [5] [6] refers to the gap between heritability estimates obtained from early genome-wide association studies (GWAS) and those obtained from twin and family data. The gap was observed across many physical and mental traits, including diseases, behaviors, and other phenotypes.
An influential review article [7] in 2008 noted that the amount of phenotypic variance explained by the significant loci found in GWAS up to that point was usually far less than expected based on family studies. This gap was referred to as "missing heritability". Using height as a model trait, a paper in 2010 showed that most of the missing heritability can be explained by the presence of large numbers of variants whose effect sizes were too small to detect at the sample sizes then available [8]. This conclusion has subsequently been confirmed using much larger sample sizes, including a study of 5.4 million individuals that identified around 12,000 independent variants that affect human height [9]. While studies of height have particularly high power because of their very large sample sizes, other complex traits likely have a similar genetic architecture. Thus, the missing heritability problem is largely resolved by the presence of tens of thousands of variants of small effect that could not be detected in early GWAS.
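The sample-size argument above can be illustrated with a rough power calculation. The following is a minimal sketch, not the method used in the cited studies: it assumes a single-variant test whose z-score is approximately normal, with expected value sqrt(N q² / (1 − q²)) for a variant explaining a fraction q² of trait variance, and it assumes the conventional genome-wide significance threshold of 5 × 10⁻⁸. The function name `gwas_power` and the example effect size and sample sizes are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Conventional genome-wide significance threshold (assumed here).
GENOME_WIDE_ALPHA = 5e-8


def gwas_power(variance_explained: float, n_samples: int,
               alpha: float = GENOME_WIDE_ALPHA) -> float:
    """Approximate power to detect one variant in a single-variant test.

    variance_explained: fraction of trait variance due to the variant (q^2).
    n_samples: number of individuals in the study.
    Uses a normal approximation to the association z-score.
    """
    std_normal = NormalDist()
    # Two-sided critical value; inv_cdf(alpha / 2) is negative, so negate it.
    z_crit = -std_normal.inv_cdf(alpha / 2)
    # Expected z-score under the alternative hypothesis
    # (square root of the non-centrality parameter).
    q2 = variance_explained
    expected_z = (n_samples * q2 / (1.0 - q2)) ** 0.5
    # Probability that the observed z-score falls beyond either threshold.
    upper_tail = std_normal.cdf(expected_z - z_crit)
    lower_tail = std_normal.cdf(-z_crit - expected_z)
    return upper_tail + lower_tail


if __name__ == "__main__":
    # Hypothetical variant explaining 0.01% of trait variance.
    q2 = 1e-4
    for n in (10_000, 100_000, 300_000, 1_000_000):
        print(f"N = {n:>9,}: power = {gwas_power(q2, n):.4f}")
```

Under these assumptions, power for such a variant is negligible in a study of ten thousand people and approaches one in a study of a million, which is consistent with small-effect variants going undetected in early GWAS and being found later in much larger samples.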
The missing heritability problem was named as such in 2008. The Human Genome Project led to optimistic forecasts that the large genetic contributions to many traits and diseases (identified by quantitative genetics, and behavioral genetics in particular) would soon be mapped to specific genes and their genetic variants. The first attempts relied on candidate-gene studies, which used small samples and limited genetic sequencing to examine single-nucleotide polymorphisms (SNPs) in specific genes believed to be involved. While many hits were found, they often failed to replicate in other studies. The exponential fall in genotyping costs then made genome-wide association studies possible; these could examine all candidate genes at once, in larger samples than the earlier candidate-gene studies. For the first time, these studies produced replicable signals. By 2008, however, investigators were surprised to find that the detected signals could explain only a small fraction of the expected genetic variance.
Standard genetics methods have long estimated large heritabilities, such as 80%, for traits such as height or intelligence. Yet none of the genes had been found, despite sample sizes that, while small, should have been able to detect variants of reasonable effect size, such as 1 inch or 5 IQ points. If genes have such strong cumulative effects, where were they? Several resolutions have been proposed, holding that the missing heritability is some combination of: