Genomic control

Genomic control (GC) is a statistical method commonly used to control for the confounding effects of population stratification in genetic association studies. It was originally outlined by Bernie Devlin and Kathryn Roeder in a 1999 paper. [1] The method uses a set of anonymous genetic markers to estimate the effect of population structure on the distribution of the chi-square test statistic. The distribution of the chi-square statistic for an allele suspected of being associated with a given trait can then be compared with the distribution of the same statistic for an allele that is expected to be unrelated to the trait. [2] [3] The markers used for this purpose should not be linked to the marker being tested for a possible association. [4] In theory, the method takes advantage of the tendency of population structure to cause overdispersion of test statistics in association analyses. [5] Genomic control has been reported to be as robust as family-based designs, even though it is applied to population-based data. [6] However, it can reduce the statistical power to detect a true association, and it may fail to completely eliminate the biasing effects of population stratification. [7] A more robust form of the method expresses the association being studied as two Cochran–Armitage trend tests and applies genomic control to each test separately. [8]
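The overdispersion idea can be illustrated with a minimal Python sketch. It assumes the commonly used median-based estimator of the inflation factor (often written λ), in which the median chi-square statistic at presumed-null markers is divided by the median of a chi-square distribution with one degree of freedom (about 0.455). The article itself does not specify an estimator, so this choice, the function names `inflation_factor` and `gc_adjust`, and the convention of never letting λ fall below 1 are illustrative assumptions rather than the method as published.

```python
import math
import statistics

# Approximate median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(null_stats):
    """Estimate the inflation factor from 1-df chi-square statistics
    computed at markers assumed to be unrelated to the trait.

    Assumption: median-based estimator, floored at 1 so that the
    correction never makes test statistics larger.
    """
    lam = statistics.median(null_stats) / CHI2_1DF_MEDIAN
    return max(lam, 1.0)


def gc_adjust(stat, lam):
    """Divide a candidate marker's chi-square statistic by the inflation
    factor and return (adjusted statistic, p-value for 1 df)."""
    adjusted = stat / lam
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(adjusted / 2.0))
    return adjusted, p_value


# Hypothetical statistics at five null markers (made-up numbers).
null_markers = [0.2, 0.5, 0.9098728, 1.5, 3.0]
lam = inflation_factor(null_markers)
adjusted, p_value = gc_adjust(7.68, lam)
print(f"lambda = {lam:.3f}, adjusted chi-square = {adjusted:.3f}, p = {p_value:.4f}")
```

In this toy example the null statistics are inflated by roughly a factor of two, so a candidate statistic of 7.68 is scaled down to about 3.84. This also shows how the correction can cost power: a truly associated marker has its statistic reduced along with the rest.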

Related Research Articles

Single-nucleotide polymorphism Single nucleotide position in genomic DNA at which different sequence alternatives exist

In genetics, a single-nucleotide polymorphism is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently large fraction of the population, many publications do not apply such a frequency threshold.

Genetic association is when one or more genotypes within a population co-occur with a phenotypic trait more often than would be expected by chance occurrence.

Medical genetics

Medical genetics is the branch of medicine that involves the diagnosis and management of hereditary disorders. Medical genetics differs from human genetics in that human genetics is a field of scientific research that may or may not apply to medicine, while medical genetics refers to the application of genetics to medical care. For example, research on the causes and inheritance of genetic disorders would be considered within both human genetics and medical genetics, while the diagnosis, management, and counselling of people with genetic disorders would be considered part of medical genetics.

Ancestry-informative marker

In population genetics, an ancestry-informative marker (AIM) is a single-nucleotide polymorphism that exhibits substantially different frequencies between different populations. A set of many AIMs can be used to estimate the proportion of ancestry of an individual derived from each population.

In molecular biology, a SNP array is a type of DNA microarray used to detect polymorphisms within a population. A single-nucleotide polymorphism (SNP), a variation at a single site in DNA, is the most frequent type of variation in the genome. Around 335 million SNPs have been identified in the human genome, 15 million of which are present at frequencies of 1% or higher across different populations worldwide.

Genome-wide association study Study to research genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait.

In genomics, a genome-wide association study, also known as whole genome association study, is an observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. GWA studies typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major human diseases, but can equally be applied to any other genetic variants and any other organisms.

Population structure is the presence of a systematic difference in allele frequencies between subpopulations in a population as a result of non-random mating between individuals. It can be informative of genetic ancestry, and in the context of medical genetics it is an important confounding variable in genome wide association studies (GWAS).

In genetics, association mapping, also known as "linkage disequilibrium mapping", is a method of mapping quantitative trait loci (QTLs) that takes advantage of historic linkage disequilibrium to link phenotypes to genotypes, uncovering genetic associations.

The Haplotype-relative-risk (HRR) method is a family-based method for determining gene allele association to a disease in the presence of actual genetic linkage. Nuclear families with one affected child are sampled using the parental haplotypes not transmitted as a control. While similar to the genotype relative risk (RR), the HRR provides a solution to the problem of population stratification by only sampling within family trios. The HRR method was first proposed by Rubinstein in 1981 then detailed in 1987 by Rubinstein and Falk and is an important tool in genetic association studies.

Gene polymorphism

A gene is said to be polymorphic if more than one allele occupies that gene's locus within a population. In addition to having more than one allele at a specific locus, each allele must also occur in the population at a rate of at least 1% to generally be considered polymorphic.

Quantitative trait loci mapping or QTL mapping is the process of identifying genomic regions that potentially contain genes responsible for important economic, health or environmental characters. Mapping QTLs is an important activity that plant breeders and geneticists routinely use to associate potential causal genes with phenotypes of interest. Family-based QTL mapping is a variant of QTL mapping in which multiple families are used.

Polygenic score Numerical score aimed at predicting a trait based on variation in multiple genetic loci

In genetics, a polygenic score (PGS), also called a polygenic risk score (PRS), genetic risk score, or genome-wide score, is a number that summarises the estimated effect of many genetic variants on an individual's phenotype, typically calculated as a weighted sum of trait-associated alleles. It reflects an individual's estimated genetic predisposition for a given trait and can be used as a predictor for that trait. In other words, it gives an estimate of how likely an individual is to have a given trait only based on genetics, without taking environmental factors into account. Polygenic scores are widely used in animal breeding and plant breeding due to their efficacy in improving livestock breeding and crops. In humans, polygenic scores are typically generated from genome-wide association study (GWAS) data.

Mark Joseph Daly is Director of the Finnish Institute for Molecular Medicine (FIMM) at the University of Helsinki, a Professor of Genetics at Harvard Medical School, Chief of the Analytic and Translational Genetic Unit at Massachusetts General Hospital, and a member of the Broad Institute of MIT and Harvard. In the early days of the Human Genome Project, Daly helped develop the genetic model by which linkage disequilibrium could be used to map the haplotype structure of the human genome. In addition, he developed statistical methods to find associations between genes and disorders such as Crohn's disease, inflammatory bowel disease, autism and schizophrenia.

Kathryn M. Roeder is an American statistician known for her development of statistical methods to uncover the genetic basis of complex disease and her contributions to mixture models, semiparametric inference, and multiple testing. Roeder holds positions as professor of statistics and professor of computational biology at Carnegie Mellon University, where she leads a project focused on discovering genes associated with autism.

Landscape genomics is one of many strategies used to identify relationships between environmental factors and the genetic adaptation of organisms in response to these factors. Landscape genomics combines aspects of landscape ecology, population genetics and landscape genetics. The latter addresses how landscape features influence the population structure and gene flow of organisms across time and space. The field of landscape genomics is distinct from landscape genetics in that it is not focused on the neutral genetic processes, but considers, in addition to neutral processes such as drift and gene flow, explicitly adaptive processes, i.e. the role of natural selection.

Bernard J. Devlin is an American psychiatrist who is Professor of Psychiatry and Clinical and Translational Science at the University of Pittsburgh. An expert on statistical and psychiatric genetics, he is a fellow of the statistics section of the American Association for the Advancement of Science. He is also a member of the American Society of Human Genetics, the Genetics Society of America, and the International Society for Autism Research. Before joining the faculty of the University of Pittsburgh, he worked at the Yale School of Medicine, where he conducted research with Neil Risch on the utility of DNA tests. He is married to Kathryn Roeder, a professor at Carnegie Mellon University, with whom he often collaborates on research. Topics that Devlin and Roeder have studied together include the genetic basis of autism. Devlin and Roeder have a daughter, Summer.

In statistical genetics, linkage disequilibrium score regression is a technique that aims to quantify the separate contributions of polygenic effects and various confounding factors, such as population stratification, based on summary statistics from genome-wide association studies (GWASs). The approach involves using regression analysis to examine the relationship between linkage disequilibrium scores and the test statistics of the single-nucleotide polymorphisms (SNPs) from the GWAS. Here, the "linkage disequilibrium score" for a SNP "is the sum of LD r² measured with all other SNPs".

In population genetics, cryptic relatedness occurs when individuals in a genetic association study are more closely related to one another than assumed by the investigators. This can act as a confounding factor in both case-control and genome-wide association studies, as well as in studies of genetic diversity. Along with population stratification, it is one of the most prominent confounding factors that can lead to inflated false positive rates in gene-association studies. It is often corrected for by including a polygenic component in the statistical model being used to detect genetic associations. Other approaches that have been developed to attempt to control for cryptic relatedness are the genomic control method and the use of extended likelihood ratio tests.

Landscape genetics Combination of population genetics and landscape ecology

Landscape genetics is the scientific discipline that combines population genetics and landscape ecology. It broadly encompasses any study that analyses plant or animal population genetic data in conjunction with data on the landscape features and matrix quality where the sampled population lives. This allows for the analysis of microevolutionary processes affecting the species in light of landscape spatial patterns, providing a more realistic view of how populations interact with their environments. Landscape genetics attempts to determine which landscape features are barriers to dispersal and gene flow, how human-induced landscape changes affect the evolution of populations, the source-sink dynamics of a given population, and how diseases or invasive species spread across landscapes.

Human genetic clustering refers to patterns of relative genetic similarity among human individuals and populations, as well as the wide range of scientific and statistical methods used to study this aspect of human genetic variation.

References

  1. Devlin, Bernie; Roeder, Kathryn (1999). "Genomic Control for Association Studies". Biometrics. 55 (4): 997–1004. CiteSeerX 10.1.1.420.1751. doi:10.1111/j.0006-341X.1999.00997.x. ISSN 1541-0420. PMID 11315092.
  2. Donnelly, Peter; Phillips, Michael S.; Cardon, Lon R.; Marchini, Jonathan (May 2004). "The effects of human population structure on large genetic association studies". Nature Genetics. 36 (5): 512–517. doi:10.1038/ng1337. ISSN 1546-1718. PMID 15052271.
  3. Altshuler, David; Hirschhorn, Joel N.; Henderson, Brian; Sklar, Pamela; Lander, Eric S.; Kolonel, Laurence N.; Petryshen, Tracey L.; Pato, Michele T.; Pato, Carlos N. (April 2004). "Assessing the impact of population stratification on genetic association studies". Nature Genetics. 36 (4): 388–393. doi:10.1038/ng1333. ISSN 1546-1718. PMID 15052270.
  4. Krawczak, Michael; Dempfle, Astrid; Lieb, Wolfgang; Freitag-Wolf, Sandra; Yadav, Pankaj (2015-10-01). "Allowing for population stratification in case-only studies of gene–environment interaction, using genomic control". Human Genetics. 134 (10): 1117–1125. doi:10.1007/s00439-015-1593-y. ISSN 1432-1203. PMID 26297539. S2CID 18146948.
  5. Devlin, Bernie; Roeder, Kathryn; Wasserman, Larry (2001-11-01). "Genomic Control, a New Approach to Genetic-Based Association Studies". Theoretical Population Biology. 60 (3): 155–166. doi:10.1006/tpbi.2001.1542. ISSN 0040-5809. PMID 11855950. S2CID 11547174.
  6. Roeder, Kathryn; Devlin, B.; Bacanu, Silviu-Alin (2000-06-01). "The Power of Genomic Control". The American Journal of Human Genetics. 66 (6): 1933–1944. doi:10.1086/302929. ISSN 1537-6605. PMC 1378064. PMID 10801388.
  7. Greenberg, David A.; Zhang, Junying; Shmulewitz, Dvora (2004). "Case-Control Association Studies in Mixed Populations: Correcting Using Genomic Control". Human Heredity. 58 (3–4): 145–153. doi:10.1159/000083541. ISSN 1423-0062. PMID 15812171. S2CID 24635575.
  8. Gastwirth, Joseph L.; Freidlin, Boris; Zheng, Gang (2006-02-01). "Robust Genomic Control for Association Studies". The American Journal of Human Genetics. 78 (2): 350–356. doi:10.1086/500054. ISSN 1537-6605. PMC 1380242. PMID 16400614.