Minor allele frequency

Minor allele frequency (MAF) is the frequency at which the second most common allele occurs in a given population. Rare variants play a surprising role in heritability: variants observed only once in a sample, known as "singletons", have been reported to drive a substantial share of the cis heritability of human gene expression. [1]
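The definition above can be illustrated with a minimal Python sketch. It assumes the input is a flat list of observed alleles at one locus (one entry per chromosome); the function name `minor_allele_frequency` and the toy sample are hypothetical and not taken from any particular tool.

```python
from collections import Counter

def minor_allele_frequency(alleles):
    """Return (allele, frequency) for the second most common allele.

    `alleles` is assumed to be a list with one entry per sampled
    chromosome, e.g. ["T", "T", "C", ...]. A site with fewer than two
    distinct alleles has no minor allele, so (None, 0.0) is returned.
    """
    counts = Counter(alleles)
    if len(counts) < 2:
        return None, 0.0
    # most_common() sorts by descending count; index 1 is the runner-up.
    allele, count = counts.most_common()[1]
    return allele, count / len(alleles)

# Toy sample (hypothetical data): 17 copies of T and 3 copies of C.
sample = ["T"] * 17 + ["C"] * 3
allele, maf = minor_allele_frequency(sample)
print(allele, maf)
```

At a site with more than two alleles, this sketch still reports the second most common one, which matches the definition given here; some tools instead report the rarest allele, so the convention should be checked before comparing values.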

Single nucleotide polymorphisms (SNPs) with a minor allele frequency of 0.05 (5%) or greater were targeted by the HapMap project. [2]

MAF is widely used in population genetics studies because it distinguishes common variants from rare ones in a population. As an example, a 2015 study sequenced the whole genomes of 2,120 Sardinian individuals. The authors classified the variants found in the study into three classes according to their MAF. Rare variants (MAF < 0.05) appeared more frequently in coding regions than common variants (MAF > 0.05) in this population. [3]
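As a rough illustration of how a MAF cutoff separates rare from common variants, the sketch below applies the 0.05 threshold quoted above. The function name `classify_variant` is hypothetical, and treating a MAF of exactly 0.05 as common is an assumption (it follows the HapMap "0.05 or greater" wording); the boundaries of the study's three-class scheme are not given here, so only two classes are shown.

```python
COMMON_THRESHOLD = 0.05  # cutoff quoted in the text

def classify_variant(maf, threshold=COMMON_THRESHOLD):
    """Label a variant 'rare' or 'common' from its minor allele frequency.

    Assumption: a MAF exactly equal to the threshold counts as common.
    A minor allele frequency cannot exceed 0.5, so values outside
    [0, 0.5] are rejected.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError(f"MAF must lie in [0, 0.5], got {maf}")
    return "rare" if maf < threshold else "common"

# Hypothetical MAF values for three variants.
labels = [classify_variant(m) for m in (0.001, 0.05, 0.1506)]
print(labels)
```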

Interpreting MAF data

1. Enter the reference ID of a SNP of interest, for example rs429358, into a database (dbSNP or other).

2. Find the MAF/MinorAlleleCount entry, for example MAF/MinorAlleleCount: C=0.1506/754 (1000 Genomes, where the number of genomes sampled is N = 2504). [4] Here C is the minor allele for that particular locus; 0.1506 is the frequency of the C allele (MAF), i.e. about 15% within the 1000 Genomes database; and 754 is the number of times the C allele was observed among the 2 × 2504 = 5008 chromosomes sampled in the study.
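The relationship between the reported frequency and the allele count can be checked with a short Python sketch. It assumes each sampled genome is diploid and so contributes two chromosomes; the helper `parse_maf_field` is a hypothetical name for illustration, not part of any dbSNP client.

```python
def parse_maf_field(field):
    """Split a string such as 'C=0.1506/754' into (allele, frequency, count)."""
    allele, rest = field.split("=")
    freq, count = rest.split("/")
    return allele, float(freq), int(count)

allele, maf, count = parse_maf_field("C=0.1506/754")

n_genomes = 2504              # genomes sampled, as quoted in the text
chromosomes = 2 * n_genomes   # assumption: diploid, two alleles per genome

# Recompute the frequency from the raw count and compare with the report.
recomputed = count / chromosomes
print(allele, chromosomes, round(recomputed, 4))
```

Dividing the count by the number of genomes rather than the number of chromosomes would give roughly 0.30, twice the reported value, which is a common slip when reading these fields.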

Related Research Articles

Single-nucleotide polymorphism Single nucleotide position in genomic DNA at which different sequence alternatives exist

In genetics, a single-nucleotide polymorphism is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently large fraction of the population, many publications do not apply such a frequency threshold.

Haplotype Group of genes from one parent

A haplotype is a group of alleles in an organism that are inherited together from a single parent.

The International HapMap Project was an organization that aimed to develop a haplotype map (HapMap) of the human genome, to describe the common patterns of human genetic variation. HapMap is used to find genetic variants affecting health, disease and responses to drugs and environmental factors. The information produced by the project is made freely available for research.

Genetic association is when one or more genotypes within a population co-occur with a phenotypic trait more often than would be expected by chance occurrence.

Human genetic variation Genetic diversity in human populations

Human genetic variation is the genetic differences in and among populations. There may be multiple variants of any given gene in the human population (alleles), a situation called polymorphism.

In molecular biology, SNP array is a type of DNA microarray which is used to detect polymorphisms within a population. A single nucleotide polymorphism (SNP), a variation at a single site in DNA, is the most frequent type of variation in the genome. Around 335 million SNPs have been identified in the human genome, 15 million of which are present at frequencies of 1% or higher across different populations worldwide.

A tag SNP is a representative single nucleotide polymorphism (SNP) in a region of the genome with high linkage disequilibrium that represents a group of SNPs called a haplotype. It is possible to identify genetic variation and association to phenotypes without genotyping every SNP in a chromosomal region. This reduces the expense and time of mapping genome areas associated with disease, since it eliminates the need to study every individual SNP. Tag SNPs are useful in whole-genome SNP association studies in which hundreds of thousands of SNPs across the entire genome are genotyped.

Genome-wide association study Study of genetic variants in different individuals

In genomics, a genome-wide association study, also known as whole genome association study, is an observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. GWA studies typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major human diseases, but can equally be applied to any other genetic variants and any other organisms.

1000 Genomes Project International research effort on genetic variation

The 1000 Genomes Project, launched in January 2008, was an international research effort to establish by far the most detailed catalogue of human genetic variation. Scientists planned to sequence the genomes of at least one thousand anonymous participants from a number of different ethnic groups within the following three years, using newly developed technologies which were faster and less expensive. In 2010, the project finished its pilot phase, which was described in detail in a publication in the journal Nature. In 2012, the sequencing of 1092 genomes was announced in a Nature publication. In 2015, two papers in Nature reported results and the completion of the project and opportunities for future research.

dbSNP

The Single Nucleotide Polymorphism Database (dbSNP) is a free public archive for genetic variation within and across different species developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI). Although the name of the database implies a collection of one class of polymorphisms only, it in fact contains a range of molecular variation: (1) SNPs, (2) short deletion and insertion polymorphisms (indels/DIPs), (3) microsatellite markers or short tandem repeats (STRs), (4) multinucleotide polymorphisms (MNPs), (5) heterozygous sequences, and (6) named variants. The dbSNP accepts apparently neutral polymorphisms, polymorphisms corresponding to known phenotypes, and regions of no variation. It was created in September 1998 to supplement GenBank, NCBI’s collection of publicly available nucleic acid and protein sequences.

Population structure is the presence of a systematic difference in allele frequencies between subpopulations. In a randomly mating population, allele frequencies are expected to be roughly similar between groups. However, mating tends to be non-random to some degree, causing structure to arise. For example, a barrier like a river can separate two groups of the same species and make it difficult for potential mates to cross; if a mutation occurs, over many generations it can spread and become common in one subpopulation while being completely absent in the other.

Behavioural genetics Study of genetic-environment interactions influencing behaviour

Behavioural genetics, also referred to as behaviour genetics, is a field of scientific research that uses genetic methods to investigate the nature and origins of individual differences in behaviour. While the name "behavioural genetics" connotes a focus on genetic influences, the field broadly investigates the extent to which genetic and environmental factors influence individual differences, using research designs that allow removal of the confounding of genes and environment. Behavioural genetics was founded as a scientific discipline by Francis Galton in the late 19th century, only to be discredited through association with eugenics movements before and during World War II. In the latter half of the 20th century, the field saw renewed prominence with research on inheritance of behaviour and mental illness in humans, as well as research on genetically informative model organisms through selective breeding and crosses. In the late 20th and early 21st centuries, technological advances in molecular genetics made it possible to measure and modify the genome directly. This led to major advances in model organism research and in human studies, leading to new scientific discoveries.

The "missing heritability" problem is the fact that single genetic variations cannot account for much of the heritability of diseases, behaviors, and other phenotypes. This is a problem that has significant implications for medicine, since a person's susceptibility to disease may depend more on 'the combined effect of all the genes in the background than on the disease genes in the foreground', or the role of genes may have been severely overestimated.

In genetics, haplotype estimation refers to the process of statistical estimation of haplotypes from genotype data. The most common situation arises when genotypes are collected at a set of polymorphic sites from a group of individuals. For example, in human genetics, genome-wide association studies collect genotypes in thousands of individuals at between 200,000 and 5,000,000 SNPs using microarrays. Haplotype estimation methods are used in the analysis of these datasets and allow genotype imputation of alleles from reference databases such as the HapMap Project and the 1000 Genomes Project.

Imputation in genetics refers to the statistical inference of unobserved genotypes. It is achieved by using known haplotypes in a population, for instance from the HapMap or the 1000 Genomes Project in humans. This makes it possible to test for association between a trait of interest and genetic variants that were not typed experimentally but whose genotypes have been statistically inferred ("imputed"). Genotype imputation is usually performed on SNPs, the most common kind of genetic variation.

SNV calling from NGS data is any of a range of methods for identifying the existence of single nucleotide variants (SNVs) from the results of next generation sequencing (NGS) experiments. These are computational techniques, and are in contrast to special experimental methods based on known population-wide single nucleotide polymorphisms. Due to the increasing abundance of NGS data, these techniques are becoming increasingly popular for performing SNP genotyping, with a wide variety of algorithms designed for specific experimental designs and applications. In addition to the usual application domain of SNP genotyping, these techniques have been successfully adapted to identify rare SNPs within a population, as well as detecting somatic SNVs within an individual using multiple tissue samples.

A rare functional variant is a genetic variant which alters gene function, and which occurs at low frequency in a population. Rare variants may play a significant role in complex disease, as well as some Mendelian conditions. Rare variants may be responsible for a portion of the missing heritability of complex diseases. The theoretical case for a significant role of rare variants is that alleles that strongly predispose an individual to disease will be kept at low frequencies in populations by purifying selection. Rare variants are increasingly being studied, as a consequence of whole exome and whole genome sequencing efforts. While these variants are individually infrequent in populations, there are many in human populations, and they can be unique to specific populations. They are more likely to be deleterious than common variants, as a result of rapid population growth and weak purifying selection. They have been suspected of acting independently or along with common variants to cause disease states.

Predictive genomics is at the intersection of multiple disciplines: predictive medicine, personal genomics and translational bioinformatics. Specifically, predictive genomics deals with predicting future phenotypic outcomes in areas such as complex multifactorial diseases in humans. To date, the success of predictive genomics has depended on the genetic framework underlying these applications, typically explored in genome-wide association (GWA) studies. The identification of associated single-nucleotide polymorphisms underpins GWA studies in complex diseases, which have included type 2 diabetes (T2D), age-related macular degeneration (AMD) and Crohn's disease.

Complex traits

Complex traits, also known as quantitative traits, are traits that do not behave according to simple Mendelian inheritance laws. More specifically, their inheritance cannot be explained by the genetic segregation of a single gene. Such traits show a continuous range of variation and are influenced by both environmental and genetic factors. Compared to strictly Mendelian traits, complex traits are far more common, and because they can be hugely polygenic, they are studied using statistical techniques such as QTL mapping rather than classical genetics methods. Examples of complex traits include height, circadian rhythms, enzyme kinetics, and many diseases including diabetes and Parkinson's disease. One major goal of genetic research today is to better understand the molecular mechanisms through which genetic variants act to influence complex traits.

The GWAS catalog is a free online database that compiles data from genome-wide association studies (GWAS), summarizing unstructured data from different literature sources into accessible, high-quality data. It was created by the National Human Genome Research Institute (NHGRI) in 2008 and has been a collaborative project between the NHGRI and the European Bioinformatics Institute (EBI) since 2010. As of September 2018, it included 71,673 SNP–trait associations in 3,567 publications.

References

  1. Hernandez, Ryan D.; Uricchio, Lawrence H.; Hartman, Kevin; Ye, Chun; Dahl, Andrew; Zaitlen, Noah (September 2019). "Ultrarare variants drive substantial cis heritability of human gene expression". Nature Genetics. 51 (9): 1349–1355. doi:10.1038/s41588-019-0487-7. ISSN 1546-1718. PMC 6730564. PMID 31477931.
  2. The International HapMap Consortium (2005). "A haplotype map of the human genome". Nature. 437 (7063): 1299–1320. Bibcode:2005Natur.437.1299T. doi:10.1038/nature04226. PMC 1880871. PMID 16255080.
  3. Sidore, C.; et al. (2015). "Genome sequencing elucidates Sardinian genetic architecture and augments association analyses for lipid and blood inflammatory markers". Nature Genetics. 47 (11): 1272–1281. doi:10.1038/ng.3368. PMC 4627508. PMID 26366554.
  4. National Center for Biotechnology Information: New SNP Attributes