Fay and Wu's H

Fay and Wu's H is a statistical test created by and named after two researchers, Justin Fay and Chung-I Wu. [1] The purpose of the test is to distinguish between a DNA sequence evolving randomly ("neutrally") and one evolving under positive selection. This test is an advancement over Tajima's D, [2] which is used to differentiate neutrally evolving sequences from those evolving non-randomly (through directional or balancing selection, demographic expansion or contraction, or genetic hitchhiking). Fay and Wu's H is frequently used to identify sequences that have experienced selective sweeps in their evolutionary history.

Concept

Imagine a DNA sequence that shows very few polymorphisms among its alleles across different populations. This pattern could have at least three causes:

  1. The sequence is under strong negative selection, so any new mutation in it is deleterious and is purged immediately, or
  2. The sequence has just undergone a selective sweep (an allele rose to fixation or near fixation), so all alleles became homogenized and the few polymorphisms you see are very recent, or
  3. There was a population bottleneck, so all individuals in the population descend from a small number of common ancestors (or a single one).

Now, when you calculate Tajima's D using all the alleles across all populations, the excess of rare polymorphisms makes Tajima's D negative, indicating that the sequence was evolving non-randomly. However, Tajima's D alone cannot tell you whether this is due to negative selection, a recent selective sweep, or population expansion or contraction. To distinguish among these, you calculate Fay and Wu's H. [3]
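A short sketch can illustrate why an excess of rare polymorphisms makes D negative. The code below computes Tajima's D from a site frequency spectrum using the standard normalizing constants from Tajima's 1989 paper. It assumes that entry i of the input list counts the segregating sites whose minor (or derived) allele occurs in i of the n sampled sequences. The function name and this input format are choices made only for this example. Rare variants contribute little to the pairwise-difference estimate but count fully toward the number of segregating sites, so an excess of them pulls D below zero.

```python
import math

def tajimas_d(n, sfs):
    """Tajima's D computed from a site frequency spectrum (illustrative sketch).

    n:   number of sampled sequences
    sfs: list of length n; sfs[i] is the number of segregating sites whose
         minor (or derived) allele is carried by i sequences (sfs[0] unused).
    Assumes at least one segregating site, otherwise the variance is zero.
    """
    S = sum(sfs[1:n])                    # number of segregating sites
    pairs = n * (n - 1) / 2
    # Mean number of pairwise differences: a site at frequency i/n
    # differs in i * (n - i) of the n choose 2 pairs.
    pi = sum(i * (n - i) * sfs[i] for i in range(1, n)) / pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between the two theta estimates, scaled by its standard deviation.
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

For example, with n = 10 and a spectrum dominated by singletons, `tajimas_d(10, [0, 8, 1, 0, 0, 0, 0, 0, 0, 0])` returns a negative value. A spectrum proportional to 1/i (the neutral expectation) gives D = 0 up to rounding.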

Fay and Wu's H uses not only population polymorphism data but also data from an outgroup species. The outgroup reveals the ancestral state of each polymorphic site before the two lineages split, so the alleles at each site can be classified as ancestral or derived. If, for example, the ancestral allele differs from the allele that now predominates in the population, you can infer that there was a selective sweep in that region (possibly through linkage to a selected site rather than selection on the site itself). The magnitude of H indicates the strength of the sweep signal. If the predominant allele is the ancestral one, the sequence is instead consistent with negative selection maintaining the ancestral state. An H close to 0, on the other hand, means that there is no evidence of deviation from neutrality.
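The outgroup-based classification of alleles as ancestral or derived can be sketched as follows. This minimal example assumes aligned, equal-length sequences and a single outgroup sequence whose base at each site is taken as the ancestral state. It therefore ignores possible mutations on the outgroup lineage (ancestral misidentification). Only biallelic sites at which the outgroup base is one of the two sampled alleles are used. The function name and input format are hypothetical. The result is an unfolded site frequency spectrum, which records how many sites have their derived allele at each count.

```python
from collections import Counter

def unfolded_sfs(samples, outgroup):
    """Unfolded site frequency spectrum polarized with one outgroup (sketch).

    samples:  aligned population sequences (equal-length strings)
    outgroup: aligned outgroup sequence; its base is assumed ancestral
    Returns xi, a list of length n where xi[i] counts sites whose derived
    allele is carried by i of the n samples (i = 1..n-1).
    """
    n = len(samples)
    xi = [0] * n
    for site, ancestral in enumerate(outgroup):
        counts = Counter(seq[site] for seq in samples)
        # Skip monomorphic or multiallelic sites, and sites that cannot be
        # polarized because the outgroup base is absent from the sample.
        if len(counts) != 2 or ancestral not in counts:
            continue
        derived = n - counts[ancestral]  # copies carrying the non-ancestral base
        xi[derived] += 1
    return xi
```

For instance, with samples `["ACGT", "ACGA", "TCGA", "TCGA"]` and outgroup `"ACGT"`, the first site has its derived allele (T) in 2 of 4 sequences and the last site has its derived allele (A) in 3 of 4. The other two sites are monomorphic, so the function returns `[0, 0, 1, 1]`.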

Interpretation

A significantly positive Fay and Wu's H indicates a deficit of moderate- and high-frequency derived single nucleotide polymorphisms (SNPs) relative to equilibrium expectations, whereas a significantly negative Fay and Wu's H indicates an excess of high-frequency derived SNPs. [4]
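These signs follow from how H is built. Fay and Wu define H as the difference between two estimators of θ computed from the unfolded spectrum ξ_i, the number of sites whose derived allele appears in i of n sequences. The first, θ_π, weights each ξ_i by i(n − i). The second, θ_H, weights each ξ_i by i². Both are divided by n(n − 1)/2. Because θ_H grows quickly with i, high-frequency derived variants make H = θ_π − θ_H negative. The sketch below computes only this unnormalized statistic. The function name is illustrative. Judging significance requires a null distribution, for example one obtained from coalescent simulations, and that step is not shown here.

```python
def fay_wu_h(xi):
    """Unnormalized Fay and Wu's H = theta_pi - theta_H (illustrative sketch).

    xi: unfolded site frequency spectrum of length n; xi[i] is the number
        of sites whose derived allele is carried by i of n sequences
        (i = 1..n-1, xi[0] unused).
    """
    n = len(xi)
    pairs = n * (n - 1) / 2
    # theta_pi weights derived-allele counts by i * (n - i).
    theta_pi = sum(i * (n - i) * xi[i] for i in range(1, n)) / pairs
    # theta_H weights them by i squared, so high-frequency derived
    # variants inflate it much more than theta_pi.
    theta_h = sum(i * i * xi[i] for i in range(1, n)) / pairs
    return theta_pi - theta_h
```

With n = 10 and five sites whose derived allele is carried by 9 of the 10 sequences, `fay_wu_h([0] * 9 + [5])` gives θ_π = 1 and θ_H = 9, so H = −8. A spectrum made up only of derived singletons gives a positive H. The neutral expectation ξ_i ∝ 1/i gives H = 0.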

References

  1. Fay JC, Wu CI (July 2000). "Hitchhiking under positive Darwinian selection". Genetics. 155 (3): 1405–13. PMC 1461156. PMID 10880498.
  2. Tajima F (November 1989). "Statistical method for testing the neutral mutation hypothesis by DNA polymorphism". Genetics. 123 (3): 585–95. doi:10.1093/genetics/123.3.585. PMC 1203831. PMID 2513255.
  3. Hedrick PW (2005). Genetics of Populations. Jones & Bartlett Learning. p. 436. ISBN 978-0-7637-4772-5.
  4. Sterken R, Kiekens R, Coppens E, Vercauteren I, Zabeau M, Inzé D, Flowers J, Vuylsteke M (October 2009). "A population genomics study of the Arabidopsis core cell cycle genes shows the signature of natural selection". Plant Cell. 21 (10): 2987–98. doi:10.1105/tpc.109.067017. PMC 2782269. PMID 19880799.

Further reading

Computational tools: