Soft selective sweep

In genetics, a soft sweep occurs when multiple copies of a beneficial mutation become established and fix together. [1] [2] Depending on the origin of these copies, linked variants may be retained and appear as distinct haplotype structures in the population. There are two major forms of soft sweeps:

  1. A beneficial mutation previously segregated neutrally in the population and therefore existed on multiple haplotypes at the time of the selective shift that made it beneficial. In this way, a single beneficial mutation may carry multiple haplotypes to intermediate frequency while it itself becomes fixed.
  2. Multiple beneficial mutations arise independently in short succession, so that a second copy arises through mutation before the first copy has been fixed by selection. [3]

Soft sweeps can occur from both standing variation and rapidly recurring beneficial mutations. [4] [5] [6]

Overview

Overview of two soft selective sweep models (Jensen, J., 2014).

A selective sweep occurs when, due to strong positive natural selection, a beneficial allele quickly goes to fixation in a population, reducing or eliminating variation among the nucleotides near that allele. [7] A selective sweep can occur when a rare or formerly absent allele that improves the fitness of its carrier relative to other members of the population increases in frequency quickly due to natural selection. As the frequency of such a beneficial allele increases, genetic variants that happen to be present in the DNA neighborhood of the beneficial allele also become more prevalent; this phenomenon is called genetic hitchhiking. [6] [8] A selective sweep arises when rapid changes in the frequency of a beneficial allele, driven by positive selection, distort the genealogical history of samples from the region around the selected locus. It is now recognized that not all sweeps reduce genetic variation in the same way; selective sweeps can be grouped into three main categories: [9]

  1. The classic selective sweep, or hard sweep, is expected when beneficial mutations are rare: once a beneficial mutation has occurred, it increases in frequency rapidly, drastically reducing genetic variation in the population.
  2. A soft sweep from standing genetic variation (SGV) occurs when a previously neutral mutation that was already present in the population becomes beneficial because of an environmental change. Such a mutation may be present on several genomic backgrounds, so that when it rapidly increases in frequency it does not erase all genetic variation in the population.
  3. A multiple-origin soft sweep occurs when mutations are common, for example in a large population, so that the same or similar beneficial mutations arise on different genomic backgrounds and no single genomic background can hitchhike to high frequency. [2]
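
The three categories above can be illustrated with a toy simulation. The following is a minimal sketch, not a method from the cited literature: it assumes a haploid Wright-Fisher population in which every beneficial copy carries a label for the genomic background (or independent origin) it sits on. The function name `simulate_sweep`, its parameters, and all numerical values are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def simulate_sweep(pop_size, s, mu, start_copies, seed, max_gen=10_000):
    """Toy haploid Wright-Fisher sweep (illustrative assumption, not a published model).

    Label 0 is the wild type. Every positive label marks one distinct
    background carrying the beneficial allele: either one of the
    `start_copies` standing copies or a new mutational origin.
    Returns {label: copies} for the beneficial lineages still present.
    """
    rng = random.Random(seed)
    counts = Counter({0: pop_size - start_copies})
    for label in range(1, start_copies + 1):
        counts[label] = 1
    next_label = start_copies + 1

    for _generation in range(max_gen):
        beneficial = pop_size - counts[0]
        # Stop at fixation, or at loss when no new mutation can rescue the allele.
        if beneficial == pop_size or (beneficial == 0 and mu == 0):
            break
        labels = list(counts)
        # Beneficial copies reproduce with relative fitness 1 + s.
        weights = [counts[lab] * (1.0 if lab == 0 else 1.0 + s) for lab in labels]
        counts = Counter(rng.choices(labels, weights=weights, k=pop_size))
        # Recurrent mutation: each wild-type offspring mutates with probability mu.
        for _individual in range(counts[0]):
            if rng.random() < mu:
                counts[0] -= 1
                counts[next_label] = 1
                next_label += 1

    return {lab: c for lab, c in counts.items() if lab != 0 and c > 0}


# Hard sweep: one starting copy, no recurrent mutation.
hard = simulate_sweep(200, 0.1, 0.0, 1, seed=1)
# Soft sweep from standing variation: 20 copies on distinct backgrounds.
sgv = simulate_sweep(200, 0.1, 0.0, 20, seed=2)
# Multiple-origin soft sweep: no standing copies, frequent new mutations.
multi = simulate_sweep(200, 0.1, 0.01, 0, seed=3)

for name, result in (("hard", hard), ("SGV", sgv), ("multi-origin", multi)):
    print(name, "surviving backgrounds:", len(result))
```

In this sketch a hard sweep can end with at most one surviving background (or with loss of the allele), whereas the other two scenarios can end with several. The exact counts depend on the random seed.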

Whether a selective sweep has occurred can be investigated in various ways. One method is to measure linkage disequilibrium, that is, whether a given haplotype is overrepresented in the population. Under neutral evolution, genetic recombination reshuffles the alleles within haplotypes, and no single haplotype dominates the population. During a selective sweep, however, selection for a beneficial gene variant also causes neighboring alleles to hitchhike and leaves less opportunity for recombination. The presence of strong linkage disequilibrium may therefore indicate a selective sweep and can be used to identify sites recently under selection. Many scans for selective sweeps in humans and other species have been carried out using a variety of statistical approaches and assumptions. [9]
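
As a concrete illustration of measuring linkage disequilibrium, the sketch below computes the standard coefficient D = p_AB − p_A·p_B and the squared correlation r² for two biallelic loci. The function name and the toy haplotype samples are assumptions made for the example; real scans use phased genotype data and many more loci.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r2) for two biallelic loci.

    `haplotypes` is a list of (a, b) pairs, where a and b are 0 or 1 and
    give the allele carried at the first and second locus.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n           # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n           # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")  # undefined if a locus is monomorphic
    return d, r2


# Alleles always travel together: complete association.
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
# All four combinations equally common: no association.
shuffled = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(linkage_disequilibrium(linked))
print(linkage_disequilibrium(shuffled))
```

The first sample gives r² = 1 and the second gives r² = 0, matching the intuition that recombination under neutrality breaks associations down while hitchhiking preserves them.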

Differences between soft and hard sweeps

The main difference between soft and hard selective sweeps lies in the expected number of different haplotypes carrying the beneficial mutation or mutations, and therefore in the expected number of haplotypes that hitchhike to considerable frequency during the sweep and remain in the population at the time of fixation. This key difference leads to different expectations for both the site frequency spectrum and linkage disequilibrium, and consequently for the commonly used test statistics based on these patterns. [2] If a hard sweep facilitates evolutionary rescue, a single ancestor is responsible for the spread of the advantageous variant, so genetic diversity is removed from the population as a consequence of adaptation as well as of demographic decline. A soft sweep, in which the beneficial allele is independently derived in multiple ancestors, instead retains some of the ancestral diversity that existed before the environmental shift that triggered the fitness change. [9] [7]
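
The contrast in haplotype numbers can be made concrete with haplotype homozygosity statistics. The sketch below assumes the H1, H12 and H2/H1 definitions as commonly attributed to Garud et al. [16]: H1 is the sum of squared haplotype frequencies, H12 pools the two most common haplotypes before squaring, and H2 is H1 without the most common haplotype. The function name and the two toy samples are hypothetical.

```python
from collections import Counter


def haplotype_summary(haplotypes):
    """Summarise a sample of haplotypes carrying the beneficial allele.

    `haplotypes` is a list of hashable haplotype identifiers (e.g. strings).
    """
    n = len(haplotypes)
    # Haplotype frequencies, most common first.
    freqs = sorted((c / n for c in Counter(haplotypes).values()), reverse=True)
    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0
    h1 = sum(p * p for p in freqs)                        # haplotype homozygosity
    h12 = (p1 + p2) ** 2 + sum(p * p for p in freqs[2:])  # two commonest pooled
    h2 = h1 - p1 * p1                                     # homozygosity without the commonest
    return {"n_haplotypes": len(freqs), "H1": h1, "H12": h12, "H2/H1": h2 / h1}


# Toy sample dominated by one haplotype, as expected after a hard sweep.
hard_like = ["A"] * 9 + ["B"]
# Toy sample with two common haplotypes, as expected after a soft sweep.
soft_like = ["A"] * 5 + ["B"] * 4 + ["C"]

print(haplotype_summary(hard_like))
print(haplotype_summary(soft_like))
```

In these toy samples H12 is high in both cases, while H2/H1 is close to zero only when a single haplotype dominates, which is the qualitative pattern the paragraph above describes.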

Detecting soft sweeps

Can soft and hard sweeps be told apart? Only recent adaptive events, hard or soft, leave a measurable signal at all. Signals from the site frequency spectrum (such as the excess of rare alleles picked up by Tajima's D [10]) usually fade on time scales of ~0.1 Ne generations, while signals based on linkage disequilibrium or haplotype statistics last only ~0.01 Ne generations, where Ne is the effective population size. [11] [12] For a sweep to be readily detectable, selection must be strong (4 Ne sb ≫ 100, where sb is the selection coefficient of the beneficial allele). Even then, soft sweeps can be difficult to distinguish from neutrality if they are 'super soft', i.e., if there are numerous independent origins of the beneficial allele or if its starting frequency in the SGV is high. [13] [14]

To distinguish selection from neutrality robustly, a test statistic needs reliably high power for both hard and soft sweeps. Based on the patterns described above, and as has been shown, [12] [15] tests based on the site frequency spectrum (which look for low- or high-frequency derived alleles) have low power to detect soft sweeps, whereas haplotype tests can detect both types of sweeps. [16] In contrast to single-origin soft sweeps (which always leave a weaker footprint), the power to detect multiple-origin soft sweeps can be higher than the power to detect completed hard sweeps because of the clear haplotype structure right at the selected site. [12]

Detecting soft sweeps with a single origin is difficult. Tests based on a combination of summary statistics have been developed by Peter, Huerta-Sanchez & Nielsen (2012) [13] and by Schrider & Kern (2016). [17] Both tests have reliable power to detect soft sweeps under strong selection and a high starting frequency (5–20%) of the selected allele.

In addition, well-documented empirical examples typically rely on evidence beyond the footprint itself: [18] for example, a source population is known in which the selected allele is present in the SGV (e.g., marine and freshwater sticklebacks [19]), or a known and very recent selection pressure does not leave enough time for the allele to increase from a single copy to the frequency observed today (e.g., CCR5 adaptation to HIV in humans). [20] On the whole, soft sweeps with multiple origins have a better chance of being detected. [12] [16]
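
The quantities quoted in this section can be worked through numerically. The sketch below does two things under stated assumptions: it evaluates the approximate signal lifetimes (~0.1 Ne and ~0.01 Ne generations) and the scaled selection strength 4 Ne sb for hypothetical values Ne = 10,000 and sb = 0.01, and it computes Tajima's D (Tajima 1989 [10]) from a toy alignment using the standard formula. Function names and input data are illustrative, and "≫ 100" is not a sharp cutoff.

```python
import math


def signal_windows(ne):
    """Approximate lifetimes (in generations) of sweep signals, per the text."""
    return {"site_frequency_spectrum": 0.1 * ne, "haplotype_or_ld": 0.01 * ne}


def scaled_selection(ne, s_b):
    """Scaled selection strength 4*Ne*s_b; the text asks for values far above 100."""
    return 4 * ne * s_b


def tajimas_d(haplotypes):
    """Tajima's D for equal-length strings of '0' (ancestral) and '1' (derived).

    Returns None when there are no segregating sites (D is then undefined).
    """
    n = len(haplotypes)
    length = len(haplotypes[0])
    derived = [sum(h[i] == "1" for h in haplotypes) for i in range(length)]
    seg = [k for k in derived if 0 < k < n]      # segregating sites only
    s = len(seg)
    if s == 0:
        return None
    # Mean number of pairwise differences.
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


print(signal_windows(10_000))          # hypothetical Ne
print(scaled_selection(10_000, 0.01))  # hypothetical Ne and s_b

# Only singletons: an excess of rare alleles, so D is negative.
rare_excess = ["1000", "0100", "0010", "0001"]
# Only intermediate-frequency variants, so D is positive.
intermediate = ["11", "11", "00", "00"]
print(tajimas_d(rare_excess), tajimas_d(intermediate))
```

With these assumed values, a site-frequency-spectrum signal would persist for roughly 1,000 generations and a haplotype signal for roughly 100, which illustrates why only recent sweeps are detectable.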

References

  1. Paulose J, Hermisson J, Hallatschek O (2019) Spatial soft sweeps: Patterns of adaptation in populations with long-range dispersal. PLoS Genet 15(2): e1007936. https://doi.org/10.1371/journal.pgen.1007936
  2. Jensen, J. On the unfounded enthusiasm for soft selective sweeps. Nat Commun 5, 5281 (2014). https://doi.org/10.1038/ncomms6281
  3. Harris, Rebecca B.; Sackman, Andrew; Jensen, Jeffrey D. (28 December 2018). "On the unfounded enthusiasm for soft selective sweeps II: Examining recent evidence from humans, flies, and viruses". PLOS Genetics. 14 (12): e1007859. doi: 10.1371/journal.pgen.1007859. PMC 6336318. PMID 30592709.
  4. Harris, Rebecca B.; Sackman, Andrew; Jensen, Jeffrey D. (28 December 2018). "On the unfounded enthusiasm for soft selective sweeps II: Examining recent evidence from humans, flies, and viruses". PLOS Genetics. 14 (12): e1007859. doi: 10.1371/journal.pgen.1007859. PMC 6336318. PMID 30592709.
  5. Wilson, B.A., Pennings, P.S., Petrov, D.A., 2017. Soft Selective Sweeps in Evolutionary Rescue. Genetics 205, 1573–1586. https://doi.org/10.1534/genetics.116.191478
  6. Schaffner, S. & Sabeti, P. (2008) Evolutionary adaptation in the human lineage. Nature Education 1(1):14
  7. Colin M. Brand, Frances J. White, Nelson Ting, Timothy H. Webster bioRxiv 2020.12.14.422788; doi: https://doi.org/10.1101/2020.12.14.422788
  8. Graur, Dan, 2016. Molecular and Genome Evolution, Sinauer Associates, an imprint of Oxford University Press. ISBN 9781605354699
  9. Wilson, B.A., Pennings, P.S., Petrov, D.A., 2017. Soft Selective Sweeps in Evolutionary Rescue. Genetics 205, 1573–1586. https://doi.org/10.1534/genetics.116.191478
  10. Tajima, F. (1989) Statistical method for testing the neutral mutation hypothesis. Genetics, 123, 585–595.
  11. Przeworski, M. (2002) The signature of positive selection at randomly chosen loci. Genetics, 160, 1179–1189.
  12. Pennings, P.S. & Hermisson, J. (2006) Soft sweeps III–the signature of positive selection from recurrent mutation. PLoS Genetics, 2, e186.
  13. Peter, B.M., Huerta-Sanchez, E. & Nielsen, R. (2012) Distinguishing between selective sweeps from standing variation and from a de novo mutation. PLoS Genetics, 8, e1003011.
  14. Berg, J.J. & Coop, G. (2015) A coalescent model for a sweep of a unique standing variant. Genetics, 201, 707–725.
  15. Ferrer-Admetlla, A., Liang, M., Korneliussen, T. & Nielsen, R. (2014) On detecting incomplete soft or hard selective sweeps using haplotype structure. Molecular Biology and Evolution, 31, 1275–1291.
  16. Garud, N.R., Messer, P.W., Buzbas, E.O. & Petrov, D.A. (2015) Recent selective sweeps in North American Drosophila melanogaster show signatures of soft sweeps. PLoS Genetics, 11, e1005004.
  17. Schrider, D.R. & Kern, A.D. (2016a) S/HIC: robust identification of soft and hard sweeps using machine learning. PLoS Genetics, 12, e1005928.
  18. Barrett, R.D.H. & Schluter, D. (2008) Adaptation from standing genetic variation. Trends in Ecology & Evolution, 23, 38–44.
  19. Colosimo, P.F., Hosemann, K.E., Balabhadra, S. et al. (2005) Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science, 307, 1928–1933.
  20. Novembre, J. & Han, E. (2012) Human population structure and the adaptive response to pathogen-induced selection pressures. Philosophical Transactions of the Royal Society B, 367, 878–886.