Selective sweep

In genetics, a selective sweep is the process through which a new beneficial mutation, as it increases in frequency and becomes fixed (i.e., reaches a frequency of 1) in the population, reduces or eliminates genetic variation among nucleotide sequences near the mutation. In a selective sweep, positive selection causes the new mutation to reach fixation so quickly that linked alleles can "hitchhike" and also become fixed.

Overview

A selective sweep can occur when a rare or previously non-existent allele that increases the fitness of its carrier (relative to other members of the population) increases rapidly in frequency due to natural selection. As the prevalence of such a beneficial allele increases, genetic variants that happen to be present on the genomic background (the DNA neighborhood) of the beneficial allele will also become more prevalent. This is called genetic hitchhiking. A selective sweep due to a strongly selected allele that arose on a single genomic background therefore results in a large reduction of genetic variation in the surrounding chromosome region. The idea that strong positive selection could reduce nearby genetic variation due to hitchhiking was proposed by John Maynard Smith and John Haigh in 1974. [1]
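
The rapid rise in frequency described above can be illustrated with a minimal sketch. It assumes a deliberately simple deterministic haploid model (infinite population, constant selection coefficient `s`, no drift, mutation, or recombination), which is an illustrative choice and not a model taken from the cited work; the function name and parameter values are hypothetical.

```python
def sweep_trajectory(p0, s, max_generations=10_000, threshold=0.999):
    """Frequency of a beneficial allele over time under haploid selection.

    Assumed model: carriers have relative fitness 1 + s, non-carriers 1,
    so each generation p' = p * (1 + s) / (1 + s * p).
    """
    trajectory = [p0]
    p = p0
    while p < threshold and len(trajectory) <= max_generations:
        p = p * (1 + s) / (1 + s * p)
        trajectory.append(p)
    return trajectory


# Hypothetical values: a rare allele with a 5% fitness advantage
traj = sweep_trajectory(p0=0.001, s=0.05)
print(len(traj) - 1, "generations to exceed 99.9% frequency")
```

In this idealized model a larger `s` shortens the sweep, which leaves less time for recombination to separate the beneficial allele from its original genomic background.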

Not all sweeps reduce genetic variation in the same way. Sweeps can be placed into three main categories:

  1. The "classic selective sweep" or "hard selective sweep" is expected to occur when beneficial mutations are rare, but once a beneficial mutation has occurred it increases in frequency rapidly, thereby drastically reducing genetic variation in the population. [1]
  2. Another type of sweep, a "soft sweep from standing genetic variation," occurs when a previously neutral mutation that was present in a population becomes beneficial because of an environmental change. Such a mutation may be present on several genomic backgrounds, so that when it rapidly increases in frequency, it does not erase all genetic variation in the population. [2]
  3. Finally, a "multiple origin soft sweep" occurs when mutations are common (for example in a large population), so that the same or similar beneficial mutations occur on different genomic backgrounds and no single genomic background can hitchhike to high frequency. [3]
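
The difference between these categories can be made concrete with a toy calculation. The sketch below uses haplotype homozygosity (the chance that two sampled haplotypes are identical) as one possible summary of remaining variation; the statistic, the function name, and the sample data are illustrative assumptions rather than anything prescribed by the sources cited here.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Probability that two haplotypes drawn with replacement are identical."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


# Hypothetical samples taken after fixation of the beneficial allele;
# each string is the pattern of alleles at nearby linked sites.
hard_sweep = ["ACGT"] * 8                   # one genomic background hitchhiked
soft_sweep = ["ACGT"] * 4 + ["ATGA"] * 4    # two backgrounds carried the allele

print(haplotype_homozygosity(hard_sweep))   # 1.0
print(haplotype_homozygosity(soft_sweep))   # 0.5
```

In this toy data the hard sweep leaves a single haplotype, while the soft sweep retains two at equal frequency, so some linked variation survives.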

This is a diagram of a hard selective sweep. It shows the different steps (a beneficial mutation occurs, increases in frequency and fixes in a population) and the effect on nearby genetic variation.

Sweeps do not occur when selection simultaneously causes very small shifts in allele frequencies at many loci each with standing variation (polygenic adaptation).

This is a diagram of a soft selective sweep from standing genetic variation. It shows the different steps (a neutral mutation becomes beneficial, increases in frequency and fixes in a population) and the effect on nearby genetic variation.

This is a diagram of a multiple origin soft selective sweep from recurrent mutation. It shows the different steps (a beneficial mutation occurs and increases in frequency, but before it fixes, the same mutation occurs again on a second genomic background; together, the mutations fix in the population) and the effect on nearby genetic variation.

Detection

Whether or not a selective sweep has occurred can be investigated in various ways. One method is to measure linkage disequilibrium, i.e., whether a given haplotype is overrepresented in the population. Under neutral evolution, genetic recombination will result in the reshuffling of the different alleles within a haplotype, and no single haplotype will dominate the population. However, during a selective sweep, selection for a positively selected gene variant will also result in selection of neighbouring alleles and less opportunity for recombination. Therefore, the presence of strong linkage disequilibrium might indicate that there has been a recent selective sweep, and can be used to identify sites recently under selection.
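
As a rough illustration of the linkage disequilibrium measurement mentioned above, the following sketch computes the coefficient D and the squared correlation r² between two biallelic loci from phased haplotypes. The 0/1 allele coding, the function name, and the sample data are assumptions made for this example; real scans use genome-wide data and more elaborate statistics.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic loci.

    `haplotypes` is a list of (a, b) pairs, where a and b are 0 or 1 and
    give the allele carried at the first and second locus respectively.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # excess of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


# Hypothetical samples: alleles reshuffled by recombination vs. tightly associated
shuffled = [(0, 0), (0, 1), (1, 0), (1, 1)] * 2
linked = [(1, 1)] * 6 + [(0, 0)] * 2

print(linkage_disequilibrium(shuffled))
print(linkage_disequilibrium(linked))
```

The function returns `nan` for r² when either locus is monomorphic, since the statistic is undefined without variation at both sites.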

There have been many scans for selective sweeps in humans and other species, using a variety of statistical approaches and assumptions. [4]

In maize, a comparison of yellow and white corn genotypes surrounding Y1, the phytoene synthase gene responsible for yellow endosperm color, shows strong evidence for a selective sweep in yellow germplasm, with reduced diversity at this locus and linkage disequilibrium in surrounding regions. White maize lines had increased diversity and no evidence of linkage disequilibrium associated with a selective sweep. [5]

Relevance to disease

Because selective sweeps allow for rapid adaptation, they have been cited as a key factor in the ability of pathogenic bacteria and viruses to attack their hosts and survive the medicines used to treat them. [6] In such systems, the competition between host and parasite is often characterized as an evolutionary "arms race", so the more rapidly one organism can change its method of attack or defense, the better. This has elsewhere been described by the Red Queen hypothesis. A more effective pathogen or a more resistant host will have an adaptive advantage over its conspecifics, providing the fuel for a selective sweep.

One example comes from the human influenza virus, which has been involved in an adaptive contest with humans for hundreds of years. While antigenic drift (the gradual change of surface antigens) is considered the traditional model for changes in the viral genotype, more recent evidence [7] suggests that selective sweeps play an important role as well. In several flu populations, the time to the most recent common ancestor (TMRCA) of "sister" strains, an indication of relatedness, suggested that they had all evolved from a common progenitor within just a few years. Periods of low genetic diversity, presumably resulting from selective sweeps, gave way to increasing diversity as different strains adapted to their own locales.

A similar case can be found in Toxoplasma gondii, a remarkably potent protozoan parasite capable of infecting warm-blooded animals. T. gondii was discovered to exist in only three clonal lineages in all of Europe and North America. [8] In other words, there are only three genetically distinct strains of this parasite in all of the Old World and much of the New World. These three strains are characterized by a single monomorphic version of the gene Chr1a, which emerged at approximately the same time as the three modern clones. It appears, then, that a novel genotype containing this form of Chr1a emerged and swept the entire European and North American population of Toxoplasma gondii, bringing with it the rest of its genome via genetic hitchhiking. The South American strains of T. gondii, of which there are far more than exist elsewhere, also carry this allele of Chr1a.

Involvement in agriculture and domestication

Rarely are genetic variability and its opposing forces, including adaptation, more relevant than in the generation of domestic and agricultural species. Cultivated crops, for example, have essentially been genetically modified for more than ten thousand years, [9] subjected to artificial selective pressures, and forced to adapt rapidly to new environments. Selective sweeps provide a baseline from which different varietals could have emerged. [10]

For example, a study of the corn (Zea mays) genotype uncovered dozens of ancient selective sweeps uniting modern cultivars on the basis of shared genetic data possibly dating back as far as domestic corn's wild counterpart, teosinte. In other words, though artificial selection has shaped the genome of corn into a number of distinctly adapted cultivars, selective sweeps acting early in its development provide a unifying homoplasy of genetic sequence. In a sense, the long-buried sweeps may give evidence of corn's, and teosinte's, ancestral state by elucidating a common genetic background between the two.

Another example of the role of selective sweeps in domestication comes from the chicken. A Swedish research group used parallel sequencing techniques to examine eight cultivated varieties of chicken and their closest wild ancestor with the goal of uncovering genetic similarities resulting from selective sweeps. [11] They uncovered evidence of several selective sweeps, most notably in the gene encoding the thyroid-stimulating hormone receptor (TSHR), which regulates the metabolic and photoperiod-related elements of reproduction. This suggests that, at some point in the domestication of the chicken, a selective sweep, probably driven by human intervention, subtly changed the reproductive machinery of the bird, presumably to the advantage of its human manipulators.

In humans

Examples of selective sweeps in humans include variants affecting lactase persistence [12] [13] and adaptation to high altitude. [14]

References

  1. Maynard Smith, John; Haigh, John (1974-02-01). "The hitch-hiking effect of a favourable gene". Genetics Research. 23 (1): 23–35. doi:10.1017/S0016672300014634. PMID 4407212.
  2. Hermisson, Joachim; Pennings, Pleuni S. (2005-04-01). "Soft Sweeps". Genetics. 169 (4): 2335–2352. doi:10.1534/genetics.104.036947. PMC 1449620. PMID 15716498.
  3. Pennings, Pleuni S.; Hermisson, Joachim (2006-05-01). "Soft Sweeps II—Molecular Population Genetics of Adaptation from Recurrent Mutation or Migration". Molecular Biology and Evolution. 23 (5): 1076–1084. doi:10.1093/molbev/msj117. PMID 16520336.
  4. Fu, Wenqing; Akey, Joshua M. (2013). "Selection and adaptation in the human genome". Annual Review of Genomics and Human Genetics. 14: 467–489. doi:10.1146/annurev-genom-091212-153509. PMID 23834317.
  5. Palaisa, K.; Morgante, M.; Tingey, S.; Rafalski, A. (June 2004). "Long-range patterns of diversity and linkage disequilibrium surrounding the maize Y1 gene are indicative of an asymmetric selective sweep". Proc. Natl. Acad. Sci. U.S.A. 101 (26): 9885–90. Bibcode:2004PNAS..101.9885P. doi:10.1073/pnas.0307839101. PMC 470768. PMID 15161968.
  6. Sá, Juliana Martha; Twu, Olivia; Hayton, Karen; Reyes, Sahily; Fay, Michael P.; Ringwald, Pascal; Wellems, Thomas E. (2009). "Geographic patterns of Plasmodium falciparum drug resistance distinguished by differential responses to amodiaquine and chloroquine". PNAS. 106 (45): 18883–18889. doi:10.1073/pnas.0911317106. PMC 2771746. PMID 19884511.
  7. Rambaut, Andrew; Pybus, Oliver G.; Nelson, Martha I.; Viboud, Cecile; Taubenberger, Jeffery K.; Holmes, Edward C. (2008). "The genomic and epidemiological dynamics of human influenza A virus". Nature. 453 (7195): 615–619. Bibcode:2008Natur.453..615R. doi:10.1038/nature06945. PMC 2441973. PMID 18418375.
  8. Sibley, L. David; Ajioka, James W. (2008). "Population Structure of Toxoplasma gondii: Clonal Expansion Driven by Infrequent Recombination and Selective Sweeps". Annu. Rev. Microbiol. 62 (1): 329–359. doi:10.1146/annurev.micro.62.081307.162925. PMID 18544039.
  9. Hillman, G.; Hedges, R.; Moore, A.; Colledge, S.; Pettitt, P. (2001). "New evidence of Late glacial cereal cultivation at Abu Hureyra on the Euphrates". Holocene. 4: 388–393.
  10. Gore, Michael A.; Chia, Jer-Ming; Elshire, Robert J.; Sun; Ersoz, Elhan S.; Hurwitz, Bonnie L.; Peiffer, Jason A.; McMullen, Michael D.; Grills, George S.; Ross-Ibarra, Jeffrey; Ware, Doreen H.; Buckler, Edward S. (2009). "A First-Generation Haplotype Map of Maize". Science. 326 (5956): 1115–7. Bibcode:2009Sci...326.1115G. CiteSeerX 10.1.1.658.7628. doi:10.1126/science.1177837. PMID 19965431. S2CID 206521881.
  11. Rubin, Carl-Johan; Zody, Michael C.; Eriksson, Jonas; Meadows, Jennifer R. S.; Sherwood, Ellen; Webster, Matthew T.; Jiang, Lin; Ingman, Max; Sharpe, Ted; Ka, Sojeong; Hallböök, Finn; Besnier, Francois; Carlborg, Orjan; Bed'hom, Bertrand; Tixier-Boichard, Michele; Jensen, Per; Siegel, Paul; Lindblad-Toh, Kerstin; Andersson, Leif (March 2010). "Whole-genome resequencing reveals loci under selection during chicken domestication". Nature. 464 (7288): 587–91. Bibcode:2010Natur.464..587R. doi:10.1038/nature08832. PMID 20220755.
  12. Bersaglieri, Todd; Sabeti, Pardis C.; Patterson, Nick; Vanderploeg, Trisha; Schaffner, Steve F.; Drake, Jared A.; Rhodes, Matthew; Reich, David E.; Hirschhorn, Joel N. (2004-06-01). "Genetic signatures of strong recent positive selection at the lactase gene". American Journal of Human Genetics. 74 (6): 1111–1120. doi:10.1086/421051. PMC 1182075. PMID 15114531.
  13. Tishkoff, Sarah A.; Reed, Floyd A.; Ranciaro, Alessia; Voight, Benjamin F.; Babbitt, Courtney C.; Silverman, Jesse S.; Powell, Kweli; Mortensen, Holly M.; Hirbo, Jibril B. (2007-01-01). "Convergent adaptation of human lactase persistence in Africa and Europe". Nature Genetics. 39 (1): 31–40. doi:10.1038/ng1946. PMC 2672153. PMID 17159977.
  14. Yi, Xin; Liang, Yu; Huerta-Sanchez, Emilia; Jin, Xin; Cuo, Zha Xi Ping; Pool, John E.; Xu, Xun; Jiang, Hui; Vinckenbosch, Nicolas (2010-07-02). "Sequencing of 50 human exomes reveals adaptation to high altitude". Science. 329 (5987): 75–78. Bibcode:2010Sci...329...75Y. doi:10.1126/science.1190371. PMC 3711608. PMID 20595611.