Allele age

Allele age (or mutation age) is the amount of time elapsed since an allele first appeared due to mutation. Estimating the time at which a certain allele appeared allows researchers to infer patterns of human migration, disease, and natural selection. Allele age can be estimated based on (1) the frequency of the allele in a population and (2) the genetic variation that occurs within different copies of the allele, also known as intra-allelic variation. While either of these methods can be used to estimate allele age, the use of both increases the accuracy of the estimation and can sometimes offer additional information regarding the presence of selection.

Estimating allele age from the allele’s frequency relies on the expectation that, in the absence of selection, alleles at high frequency are older than alleles at low frequency. Many alleles of interest, however, are under some type of selection. Because alleles under positive selection can rise to high frequency very quickly, it is important to understand the mechanisms that underlie allele frequency change, such as natural selection, gene flow, genetic drift, and mutation.

Estimating allele age from intra-allelic variation relies on the fact that, with every generation, linkage with other alleles (linkage disequilibrium) is disrupted by recombination, and new variation is created by new mutations. The analysis of intra-allelic variation to assess allele age depends on coalescent theory. Two approaches can be used. First, a phylogenetic approach extrapolates an allele’s age by reconstructing a gene tree and dating the root of the tree. This approach is best suited to ancient, as opposed to recent, mutations. Second, a population genetics approach estimates allele age by using models of mutation, recombination, and demography instead of a gene tree. This type of approach is best suited to recent mutations.

Albers and McVean (2018) proposed a non-parametric method to estimate the age of an allele, using probabilistic, coalescent-based models of mutation and recombination. [1] Specifically, their method infers the time to the most recent common ancestor (TMRCA) between hundreds or thousands of pairs of chromosomal sequences (haplotypes). This information is then combined using a composite likelihood approach to obtain an estimate of the time of mutation at a single locus. The method was applied to more than 16 million variants in the human genome, using data from the 1000 Genomes Project and the Simons Genome Diversity Project, to generate the Atlas of Variant Age. [2]

History

Population geneticists Motoo Kimura and Tomoko Ohta were the first to analyze the association between an allele’s frequency and its age, in the 1970s. [3] [4] They showed that the age of a neutral allele can be estimated (assuming a large, randomly mating population) by

$$E(t_1) = \frac{-2p}{1-p}\ln(p)$$

where $p$ represents the allele frequency and $E(t_1)$ is the expected age, measured in units of 2N generations. [3] [4]
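
As a rough illustration of the frequency-based estimate, the sketch below assumes the Kimura–Ohta expression takes the form E(t1) = −2p ln(p) / (1 − p), with time in units of 2N generations. The function names and the population size passed to the second function are hypothetical and chosen only for this example.

```python
import math


def expected_neutral_allele_age(p: float) -> float:
    """Expected age of a neutral allele at frequency p, in units of 2N generations.

    Assumes the form E(t1) = -2p * ln(p) / (1 - p) for a large,
    randomly mating population.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency p must lie strictly between 0 and 1")
    return -2.0 * p * math.log(p) / (1.0 - p)


def expected_age_in_generations(p: float, population_size: float) -> float:
    """Rescale the expected age from units of 2N generations to generations.

    population_size (N) is a hypothetical input, not a value from the text.
    """
    return expected_neutral_allele_age(p) * 2.0 * population_size


# Under neutrality, rarer alleles are expected to be younger than common ones.
for freq in (0.01, 0.1, 0.5):
    print(freq, expected_neutral_allele_age(freq))
```

Because the result is scaled by 2N, the estimate in generations is only as reliable as the assumed population size, and it ignores selection entirely.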

More recent studies, however, have focused on the analysis of intra-allelic variation. In 1990, Jean-Louis Serre and his team were the first to assess allele age by analyzing intra-allelic variation. Using a sample of 240 French families, they surveyed two restriction fragment length polymorphism (RFLP) sites (E1 and E2) that are closely linked to an allele (ΔF508) at the cystic fibrosis locus (CFTR). Recombination theory relates x(t), the expected frequency of E2 in association with the allele ΔF508 in generation t, to y, the frequency of E2 on chromosomes without the ΔF508 allele. The recombination rate, c, is assumed to be known, so the allele age can be calculated as an estimate of t. [4] [5]
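
The relationship between x(t), y, c, and t can be sketched with a standard linkage-decay model. This is an assumption for illustration and may differ in detail from the calculation in Serre et al. (1990): the association decays geometrically, x(t) = y + (x0 − y)(1 − c)^t, where x0 (a symbol not used in the text) is the frequency of E2 on ΔF508 chromosomes when the mutation arose, taken here as 1. The function names and numerical inputs are hypothetical and are not data from the study.

```python
import math


def expected_marker_frequency(t: float, c: float, y: float, x0: float = 1.0) -> float:
    """Expected frequency x(t) of the marker on mutant chromosomes after t generations.

    Assumes geometric decay of the association toward the background
    frequency y at recombination rate c per generation.
    """
    return y + (x0 - y) * (1.0 - c) ** t


def estimate_age_generations(x_t: float, y: float, c: float, x0: float = 1.0) -> float:
    """Solve x(t) = y + (x0 - y)(1 - c)^t for t, the allele age in generations."""
    if not 0.0 < c < 1.0:
        raise ValueError("recombination rate c must lie strictly between 0 and 1")
    if x0 == y:
        raise ValueError("x0 must differ from y, otherwise there is no association to decay")
    ratio = (x_t - y) / (x0 - y)
    if not 0.0 < ratio <= 1.0:
        raise ValueError("x_t must lie between y (exclusive) and x0 (inclusive)")
    return math.log(ratio) / math.log(1.0 - c)


# Hypothetical inputs: simulate 100 generations of decay, then recover the age.
observed = expected_marker_frequency(t=100, c=0.001, y=0.2)
print(estimate_age_generations(observed, y=0.2, c=0.001))
```

The estimate is sensitive to c: for tightly linked markers (1 − c)^t changes slowly, so small errors in the observed frequencies translate into large differences in t.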

Although Serre et al. (1990) were the first to employ this method, it became increasingly popular after the Risch et al. study in 1995, which analyzed alleles in an Ashkenazi Jewish population. [4] [6]

Examples of allele age estimations

Many intra-allelic variation studies suggest that disease-causing alleles arose rather recently in human history. [7]

Cystic fibrosis

The Serre et al. (1990) study estimated that an allele causing cystic fibrosis arose approximately 181.4 generations ago, which they took to correspond to an allele age of between 3,000 and 6,000 years. [4] [5] However, other studies have obtained drastically different estimates. Morral et al. (1994) suggested a minimum age of 52,000 years. A reanalysis of the Morral et al. (1994) data by Slatkin and Rannala (2000) estimated an allele age of approximately 3,000 years, which is consistent with the Serre et al. (1990) results. [4]

AIDS-resistance allele (CCR5)

A 32-base-pair deletion at the CCR5 locus results in resistance to infection by HIV, the virus that causes AIDS. Individuals who are homozygous for the mutation experience complete resistance to the infection, while heterozygotes experience only partial resistance, resulting in a delayed onset of AIDS. [4] [8] A study by Stephens et al. in 1998 suggested that this allele originated approximately 27.5 generations, or 688 years, ago. These results were obtained using intra-allelic variation analysis. The same study also used the allele frequency and the Kimura-Ohta model to estimate allele age. This method provided very different results, suggesting that the allele appeared more than 100,000 years ago. Stephens et al. (1998) argue that the discrepancy between these age estimates strongly suggests recent positive selection for the CCR5 mutation. [4] [9] Because the CCR5 mutation also offers resistance to smallpox, these results are consistent with the idea that the CCR5 mutation first rose to higher frequency due to positive selection during smallpox outbreaks in European history, before being positively selected for due to its role in HIV resistance. [10]
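
Age estimates from these methods come out in generations and are converted to years with an assumed generation time. The sketch below shows that arithmetic for the CCR5 figure quoted above. The 25-year generation time is an assumption inferred from the two numbers in the text (27.5 generations and 688 years), and the function name is hypothetical.

```python
def generations_to_years(generations: float, years_per_generation: float = 25.0) -> float:
    """Convert an allele age in generations to years.

    years_per_generation is an assumed value; 25 is inferred from the
    figures quoted in the text, not stated there explicitly.
    """
    if generations < 0 or years_per_generation <= 0:
        raise ValueError("generations must be non-negative and generation time positive")
    return generations * years_per_generation


# 27.5 generations at 25 years per generation gives 687.5 years,
# which the text reports as approximately 688 years.
ccr5_age_years = generations_to_years(27.5)
print(ccr5_age_years)
```

The same estimate in generations gives a noticeably different age in years under a different generation time, which is one reason such dates are usually reported as ranges.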

Lactase persistence

Many adults are lactose intolerant because their bodies cease production of the enzyme lactase after childhood. However, mutations in the promoter region of the lactase gene (LCT) result in the continued production of lactase throughout adulthood in certain African populations, a condition known as lactase persistence. A study conducted by Sarah Tishkoff and her team shows that the mutation for lactase persistence has been under positive selection since its appearance approximately 3,000 to 7,000 years ago. These dates are consistent with the rise of cattle domestication and pastoralist lifestyles in these regions, making the lactase persistence mutation a strong example of gene-culture co-evolution. [11]

References

  1. Albers, Patrick K.; McVean, Gil (2018-09-13). "Dating genomic variants and shared ancestry in population-scale sequencing data". bioRxiv: 416610. doi:10.1101/416610. S2CID 92550011.
  2. Albers, Patrick K.; McVean, Gil (2018-09-18). "Atlas of Variant Age". Figshare. doi:10.6084/m9.figshare.c.4235771.v1.
  3. Kimura M, Ohta T (September 1973). "The age of a neutral mutant persisting in a finite population". Genetics. 75 (1): 199–212. doi:10.1093/genetics/75.1.199. PMC 1212997. PMID 4762875.
  4. Slatkin M, Rannala B (2000). "Estimating allele age". Annual Review of Genomics and Human Genetics. 1: 225–49. doi:10.1146/annurev.genom.1.1.225. PMID 11701630.
  5. Serre JL, Simon-Bouy B, Mornet E, Jaume-Roig B, Balassopoulou A, Schwartz M, Taillandier A, Boué J, Boué A (April 1990). "Studies of RFLP closely linked to the cystic fibrosis locus throughout Europe lead to new considerations in populations genetics". Human Genetics. 84 (5): 449–54. doi:10.1007/bf00195818. PMID 1969843. S2CID 24889308.
  6. Risch N, de Leon D, Ozelius L, Kramer P, Almasy L, Singer B, Fahn S, Breakefield X, Bressman S (February 1995). "Genetic analysis of idiopathic torsion dystonia in Ashkenazi Jews and their recent descent from a small founder population". Nature Genetics. 9 (2): 152–9. doi:10.1038/ng0295-152. PMID 7719342. S2CID 5922128.
  7. Rannala B, Bertorelle G (August 2001). "Using linked markers to infer the age of a mutation". Human Mutation. 18 (2): 87–100. doi:10.1002/humu.1158. PMID 11462233. S2CID 24342755.
  8. Henrich TJ, Hanhauser E, Harrison LJ, Palmer CD, Romero-Tejeda M, Jost S, Bosch RJ, Kuritzkes DR (March 2016). "CCR5-Δ32 Heterozygosity, HIV-1 Reservoir Size, and Lymphocyte Activation in Individuals Receiving Long-term Suppressive Antiretroviral Therapy". The Journal of Infectious Diseases. 213 (5): 766–70. doi:10.1093/infdis/jiv504. PMC 4747624. PMID 26512140.
  9. Stephens JC, Reich DE, Goldstein DB, Shin HD, Smith MW, Carrington M, et al. (June 1998). "Dating the origin of the CCR5-Delta32 AIDS-resistance allele by the coalescence of haplotypes". American Journal of Human Genetics. 62 (6): 1507–15. doi:10.1086/301867. PMC 1377146. PMID 9585595.
  10. Galvani AP, Slatkin M (December 2003). "Evaluating plague and smallpox as historical selective pressures for the CCR5-Delta 32 HIV-resistance allele". Proceedings of the National Academy of Sciences of the United States of America. 100 (25): 15276–9. Bibcode:2003PNAS..10015276G. doi:10.1073/pnas.2435085100. PMC 299980. PMID 14645720.
  11. Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, Powell K, Mortensen HM, Hirbo JB, Osman M, Ibrahim M, Omar SA, Lema G, Nyambo TB, Ghori J, Bumpstead S, Pritchard JK, Wray GA, Deloukas P (January 2007). "Convergent adaptation of human lactase persistence in Africa and Europe". Nature Genetics. 39 (1): 31–40. doi:10.1038/ng1946. PMC 2672153. PMID 17159977.