Stepwise mutation model

The stepwise mutation model (SMM) is a mathematical theory, developed by Motoo Kimura and Tomoko Ohta, that allows for investigation of the equilibrium distribution of allelic frequencies in a finite population where neutral alleles are produced in step-wise fashion. [1]

Description

The original model assumes that when a mutation changes the state of an allele, the allele moves exactly one step up or down. In repetitive regions of the genome this corresponds to the gain or loss of a single repeat unit, occurring at a fixed rate per generation. Allele states can therefore be expressed by integers (..., A-1, A0, A1, ...). The model also assumes random mating and that all alleles at a locus are selectively equivalent. [2] The SMM is distinguished from the Kimura-Crow model, also known as the infinite alleles model (IAM), by its behaviour as the population size increases to infinity while the product of the effective population size (Ne) and the mutation rate is held fixed: under the SMM, the mean number of different alleles in the population rapidly reaches a plateau, at which point it is almost the same as the effective number of alleles.
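The assumptions above can be illustrated with a small simulation. The following is a minimal sketch rather than the authors' own procedure: it assumes a simple Wright-Fisher resampling scheme, and the function names and default parameters (`n_copies`, `mu`, `generations`, `seed`) are hypothetical. The two closed-form expectations for homozygosity are standard equilibrium results for the SMM and the IAM that the text above does not state; the effective number of alleles is the reciprocal of homozygosity.

```python
import random
from collections import Counter


def simulate_smm(n_copies=100, mu=0.01, generations=300, seed=1):
    """Simulate one locus under drift plus strict single-step mutation.

    n_copies is the (hypothetical) number of gene copies in the population;
    states are integers, with 0 standing for the starting allele A0.
    """
    rng = random.Random(seed)
    pop = [0] * n_copies
    for _ in range(generations):
        # Random mating and drift: each new copy samples a random parent copy.
        pop = [rng.choice(pop) for _ in range(n_copies)]
        # Mutation: a mutating copy moves exactly one step up or down.
        pop = [s + rng.choice((-1, 1)) if rng.random() < mu else s for s in pop]
    return pop


def homozygosity(pop):
    """Probability that two copies drawn with replacement share a state."""
    n = len(pop)
    return sum((count / n) ** 2 for count in Counter(pop).values())


def effective_number_of_alleles(pop):
    """Reciprocal of homozygosity."""
    return 1.0 / homozygosity(pop)


def expected_homozygosity_smm(ne, mu):
    """Equilibrium expectation under the SMM: 1 / sqrt(1 + 8 * Ne * mu)."""
    return 1.0 / (1.0 + 8.0 * ne * mu) ** 0.5


def expected_homozygosity_iam(ne, mu):
    """Equilibrium expectation under the IAM: 1 / (1 + 4 * Ne * mu)."""
    return 1.0 / (1.0 + 4.0 * ne * mu)


pop = simulate_smm()
print("distinct alleles:", len(set(pop)))
print("effective number of alleles:", round(effective_number_of_alleles(pop), 2))
print("SMM expectation, Ne*mu = 0.25:", expected_homozygosity_smm(1000, 0.00025))
print("IAM expectation, Ne*mu = 0.25:", expected_homozygosity_iam(1000, 0.00025))
```

For the same value of Ne times the mutation rate, the SMM expectation for homozygosity is higher than the IAM expectation, because stepwise mutation can recreate allele states that already exist in the population.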

Differences in the length of "simple sequence repeats" (SSRs) between individuals can thus be used to construct phylogenies (i.e. determine the relatedness of individuals) or to determine the genetic distance between groups of individuals. For example, more genetically distant individuals would show larger differences in SSR size than more closely related individuals. [3] Given its underlying assumptions, the SMM has been widely adopted for use with microsatellite markers, which contain repeat regions, are co-dominant, and have high mutation rates. [4] [5]

A number of summary statistics can be used to estimate genetic differentiation under the SMM. These include the number of alleles, observed and expected heterozygosity, and allele frequencies. The SMM takes into account the frequency of mismatches between microsatellite loci, that is, how often there are no mismatches, single mismatches, two mismatches, and so on. Variance in allele size is used to make inferences about the genetic distance between individuals or populations. By comparing summary statistics at different levels of organization, it is possible to make inferences about population histories. For example, the variance of allele size within a subpopulation can be compared with the variance within the total population to infer something about population history.
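The summary statistics listed above can be computed directly from allele sizes. The sketch below is illustrative only: the genotypes are made-up repeat counts at a single locus, the helper names are hypothetical, and expected heterozygosity is computed without any sample-size correction.

```python
from collections import Counter
from statistics import pvariance


def allele_frequencies(alleles):
    """Relative frequency of each allele size (repeat count)."""
    n = len(alleles)
    return {size: count / n for size, count in sorted(Counter(alleles).items())}


def expected_heterozygosity(alleles):
    """1 minus the sum of squared allele frequencies (no sample-size correction)."""
    return 1.0 - sum(p * p for p in allele_frequencies(alleles).values())


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different allele sizes."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def flatten(genotypes):
    """List every allele carried by the individuals in a sample."""
    return [allele for genotype in genotypes for allele in genotype]


# Made-up diploid genotypes (repeat counts at one locus) for two subpopulations.
subpop_1 = [(10, 10), (10, 11), (11, 12), (10, 11)]
subpop_2 = [(14, 15), (15, 15), (14, 16), (15, 16)]

for name, sample in (("subpopulation 1", subpop_1), ("subpopulation 2", subpop_2)):
    alleles = flatten(sample)
    print(name)
    print("  number of alleles:", len(set(alleles)))
    print("  allele frequencies:", allele_frequencies(alleles))
    print("  observed heterozygosity:", observed_heterozygosity(sample))
    print("  expected heterozygosity:", expected_heterozygosity(alleles))
    print("  allele size variance:", pvariance(alleles))

total = flatten(subpop_1 + subpop_2)
print("allele size variance, total population:", pvariance(total))
```

In this toy data set the variance in allele size is much larger in the pooled sample than within either subpopulation, which is the kind of contrast used to infer differentiation between groups.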

Construction of phylogenies under the SMM is, however, complicated by the fact that a repeat unit can be either gained or lost, so alleles that are identical in size are not necessarily identical by descent (i.e. they show marker-size homoplasy). [6] [5] Therefore, the SMM cannot be used to determine the exact number of mutational events between two individuals. For example, individual A might have gained a single repeat (from an ancestor who had 9) whereas individual B might have lost a single repeat (from an ancestor who had 11), leaving both individuals with an identical number of microsatellite repeats (here, 10 repeats at that locus).
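The example above can be written out as a short sketch. The function names are hypothetical; the point is only that the observed size difference between two alleles gives a lower bound on the number of stepwise mutations separating them, not the actual count.

```python
def apply_steps(ancestral_repeats, steps):
    """Apply a sequence of +1 / -1 stepwise mutations to a repeat count."""
    state = ancestral_repeats
    for step in steps:
        if step not in (-1, 1):
            raise ValueError("the strict SMM allows only single-step changes")
        state += step
    return state


def min_mutations(repeats_x, repeats_y):
    """Fewest single-step mutations that could separate two allele sizes."""
    return abs(repeats_x - repeats_y)


# Individual A: the ancestor had 9 repeats and one repeat was gained.
allele_a = apply_steps(9, [+1])
# Individual B: the ancestor had 11 repeats and one repeat was lost.
allele_b = apply_steps(11, [-1])

print(allele_a, allele_b)                 # both alleles have the same size
print(min_mutations(allele_a, allele_b))  # size alone suggests no mutations

# A gain followed by a loss in one lineage is also invisible in the final size.
allele_c = apply_steps(10, [+1, -1])
print(allele_c)
```

Alleles A and B are identical in state even though they descend from different ancestral alleles, and allele C has undergone two mutational events that leave no trace in its size.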

Limitations

Some important caveats and limitations to consider when choosing molecular markers for estimating the relatedness of individuals or distinguishing between populations include the following:

  1. Each marker type has its own limitations, and the number of markers used can heavily influence analytical results (a higher number of markers generally gives greater ability to resolve genetic differences).
  2. Molecular markers provide only a “sample” of the genetic information with which to compare individuals or populations, and can differ from actual genetic differentiation. For example, two individuals may be identical at a given locus, both carrying the same mutation inherited from their common ancestor, yet differ at other loci that were not observed (or sequenced).
  3. Null alleles cannot be detected under the plain SMM and can substantially bias results. [7]

Extensions

The original SMM has been modified in multiple ways to deal with these shortcomings, including:

  1. taking into account the upper size limit of most microsatellites [4]
  2. factoring in the tendency of large alleles to show higher rates of mutation than small alleles [4]
  3. including variations in which mutations are split between point mutations that disrupt stretches of repeats and additions or removals of repeat units, [4] an assumption that explains why microsatellites do not evolve into enormous arrays of unlimited size
  4. the Bottleneck program, introduced by Piry et al. (1999) [7]
  5. Micro-Checker, introduced by Van Oosterhout et al. (2004), which has rapidly become widely used for correcting some common SMM errors: null alleles, preferential dropout of large alleles, mis-scoring of stutter peaks, and typographical errors. [7]
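The first two modifications in the list above can be sketched as a single mutation step. This is a hypothetical illustration, not a published parameterisation: the size bounds, the base rate, and the linear increase of mutation rate with allele length are all assumptions chosen for readability, and the point-mutation mechanism in the third item is not modelled.

```python
import random

# Hypothetical bounds on allele size, in repeat units.
MIN_REPEATS = 2
MAX_REPEATS = 50


def mutation_rate(repeats, base_rate=1e-4, slope=1e-5):
    """Assumed linear increase of the mutation rate with allele length."""
    return base_rate + slope * (repeats - MIN_REPEATS)


def mutate(repeats, rng):
    """One generation for one allele: at most a single-step change.

    A step that would leave the allowed size range is truncated, so the
    allele stays at the boundary.
    """
    if rng.random() >= mutation_rate(repeats):
        return repeats
    proposal = repeats + rng.choice((-1, 1))
    return min(MAX_REPEATS, max(MIN_REPEATS, proposal))


rng = random.Random(0)
allele = MAX_REPEATS
for _ in range(100_000):
    allele = mutate(allele, rng)
print("repeat count after 100,000 generations:", allele)
```

Because steps beyond the upper bound are truncated, an allele that starts at the maximum size can only stay there or shrink, while longer alleles mutate more often than shorter ones.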

References

  1. Kimura, Motoo; Ohta, Tomoko (1978-06-01). "Stepwise mutation model and distribution of allelic frequencies in a finite population". Proceedings of the National Academy of Sciences. 75 (6): 2868–2872. Bibcode:1978PNAS...75.2868K. doi:10.1073/pnas.75.6.2868. ISSN 0027-8424. JSTOR 68345. PMC 392666. PMID 275857. S2CID 8084577.
  2. Valdes, A. M.; Slatkin, M.; Freimer, N. B. (1993). "Allele Frequencies at Microsatellite Loci: The Stepwise Mutation Model Revisited". Genetics. 133 (3): 737–749. doi:10.1093/genetics/133.3.737. ISSN 0016-6731. PMC 1205356. PMID 8454213.
  3. Chen, X.; Cho, Y.; McCouch, Susan (2002). "Sequence divergence of rice microsatellites in Oryza and other plant species". Molecular Genetics and Genomics. 268 (3): 331–343. doi:10.1007/s00438-002-0739-5. ISSN 1617-4615. PMID 12436255. S2CID 886970.
  4. Ellegren, Hans (2004). "Microsatellites: simple sequences with complex evolution". Nature Reviews Genetics. 5 (6): 435–445. doi:10.1038/nrg1348. ISSN 1471-0056. PMID 15153996. S2CID 11975343.
  5. Laval, Guillaume; SanCristobal, Magali; Chevalet, Claude (2002-07-15). "Measuring genetic distances between breeds: use of some distances in various short term evolution models". Genetics Selection Evolution. 34 (4): 481–507. doi:10.1186/1297-9686-34-4-481. ISSN 1297-9686. PMC 2705457. PMID 12270106.
  6. Estoup, Arnaud; Jarne, Philippe; Cornuet, Jean-Marie (2002). "Homoplasy and mutation model at microsatellite loci and their consequences for population genetics analysis". Molecular Ecology. 11 (9): 1591–1604. doi:10.1046/j.1365-294x.2002.01576.x. ISSN 0962-1083. PMID 12207711. S2CID 25797455.
  7. Selkoe, Kimberly A.; Toonen, Robert J. (2006). "Microsatellites for ecologists: a practical guide to using and evaluating microsatellite markers". Ecology Letters. 9 (5): 615–629. doi:10.1111/j.1461-0248.2006.00889.x. ISSN 1461-023X. PMID 16643306.