Four-gamete test

In population genetics, the four-gamete test is a method for detecting historical recombination events. [1]

Description

Given a set of four or more sampled haploid chromosomes, the four-gamete test (FGT) detects recombination events by locating pairs of segregating sites that cannot have arisen without either recombination or a repeat mutation. Under the infinite-sites assumption, repeat mutations have zero probability, and hence a recombination event is inferred. For example, if the data being studied consist of bi-allelic single-nucleotide polymorphism data, then the following configuration could be generated without recombination.

Chromosome  Site 1  Site 2
1           0       0
2           1       0
3           0       1

However, the following configuration cannot be generated without recombination.

Chromosome  Site 1  Site 2
1           0       0
2           1       0
3           0       1
4           1       1

The FGT detects a recombination event if the above configuration, in which all four possible gametes (00, 10, 01 and 11) are present at a pair of sites, occurs in the data. Such data are considered incompatible with any non-recombining ancestral history.
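
The pairwise check described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `four_gamete_pairs` is hypothetical, the haplotypes are assumed to be equal-length sequences of bi-allelic sites coded 0/1, and the infinite-sites assumption is taken for granted, so every incompatible pair is attributed to recombination.

```python
from itertools import combinations


def four_gamete_pairs(haplotypes):
    """Return the pairs of site indices (i, j) at which all four gametes occur.

    `haplotypes` is assumed to be a list of equal-length sequences of 0/1
    alleles, one sequence per sampled haploid chromosome (hypothetical input
    format chosen for this sketch).
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    incompatible = []
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site gametes observed in the sample.
        gametes = {(h[i], h[j]) for h in haplotypes}
        # With bi-allelic sites, four distinct gametes means 00, 01, 10 and 11
        # are all present, which no non-recombining history can produce
        # under the infinite-sites assumption.
        if len(gametes) == 4:
            incompatible.append((i, j))
    return incompatible


# The two configurations from the tables above (sites are 0-indexed here).
three_chromosomes = [(0, 0), (1, 0), (0, 1)]
four_chromosomes = three_chromosomes + [(1, 1)]

print(four_gamete_pairs(three_chromosomes))  # []
print(four_gamete_pairs(four_chromosomes))   # [(0, 1)]
```

The sketch compares every pair of sites, so its cost grows quadratically with the number of segregating sites; it reports incompatible pairs only and does not attempt to count or locate the recombination events themselves.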

The FGT has low statistical power to detect recombination. Furthermore, the FGT is suitable only when the mutation rate is significantly smaller than the recombination rate. If the mutation rate is high, then the infinite-sites assumption is violated. For example, the FGT is generally suitable for human datasets, but is unsuitable for bacterial datasets.

References

  1. Hudson, R. R.; Kaplan, N. L. (1 September 1985). "Statistical Properties of the Number of Recombination Events in the History of a Sample of DNA Sequences". Genetics. 111 (1): 147–164. ISSN 0016-6731. PMC 1202594. PMID 4029609.