Genetic distance

Genetic distance map by Cavalli-Sforza et al. (1994)

Genetic distance is a measure of the genetic divergence between species or between populations within a species, whether the distance measures time from common ancestor or degree of differentiation. [2] Populations with many similar alleles have small genetic distances. This indicates that they are closely related and have a recent common ancestor.

Genetic distance is useful for reconstructing the history of populations, such as the multiple human expansions out of Africa. [3] It is also used for understanding the origin of biodiversity. For example, the genetic distances between different breeds of domesticated animals are often investigated in order to determine which breeds should be protected to maintain genetic diversity. [4]

Biological foundation

In the genome of an organism, each gene is located at a specific place called the locus for that gene. Allelic variations at these loci cause phenotypic variation within species (e.g. hair colour, eye colour). However, most alleles do not have an observable impact on the phenotype. Within a population, new alleles generated by mutation either die out or spread throughout the population. When a population is split into isolated populations (by either geographical or ecological factors), mutations that occur after the split will be present only in the isolated population in which they arose. Random fluctuation of allele frequencies, a process known as genetic drift, also produces genetic differentiation between populations. By comparing allele frequencies between two populations and computing the genetic distance, we can estimate how long ago the populations separated. [5]
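The effect of drift on isolated populations can be illustrated with a minimal Wright–Fisher simulation. This is a sketch under stated assumptions, not a method taken from the cited sources: a diploid population of constant size, non-overlapping generations, a single biallelic locus, and no mutation, migration or selection. The function name `wright_fisher` and its parameters are hypothetical.

```python
import random


def wright_fisher(p0, n_individuals, generations, seed=None):
    """Simulate the frequency of one allele under pure genetic drift.

    Assumes a diploid population of constant size, so each generation
    consists of 2 * n_individuals gene copies drawn at random according
    to the previous generation's allele frequency.
    """
    rng = random.Random(seed)
    n_copies = 2 * n_individuals
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Binomial sampling: each gene copy carries the allele with probability p.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        trajectory.append(p)
    return trajectory


# Two populations isolated after a split, starting from the same frequency.
pop_x = wright_fisher(0.5, n_individuals=50, generations=100, seed=1)
pop_y = wright_fisher(0.5, n_individuals=50, generations=100, seed=2)
# Difference in allele frequency between the populations after 100 generations.
print(abs(pop_x[-1] - pop_y[-1]))
```

Running the function twice from the same starting frequency with different seeds mimics two populations isolated after a split. Their allele frequencies typically drift apart, and they do so faster when `n_individuals` is small.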

Measures

Although genetic distance is simple to define as a measure of genetic divergence, several different statistical measures have been proposed, because different authors considered different evolutionary models. The most commonly used are Nei's genetic distance, [5] the Cavalli-Sforza and Edwards measure, [6] and Reynolds, Weir and Cockerham's genetic distance, [7] listed below.

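In all the formulae in this section, $X$ and $Y$ represent two different populations for which $L$ loci have been studied. Let $X_u$ represent the frequency of the $u$th allele at the $l$th locus in population $X$, and $Y_u$ the corresponding frequency in population $Y$.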

Nei's standard genetic distance

In 1972, Masatoshi Nei published what came to be known as Nei's standard genetic distance. This distance has the useful property that if the rate of genetic change (amino acid substitution) is constant per year or generation, then Nei's standard genetic distance ($D$) increases in proportion to divergence time. This measure assumes that genetic differences are caused by mutation and genetic drift. [5] It is defined as

$$D = -\ln \frac{\sum_l \sum_u X_u Y_u}{\sqrt{\left(\sum_l \sum_u X_u^2\right)\left(\sum_l \sum_u Y_u^2\right)}}$$

This distance can also be expressed in terms of the arithmetic mean of gene identity. Let $j_X$ be the probability that two members of population $X$ have the same allele at a particular locus, and $j_Y$ the corresponding probability in population $Y$. Also, let $j_{XY}$ be the probability that a member of $X$ and a member of $Y$ have the same allele. Now let $J_X$, $J_Y$ and $J_{XY}$ represent the arithmetic means of $j_X$, $j_Y$ and $j_{XY}$ over all loci, respectively. In other words,

$$J_X = \sum_l \sum_u \frac{X_u^2}{L}, \qquad J_Y = \sum_l \sum_u \frac{Y_u^2}{L}, \qquad J_{XY} = \sum_l \sum_u \frac{X_u Y_u}{L},$$

where $L$ is the total number of loci examined. [8]

Nei's standard distance can then be written as [5]

$$D = -\ln \frac{J_{XY}}{\sqrt{J_X J_Y}}$$
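The gene-identity form can be sketched in a few lines of Python. The example assumes the standard definition $D = -\ln\left(J_{XY} / \sqrt{J_X J_Y}\right)$, with each mean gene identity computed as a sum of frequency products divided by the number of loci. The data layout (one list of allele frequencies per locus, alleles in the same order in both populations) and the function name `nei_standard_distance` are illustrative assumptions, and no sample-size correction is applied.

```python
import math


def nei_standard_distance(x, y):
    """Nei's standard genetic distance from allele frequencies.

    x, y: lists of loci; each locus is a list of allele frequencies,
    with alleles listed in the same order in both populations.
    """
    if not x or len(x) != len(y):
        raise ValueError("both populations need the same, non-zero number of loci")
    n_loci = len(x)
    # Mean gene identities within X, within Y, and between X and Y.
    j_x = sum(p * p for locus in x for p in locus) / n_loci
    j_y = sum(q * q for locus in y for q in locus) / n_loci
    j_xy = sum(p * q for lx, ly in zip(x, y) for p, q in zip(lx, ly)) / n_loci
    if j_xy == 0:
        # No shared alleles at any locus: the distance is unbounded.
        return math.inf
    return -math.log(j_xy / math.sqrt(j_x * j_y))


# Two loci, two alleles each (hypothetical frequencies).
pop_x = [[0.7, 0.3], [0.5, 0.5]]
pop_y = [[0.4, 0.6], [0.5, 0.5]]
print(nei_standard_distance(pop_x, pop_y))
```

Populations with identical allele frequencies give a distance of zero, and populations that share no alleles give an infinite distance under this formula.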

Cavalli-Sforza chord distance

In 1967, Luigi Luca Cavalli-Sforza and A. W. F. Edwards published this measure. It assumes that genetic differences arise due to genetic drift only. One major advantage of this measure is that the populations are represented in a hypersphere, the scale of which is one unit per gene substitution. The chord distance in the hyperdimensional sphere, averaged over the $L$ loci, is given by [2] [6]

$$D_{CH} = \frac{2}{\pi L} \sum_l \sqrt{2\left(1 - \sum_u \sqrt{X_u Y_u}\right)}$$

Some authors drop the factor $\frac{2}{\pi}$ to simplify the formula, at the cost of losing the property that the scale is one unit per gene substitution.
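A minimal sketch of the chord distance follows. It assumes the per-locus form $\frac{2}{\pi}\sqrt{2\left(1 - \sum_u \sqrt{X_u Y_u}\right)}$ averaged over loci. The function name `chord_distance`, the `scaled` flag for dropping the $\frac{2}{\pi}$ factor, and the list-of-lists data layout are illustrative assumptions.

```python
import math


def chord_distance(x, y, scaled=True):
    """Cavalli-Sforza and Edwards chord distance, averaged over loci.

    x, y: lists of loci; each locus is a list of allele frequencies,
    with alleles listed in the same order in both populations.
    scaled: if False, the 2/pi factor is dropped.
    """
    if not x or len(x) != len(y):
        raise ValueError("both populations need the same, non-zero number of loci")
    total = 0.0
    for lx, ly in zip(x, y):
        # Cosine of the angle between the two populations on the hypersphere.
        cos_theta = sum(math.sqrt(p * q) for p, q in zip(lx, ly))
        # max() guards against tiny negative values from rounding error.
        total += math.sqrt(max(0.0, 2.0 * (1.0 - cos_theta)))
    distance = total / len(x)
    return distance * 2.0 / math.pi if scaled else distance


pop_x = [[0.7, 0.3], [0.5, 0.5]]
pop_y = [[0.4, 0.6], [0.5, 0.5]]
print(chord_distance(pop_x, pop_y))
```

With the scaling factor, two populations sharing no alleles at any locus reach the maximum value $2\sqrt{2}/\pi$.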

Reynolds, Weir, and Cockerham's genetic distance

In 1983, this measure was published by John Reynolds, Bruce Weir and C. Clark Cockerham. It assumes that genetic differentiation occurs only by genetic drift, without mutation. It estimates the coancestry coefficient $\Theta$, which provides a measure of the genetic divergence, by [7]

$$\Theta_w = \sqrt{\frac{\sum_l \sum_u (X_u - Y_u)^2}{2 \sum_l \left(1 - \sum_u X_u Y_u\right)}}$$

Other measures

Many other measures of genetic distance have been proposed with varying success.

Nei's $D_A$ distance 1983

This distance assumes that genetic differences arise due to mutation and genetic drift. It is known to give more reliable population trees than other distances, particularly for microsatellite DNA data. [9] [10]

$$D_A = 1 - \sum_l \sum_u \frac{\sqrt{X_u Y_u}}{L}$$

Euclidean distance

Euclidean genetic distance between 51 worldwide human populations, calculated using 289,160 SNPs. Dark red is the most similar pair and dark blue is the most distant pair.

$$D_{EU} = \sqrt{\sum_u (X_u - Y_u)^2}$$ [2]

Goldstein distance 1995

This measure was developed specifically for microsatellite markers and is based on the stepwise mutation model (SMM). $\mu_X$ and $\mu_Y$ are the mean allele sizes at a locus in populations $X$ and $Y$. [12]

$$(\delta\mu)^2 = \sum_l \frac{(\mu_X - \mu_Y)^2}{L}$$
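As a sketch of how such a distance can be computed, the example below assumes the usual form $(\delta\mu)^2 = \sum_l (\mu_X - \mu_Y)^2 / L$: the squared difference in mean allele size, averaged over loci. Representing each locus as a dictionary that maps allele size (repeat count) to frequency is an illustrative choice, and the function names are hypothetical.

```python
def mean_allele_size(locus):
    """Mean allele size at one locus; locus maps allele size -> frequency."""
    return sum(size * freq for size, freq in locus.items())


def goldstein_distance(x, y):
    """Squared difference in mean allele size, averaged over loci."""
    if not x or len(x) != len(y):
        raise ValueError("both populations need the same, non-zero number of loci")
    squared_diffs = (
        (mean_allele_size(lx) - mean_allele_size(ly)) ** 2
        for lx, ly in zip(x, y)
    )
    return sum(squared_diffs) / len(x)


# One microsatellite locus: mean sizes are 11 and 14 repeats.
pop_x = [{10: 0.5, 12: 0.5}]
pop_y = [{14: 1.0}]
print(goldstein_distance(pop_x, pop_y))  # (11 - 14)^2 = 9.0
```

Unlike the measures based only on allele frequencies, this one uses the allele sizes themselves, which is why it needs a different input layout.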

Nei's minimum genetic distance 1973

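This measure assumes that genetic differences arise due to mutation and genetic drift. [13] In terms of the mean gene identities $J_X$, $J_Y$ and $J_{XY}$ used for Nei's standard distance, it is

$$D_m = \frac{J_X + J_Y}{2} - J_{XY}$$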

Rogers' distance 1972

$$D_R = \frac{1}{L} \sum_l \sqrt{\frac{\sum_u (X_u - Y_u)^2}{2}}$$ [14]

Fixation index

A commonly used measure of genetic distance is the fixation index ($F_{ST}$), which varies between 0 and 1. A value of 0 indicates that two populations are genetically identical (minimal or no genetic differentiation between the two populations), whereas a value of 1 indicates that two populations are genetically distinct (maximum differentiation between the two populations). No mutation is assumed. Large populations between which there is much migration, for example, tend to be little differentiated, whereas small populations between which there is little migration tend to be greatly differentiated. $F_{ST}$ is a convenient measure of this differentiation, and as a result $F_{ST}$ and related statistics are among the most widely used descriptive statistics in population and evolutionary genetics. But $F_{ST}$ is more than a descriptive statistic and measure of genetic differentiation: it is directly related to the variance in allele frequency among populations and, conversely, to the degree of resemblance among individuals within populations. If $F_{ST}$ is small, the allele frequencies of the populations are similar to one another; if it is large, the allele frequencies differ greatly between populations.
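The text above does not fix a particular estimator. As an illustration only, the sketch below uses one simple formulation, $F_{ST} = (H_T - H_S)/H_T$, where $H_S$ is the mean expected heterozygosity within the populations and $H_T$ is the expected heterozygosity of the pooled populations. It assumes two equally weighted populations with known allele frequencies and ignores the sampling corrections used by practical estimators. The function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Probability that two randomly drawn gene copies differ at a locus."""
    return 1.0 - sum(p * p for p in freqs)


def fst(x, y):
    """F_ST between two equally weighted populations, as (H_T - H_S) / H_T.

    x, y: lists of loci; each locus is a list of allele frequencies,
    with alleles listed in the same order in both populations.
    Heterozygosities are summed over loci before taking the ratio.
    """
    if not x or len(x) != len(y):
        raise ValueError("both populations need the same, non-zero number of loci")
    h_s_total = 0.0
    h_t_total = 0.0
    for lx, ly in zip(x, y):
        # Mean heterozygosity within the two populations.
        h_s_total += (expected_heterozygosity(lx) + expected_heterozygosity(ly)) / 2.0
        # Heterozygosity of the pooled population (mean allele frequencies).
        pooled = [(p + q) / 2.0 for p, q in zip(lx, ly)]
        h_t_total += expected_heterozygosity(pooled)
    if h_t_total == 0.0:
        # Both populations fixed for the same allele everywhere: no differentiation.
        return 0.0
    return (h_t_total - h_s_total) / h_t_total


print(fst([[0.8, 0.2]], [[0.2, 0.8]]))
```

Under this formulation, identical allele frequencies give 0 and populations fixed for different alleles give 1, matching the interpretation of the two extremes described above.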

References

  1. Cavalli-Sforza, L.L., Menozzi, P. & Piazza, A. (1994). The History and Geography of Human Genes. New Jersey: Princeton University Press.
  2. Nei, M. (1987). "Chapter 9". Molecular Evolutionary Genetics. New York: Columbia University Press.
  3. Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA, Feldman MW, Cavalli-Sforza LL (November 2005). "Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa". Proc Natl Acad Sci U S A. 102 (44): 15942–7. Bibcode:2005PNAS..10215942R. doi:10.1073/pnas.0507611102. PMC 1276087. PMID 16243969.
  4. Ruane J (1999). "A critical review of the value of genetic distance studies in conservation of animal genetic resources". Journal of Animal Breeding and Genetics. 116 (5): 317–323. doi:10.1046/j.1439-0388.1999.00205.x.
  5. Nei, M. (1972). "Genetic distance between populations". Am. Nat. 106 (949): 283–292. doi:10.1086/282771. S2CID 55212907.
  6. L.L. Cavalli-Sforza; A.W.F. Edwards (1967). "Phylogenetic Analysis – Models and Estimation Procedures". The American Journal of Human Genetics. 19 (3 Part I (May)): 233–257. PMC 1706274. PMID 6026583.
  7. John Reynolds; B.S. Weir; C. Clark Cockerham (November 1983). "Estimation of the coancestry coefficient: Basis for a short-term genetic distance". Genetics. 105 (3): 767–779. doi:10.1093/genetics/105.3.767. PMC 1202185. PMID 17246175.
  8. Nei, M. (1987) Genetic distance and molecular phylogeny. In: Population Genetics and Fishery Management (N. Ryman and F. Utter, eds.), University of Washington Press, Seattle, WA, pp. 193–223.
  9. Nei M., Tajima F., Tateno Y. (1983). "Accuracy of estimated phylogenetic trees from molecular data. II. Gene frequency data". J. Mol. Evol. 19 (2): 153–170. doi:10.1007/bf02300753. PMID 6571220. S2CID 19567426.
  10. Takezaki N. (1996). "Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA". Genetics. 144 (1): 389–399. doi:10.1093/genetics/144.1.389. PMC 1207511. PMID 8878702.
  11. Magalhães TR, Casey JP, Conroy J, Regan R, Fitzpatrick DJ, Shah N; et al. (2012). "HGDP and HapMap analysis by Ancestry Mapper reveals local and global population relationships". PLOS ONE. 7 (11): e49438. Bibcode:2012PLoSO...749438M. doi:10.1371/journal.pone.0049438. PMC 3506643. PMID 23189146.
  12. Gillian Cooper; William Amos; Richard Bellamy; Mahveen Ruby Siddiqui; Angela Frodsham; Adrian V. S. Hill; David C. Rubinsztein (1999). "An Empirical Exploration of the Genetic Distance for 213 Human Microsatellite Markers". The American Journal of Human Genetics. 65 (4): 1125–1133. doi:10.1086/302574. PMC 1288246. PMID 10486332.
  13. Nei M, Roychoudhury AK (February 1974). "Sampling variances of heterozygosity and genetic distance". Genetics. 76 (2): 379–90. doi:10.1093/genetics/76.2.379. PMC 1213072. PMID 4822472.
  14. Rogers, J. S. (1972). Measures of similarity and genetic distance. In Studies in Genetics VII. pp. 145−153. University of Texas Publication 7213. Austin, Texas.