Effective population size

The effective population size (Ne) is the size of an idealised population that would experience the same rate of genetic drift as the real population. [1] Idealised populations are those following simple one-locus models that comply with assumptions of the neutral theory of molecular evolution. The effective population size is normally smaller than the census population size N, partly because chance events prevent some individuals from breeding, and partly due to background selection and genetic hitchhiking.

The same real population could have a different effective population size for different properties of interest, such as genetic drift (or more precisely, the speed of coalescence) over one generation vs. over many generations. Within a species, areas of the genome that have more genes and/or less genetic recombination tend to have lower effective population sizes, because of the effects of selection at linked sites. In a population with selection at many loci and abundant linkage disequilibrium, the coalescent effective population size may not reflect the census population size at all, or may reflect its logarithm.

The concept of effective population size was introduced in the field of population genetics in 1931 by the American geneticist Sewall Wright. [2] [3] Some versions of the effective population size are used in wildlife conservation.

Empirical measurements

In a rare experiment that directly measured genetic drift one generation at a time, in Drosophila populations of census size 16, the effective population size was 11.5. [4] This measurement was achieved through studying changes in the frequency of a neutral allele from one generation to another in over 100 replicate populations.

More commonly, effective population size is estimated indirectly by comparing data on current within-species genetic diversity to theoretical expectations. According to the neutral theory of molecular evolution, an idealised diploid population will have a pairwise nucleotide diversity equal to 4μNe, where μ is the mutation rate. The effective population size can therefore be estimated empirically by dividing the nucleotide diversity by four times the mutation rate. [5] This captures the cumulative effects of genetic drift, genetic hitchhiking, and background selection over longer timescales. More advanced methods, permitting a changing effective population size over time, have also been developed. [6]
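As a minimal sketch of this indirect estimate, the diploid relation between nucleotide diversity, mutation rate and Ne can be inverted in a few lines of Python. The function name `coalescent_ne` and the input values are illustrative assumptions, not measurements from the cited studies.

```python
def coalescent_ne(pi: float, mu: float) -> float:
    """Estimate diploid Ne by inverting pi = 4 * Ne * mu.

    pi: pairwise nucleotide diversity per site
    mu: mutation rate per site per generation
    """
    if pi < 0 or mu <= 0:
        raise ValueError("pi must be non-negative and mu must be positive")
    return pi / (4.0 * mu)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
example_ne = coalescent_ne(pi=1e-3, mu=1.25e-8)
print(f"Estimated Ne: {example_ne:.0f}")
```

Because the estimate is a simple ratio, any uncertainty in the assumed mutation rate carries over proportionally into the estimated Ne.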

The effective size measured to reflect these longer timescales may have little relationship to the number of individuals physically present in a population. [7] Measured effective population sizes vary between genes in the same population, being low in genome areas of low recombination and high in genome areas of high recombination. [8] [9] Sojourn times are proportional to N in neutral theory, but for alleles under selection, sojourn times are proportional to log(N). Genetic hitchhiking can cause neutral mutations to have sojourn times proportional to log(N): this may explain the relationship between measured effective population size and the local recombination rate. [10]

A survey of publications on 102 mostly wildlife animal and plant species yielded 192 Ne/N ratios. Seven different estimation methods were used in the surveyed studies. Accordingly, the ratios ranged widely, from 10⁻⁶ for Pacific oysters to 0.994 for humans, with an average of 0.34 across the examined species. Based on these data, the author subsequently estimated more comprehensive ratios, accounting for fluctuations in population size, variance in family size and unequal sex ratio. These ratios average only 0.10–0.11. [11]

A genealogical analysis of human hunter-gatherers (Eskimos) determined the effective-to-census population size ratio for haploid (mitochondrial DNA, Y chromosomal DNA), and diploid (autosomal DNA) loci separately: the ratio of the effective to the census population size was estimated as 0.6–0.7 for autosomal and X-chromosomal DNA, 0.7–0.9 for mitochondrial DNA and 0.5 for Y-chromosomal DNA. [12]

Selection effective size

In an idealised Wright-Fisher model, the fate of an allele, beginning at an intermediate frequency, is largely determined by selection if the selection coefficient s ≫ 1/N, and largely determined by neutral genetic drift if s ≪ 1/N. In real populations, the cutoff value of s may depend instead on local recombination rates. [13] [14] This limit to selection in a real population may be captured in a toy Wright-Fisher simulation through the appropriate choice of Ne. Populations with different selection effective population sizes are predicted to evolve profoundly different genome architectures. [15] [16]
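The threshold between selection and drift can be illustrated with a toy simulation. The sketch below assumes a haploid Wright-Fisher population of N individuals in which selection first shifts the allele frequency deterministically and binomial sampling then supplies the drift; the function name, replicate count and parameter values are hypothetical choices made for illustration.

```python
import random


def fixation_fraction(n: int, s: float, p0: float = 0.5,
                      replicates: int = 200, seed: int = 1) -> float:
    """Fraction of replicate haploid Wright-Fisher populations in which an
    allele with selection coefficient s, starting at frequency p0, fixes."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        count = round(p0 * n)
        while 0 < count < n:
            p = count / n
            # Deterministic change in frequency due to selection.
            p_sel = p * (1.0 + s) / (1.0 + s * p)
            # Binomial sampling of n offspring: genetic drift.
            count = sum(rng.random() < p_sel for _ in range(n))
        fixed += count == n
    return fixed / replicates


N = 100  # so 1/N = 0.01
strong = fixation_fraction(N, s=0.2)     # s >> 1/N
weak = fixation_fraction(N, s=0.0001)    # s << 1/N
print(f"s >> 1/N: fixation fraction {strong:.2f}")
print(f"s << 1/N: fixation fraction {weak:.2f}")
```

With s far above 1/N the allele should fix in nearly every replicate, while with s far below 1/N the fixation fraction should stay close to the starting frequency of 0.5, as expected for a neutral allele.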

History of theory

Ronald Fisher and Sewall Wright originally defined effective population size as "the number of breeding individuals in an idealised population that would show the same amount of dispersion of allele frequencies under random genetic drift or the same amount of inbreeding as the population under consideration". This implied two potentially different effective population sizes, based either on the one-generation increase in variance across replicate populations (variance effective population size), or on the one-generation change in the inbreeding coefficient (inbreeding effective population size). These two are closely linked, and derived from F-statistics, but they are not identical. [17]

Today, the effective population size is usually estimated empirically with respect to the amount of within-species genetic diversity divided by the mutation rate, yielding a coalescent effective population size that reflects the cumulative effects of genetic drift, background selection, and genetic hitchhiking over longer time periods. [5] Another important effective population size is the selection effective population size 1/s_critical, where s_critical is the critical value of the selection coefficient at which selection becomes more important than genetic drift. [13]

Variance effective size

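In the Wright-Fisher idealized population model, the conditional variance of the allele frequency $p'$, given the allele frequency $p$ in the previous generation, is

$$\operatorname{var}(p' \mid p) = \frac{p(1-p)}{2N}.$$

Let $\widehat{\operatorname{var}}(p' \mid p)$ denote the same, typically larger, variance in the actual population under consideration. The variance effective population size $N_e^{(v)}$ is defined as the size of an idealized population with the same variance. This is found by substituting $\widehat{\operatorname{var}}(p' \mid p)$ for $\operatorname{var}(p' \mid p)$ and solving for $N$, which gives

$$N_e^{(v)} = \frac{p(1-p)}{2\,\widehat{\operatorname{var}}(p' \mid p)}.$$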

In the following examples, one or more of the assumptions of a strictly idealised population are relaxed, while other assumptions are retained. The variance effective population size of the more relaxed population model is then calculated with respect to the strict model.
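Before any assumption is relaxed, the definition itself can be checked numerically. The sketch below is an illustration rather than a published estimator: it simulates one generation of binomial sampling in many replicate idealized diploid populations, measures the variance of the new allele frequency p' around the starting frequency p, and converts it to a variance effective size using p(1 − p) divided by twice that variance. The census size of 16, the replicate count and the function names are assumptions chosen for the example.

```python
import random


def variance_effective_size(p: float, next_freqs: list[float]) -> float:
    """Variance effective size: p(1 - p) / (2 * var(p' | p))."""
    # Mean squared deviation of p' around the known starting frequency p.
    var_hat = sum((q - p) ** 2 for q in next_freqs) / len(next_freqs)
    return p * (1.0 - p) / (2.0 * var_hat)


def wright_fisher_generation(n_diploid: int, p: float,
                             rng: random.Random) -> float:
    """One generation of binomial sampling of 2N gene copies."""
    copies = 2 * n_diploid
    return sum(rng.random() < p for _ in range(copies)) / copies


rng = random.Random(42)
N, p = 16, 0.5
replicate_freqs = [wright_fisher_generation(N, p, rng) for _ in range(5000)]
ne_hat = variance_effective_size(p, replicate_freqs)
print(f"Estimated variance Ne: {ne_hat:.1f} (census N = {N})")
```

For an idealized population the estimate should land close to the census size; a real population with extra variance in allele frequency would give a smaller value.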

Variations in population size

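Population size varies over time. Suppose there are t non-overlapping generations; the effective population size is then given by the harmonic mean of the population sizes: [18]

$$\frac{1}{N_e} = \frac{1}{t}\sum_{i=1}^{t}\frac{1}{N_i}$$

For example, say the population size was N = 10, 100, 50, 80, 20, 500 for six generations (t = 6). Then the effective population size is the harmonic mean of these, giving:

$$\frac{1}{N_e} = \frac{\tfrac{1}{10}+\tfrac{1}{100}+\tfrac{1}{50}+\tfrac{1}{80}+\tfrac{1}{20}+\tfrac{1}{500}}{6} = \frac{0.1945}{6} \approx 0.0324, \qquad N_e \approx 30.8$$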

Note this is less than the arithmetic mean of the population size, which in this example is 126.7. The harmonic mean tends to be dominated by the smallest bottleneck that the population goes through.
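The same arithmetic can be reproduced with Python's standard library; this is only a convenience sketch using the six population sizes from the example above.

```python
from statistics import harmonic_mean, mean

sizes = [10, 100, 50, 80, 20, 500]  # census sizes from the example above
ne = harmonic_mean(sizes)
print(f"harmonic mean (Ne): {ne:.1f}")           # 30.8
print(f"arithmetic mean:    {mean(sizes):.1f}")  # 126.7
```

Dropping the generation of size 10 from the list raises the harmonic mean considerably, which shows how strongly a single bottleneck weighs on the result.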

Dioeciousness

If a population is dioecious, i.e. there is no self-fertilisation, then

$$N_e = N + \tfrac{1}{2}$$

or more generally,

$$N_e = N + \tfrac{D}{2}$$

where D represents dioeciousness and may take the value 0 (for not dioecious) or 1 (for dioecious).

When N is large, Ne approximately equals N, so this is usually trivial and often ignored:

$$N_e = N + \tfrac{1}{2} \approx N$$

Variance in reproductive success

If population size is to remain constant, each individual must contribute on average two gametes to the next generation. An idealized population assumes that this follows a Poisson distribution, so that the variance of the number of gametes contributed, k, is equal to the mean number contributed, i.e. 2:

$$\operatorname{var}(k) = \bar{k} = 2.$$

However, in natural populations the variance is often larger than this. The vast majority of individuals may have no offspring, and the next generation stems only from a small number of individuals, so

$$\operatorname{var}(k) > 2.$$

The effective population size is then smaller, and given by:

$$N_e^{(v)} = \frac{4N - 2D}{2 + \operatorname{var}(k)}$$

Note that if the variance of k is less than 2, Ne is greater than N. In the extreme case of a population experiencing no variation in family size, for example a laboratory population in which the number of offspring is artificially controlled, Vk = 0 and Ne = 2N.
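The effect of unequal reproductive success can be sketched numerically. The code below assumes the simplest form of the relationship, Ne = 4N / (2 + Vk), which ignores the small correction for dioecy and matches the limiting cases quoted above (Vk = 2 gives Ne = N, Vk = 0 gives Ne = 2N). The function name, the use of the population variance and the gamete counts are illustrative assumptions.

```python
from statistics import mean, pvariance


def ne_from_gamete_counts(k: list[int]) -> float:
    """Ne = 4N / (2 + var(k)) for a constant-size population.

    k[i] is the number of gametes individual i contributes to the next
    generation; constant size requires a mean of 2.
    """
    n = len(k)
    if mean(k) != 2:
        raise ValueError("mean gamete contribution must be 2 for constant size")
    return 4.0 * n / (2.0 + pvariance(k))


equal_families = [2] * 10            # no variance in family size
skewed_families = [10, 10] + [0] * 8  # two individuals leave all descendants

print(ne_from_gamete_counts(equal_families))             # 20.0
print(round(ne_from_gamete_counts(skewed_families), 2))  # 2.22
```

Both hypothetical populations have a census size of 10, yet the effective size ranges from 2N when every individual contributes equally to roughly 2 when reproduction is monopolised by two individuals.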

Non-Fisherian sex-ratios

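When the sex ratio of a population varies from the Fisherian 1:1 ratio, effective population size is given by:

$$N_e^{(v)} = N_e^{(F)} = \frac{4 N_m N_f}{N_m + N_f}$$

where Nm is the number of males and Nf the number of females. For example, with 80 males and 20 females (an absolute population size of 100):

$$N_e = \frac{4 \times 80 \times 20}{80 + 20} = \frac{6400}{100} = 64$$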

Again, this results in Ne being less than N.
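A short sketch of the sex-ratio calculation, Ne = 4 Nm Nf / (Nm + Nf), follows. The function name is an assumption; the first call uses the 80 males and 20 females from the example, and the other two inputs are hypothetical comparisons.

```python
def ne_sex_ratio(n_males: int, n_females: int) -> float:
    """Ne = 4 * Nm * Nf / (Nm + Nf) for an unequal sex ratio."""
    if n_males <= 0 or n_females <= 0:
        raise ValueError("both sexes must be present")
    return 4.0 * n_males * n_females / (n_males + n_females)


print(ne_sex_ratio(80, 20))  # 64.0
print(ne_sex_ratio(50, 50))  # 100.0
print(ne_sex_ratio(1, 99))   # 3.96
```

With an even sex ratio the formula returns the census size, and as one sex becomes rare the effective size approaches four times the number of individuals of the rarer sex.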

Inbreeding effective size

Alternatively, the effective population size may be defined by noting how the average inbreeding coefficient changes from one generation to the next, and then defining Ne as the size of the idealized population that has the same change in average inbreeding coefficient as the population under consideration. The presentation follows Kempthorne (1957). [19]

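For the idealized population, the inbreeding coefficients follow the recurrence equation

$$F_t = \frac{1}{N}\left(\frac{1 + F_{t-2}}{2}\right) + \left(1 - \frac{1}{N}\right)F_{t-1}.$$

Using the panmictic index (1 − F) instead of the inbreeding coefficient, we get the approximate recurrence equation

$$1 - F_t = P_t = P_0\left(1 - \frac{1}{2N}\right)^t.$$

The ratio between successive generations is

$$\frac{P_{t+1}}{P_t} = 1 - \frac{1}{2N}.$$

The inbreeding effective size can be found by solving

$$\frac{P_{t+1}}{P_t} = 1 - \frac{1}{2N_e^{(F)}}.$$

This is

$$N_e^{(F)} = \frac{1}{2\left(1 - \frac{P_{t+1}}{P_t}\right)}.$$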
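As a minimal numerical check, the inbreeding effective size can be computed from inbreeding coefficients in two successive generations by solving P(t+1)/P(t) = 1 − 1/(2Ne) for Ne, where P = 1 − F is the panmictic index. The function name and the test population of N = 50 are assumptions made for illustration.

```python
def inbreeding_effective_size(f_t: float, f_next: float) -> float:
    """Solve P_{t+1} / P_t = 1 - 1 / (2 * Ne) for Ne, where P = 1 - F."""
    p_t, p_next = 1.0 - f_t, 1.0 - f_next
    ratio = p_next / p_t
    if not 0.0 < ratio < 1.0:
        raise ValueError("panmictic index must decline between generations")
    return 1.0 / (2.0 * (1.0 - ratio))


# Check against the idealized decay P_t = P_0 * (1 - 1/(2N))^t with N = 50.
N = 50
P = [(1.0 - 1.0 / (2 * N)) ** t for t in range(3)]
F = [1.0 - p for p in P]
recovered = inbreeding_effective_size(F[1], F[2])
print(f"Recovered Ne: {recovered:.3f}")
```

Feeding the function inbreeding coefficients generated by the idealized decay should return the census size, up to floating-point error; a faster decline in P between generations yields a smaller Ne.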

Theory of overlapping generations and age-structured populations

When organisms live longer than one breeding season, effective population sizes have to take into account the life tables for the species.

Haploid

Assume a haploid population with discrete age structure. An example might be an organism that can survive several discrete breeding seasons. Further, define the following age structure characteristics:

$v_i$ = Fisher's reproductive value for age $i$,
$\ell_i$ = the chance an individual will survive to age $i$, and
$N_0$ = the number of newborn individuals per breeding season.

The generation time is calculated as

$$T = \sum_{i=0}^{\infty} \ell_i v_i = \text{average age of a reproducing individual}$$

Then, the inbreeding effective population size is [20]

$$N_e^{(F)} = \frac{N_0 T}{1 + \sum_i \ell_{i+1}^2 v_{i+1}^2 \left(\frac{1}{\ell_{i+1}} - \frac{1}{\ell_i}\right)}.$$

Diploid

Similarly, the inbreeding effective number can be calculated for a diploid population with discrete age structure. This was first given by Johnson, [21] but the notation more closely resembles Emigh and Pollak. [22]

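Assume the same basic parameters for the life table as given for the haploid case, but distinguishing between male and female, such as $N_0^f$ and $N_0^m$ for the number of newborn females and males, respectively (notice lower case f for females, compared to upper case F for inbreeding).

The inbreeding effective number is

$$\frac{1}{N_e^{(F)}} = \frac{1}{4T}\left\{\frac{1}{N_0^f} + \frac{1}{N_0^m} + \sum_i \left(\ell_{i+1}^f\right)^2 \left(v_{i+1}^f\right)^2 \left(\frac{1}{\ell_{i+1}^f} - \frac{1}{\ell_i^f}\right) + \sum_i \left(\ell_{i+1}^m\right)^2 \left(v_{i+1}^m\right)^2 \left(\frac{1}{\ell_{i+1}^m} - \frac{1}{\ell_i^m}\right)\right\}.$$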


References

  1. "Effective population size". Blackwell Publishing. Retrieved 4 March 2018.
  2. Wright S (1931). "Evolution in Mendelian populations" (PDF). Genetics. 16 (2): 97–159. doi:10.1093/genetics/16.2.97. PMC 1201091. PMID 17246615.
  3. Wright S (1938). "Size of population and breeding structure in relation to evolution". Science. 87 (2263): 430–431. doi:10.1126/science.87.2263.425-a.
  4. Buri, P (1956). "Gene frequency in small populations of mutant Drosophila". Evolution. 10 (4): 367–402. doi:10.2307/2406998. JSTOR 2406998.
  5. Lynch, M.; Conery, J.S. (2003). "The origins of genome complexity". Science. 302 (5649): 1401–1404. Bibcode:2003Sci...302.1401L. CiteSeerX 10.1.1.135.974. doi:10.1126/science.1089370. PMID 14631042. S2CID 11246091.
  6. Weinreich, Daniel M. (2023). The foundations of population genetics. Cambridge, Massachusetts: The MIT Press. ISBN 0262047578.
  7. Gillespie, JH (2001). "Is the population size of a species relevant to its evolution?". Evolution. 55 (11): 2161–2169. doi:10.1111/j.0014-3820.2001.tb00732.x. PMID 11794777.
  8. Hahn, Matthew W. (2008). "Toward a selection theory of molecular evolution". Evolution. 62 (2): 255–265. doi:10.1111/j.1558-5646.2007.00308.x. PMID 18302709.
  9. Masel, Joanna (2012). "Rethinking Hardy–Weinberg and genetic drift in undergraduate biology". BioEssays. 34 (8): 701–10. doi:10.1002/bies.201100178. PMID 22576789. S2CID 28513167.
  10. Neher, Richard A. (23 November 2013). "Genetic Draft, Selective Interference, and Population Genetics of Rapid Adaptation". Annual Review of Ecology, Evolution, and Systematics. 44 (1): 195–215. doi:10.1146/annurev-ecolsys-110512-135920.
  11. R. Frankham (1995). "Effective population size/adult population size ratios in wildlife: a review". Genetics Research. 66 (2): 95–107. doi:10.1017/S0016672300034455.
  12. S. Matsumura; P. Forster (2008). "Generation time and effective population size in Polar Eskimos". Proc Biol Sci. 275 (1642): 1501–1508. doi:10.1098/rspb.2007.1724. PMC 2602656. PMID 18364314.
  13. R.A. Neher; B.I. Shraiman (2011). "Genetic Draft and Quasi-Neutrality in Large Facultatively Sexual Populations". Genetics. 188 (4): 975–996. arXiv:1108.1635. doi:10.1534/genetics.111.128876. PMC 3176096. PMID 21625002.
  14. Daniel B. Weissman; Nicholas H. Barton (2012). "Limits to the Rate of Adaptive Substitution in Sexual Populations". PLOS Genetics. 8 (6): e1002740. doi:10.1371/journal.pgen.1002740. PMC 3369949. PMID 22685419.
  15. Lynch, Michael (2007). The Origins of Genome Architecture. Sinauer Associates. ISBN 978-0-87893-484-3.
  16. Rajon, E.; Masel, J. (2011). "Evolution of molecular error rates and the consequences for evolvability". PNAS. 108 (3): 1082–1087. Bibcode:2011PNAS..108.1082R. doi:10.1073/pnas.1012918108. PMC 3024668. PMID 21199946.
  17. James F. Crow (2010). "Wright and Fisher on Inbreeding and Random Drift". Genetics. 184 (3): 609–611. doi:10.1534/genetics.109.110023. PMC 2845331. PMID 20332416.
  18. Karlin, Samuel (1968-09-01). "Rates of Approach to Homozygosity for Finite Stochastic Models with Variable Population Size". The American Naturalist. 102 (927): 443–455. doi:10.1086/282557. ISSN 0003-0147. S2CID 83824294.
  19. Kempthorne O (1957). An Introduction to Genetic Statistics. Iowa State University Press.
  20. Felsenstein J (1971). "Inbreeding and variance effective numbers in populations with overlapping generations". Genetics. 68 (4): 581–597. doi:10.1093/genetics/68.4.581. PMC 1212678. PMID 5166069.
  21. Johnson DL (1977). "Inbreeding in populations with overlapping generations". Genetics. 87 (3): 581–591. doi:10.1093/genetics/87.3.581. PMC 1213763. PMID 17248780.
  22. Emigh TH, Pollak E (1979). "Fixation probabilities and effective population numbers in diploid populations with overlapping generations". Theoretical Population Biology. 15 (1): 86–107. Bibcode:1979TPBio..15...86E. doi:10.1016/0040-5809(79)90028-5.