Error catastrophe


Error catastrophe refers to the cumulative loss of genetic information in a lineage of organisms due to high mutation rates. The mutation rate above which error catastrophe occurs is called the error threshold. Both terms were coined by Manfred Eigen in his mathematical evolutionary theory of the quasispecies. [1]


The term is most widely used to refer to mutation accumulation to the point of inviability of the organism or virus, where it cannot produce enough viable offspring to maintain a population. This use of Eigen's term was adopted by Lawrence Loeb and colleagues to describe the strategy of lethal mutagenesis to cure HIV by using mutagenic ribonucleoside analogs. [2] [3]

There was an earlier use of the term, introduced in 1963 by Leslie Orgel in a theory of cellular aging, in which errors in the translation of the proteins that themselves carry out protein translation would amplify those errors until the cell became inviable. [4] This theory has not received empirical support. [5]

Error catastrophe is predicted in certain mathematical models of evolution and has also been observed empirically. [6]

Like every organism, viruses 'make mistakes' (or mutate) during replication. The resulting mutations increase biodiversity among the population and help subvert the ability of a host's immune system to recognise it in a subsequent infection. The more mutations the virus makes during replication, the more likely it is to avoid recognition by the immune system and the more diverse its population will be (see the article on biodiversity for an explanation of the selective advantages of this). However, if it makes too many mutations, it may lose some of its biological features which have evolved to its advantage, including its ability to reproduce at all.

The question arises: how many mutations can be made during each replication before the population of viruses begins to lose self-identity?

Basic mathematical model

Consider a virus which has a genetic identity modeled by a string of ones and zeros (e.g. 11010001011101....). Suppose that the string has fixed length L and that during replication the virus copies each digit one by one, making a mistake with probability q independently of all other digits.
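
As an illustration, here is a minimal Python sketch of this copying model (the function name and the parameter values are invented for the example, not part of the theory): each digit of a binary string is copied independently and flipped with probability q.

```python
import random

def replicate(genome: str, q: float) -> str:
    """Copy a binary genome digit by digit, flipping each digit
    independently with error probability q."""
    return "".join(
        digit if random.random() > q else ("1" if digit == "0" else "0")
        for digit in genome
    )

parent = "1101000101110100"          # a toy genome with L = 16
print(replicate(parent, q=0.05))     # a mutant copy of the parent
```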

Due to the mutations resulting from erroneous replication, there exist up to 2^L distinct strains derived from the parent virus. Let x_i denote the concentration of strain i; let a_i denote the rate at which strain i reproduces; and let Q_ij denote the probability of a virus of strain i mutating to strain j.

Then the rate of change of concentration x_j is given by

$$\frac{dx_j}{dt} = \sum_i a_i Q_{ij} x_i .$$

At this point, we make a mathematical idealisation: we pick the fittest strain (the one with the greatest reproduction rate a_j) and assume that it is unique (i.e. that the chosen a_j satisfies a_j > a_i for all i); and we then group the remaining strains into a single group. Let the concentrations of the two groups be x, y with reproduction rates a > b, respectively; let Q be the probability of a virus in the first group (x) mutating to a member of the second group (y) and let R be the probability of a member of the second group returning to the first (via an unlikely and very specific mutation). The equations governing the development of the populations are:

$$\frac{dx}{dt} = a(1-Q)\,x + bR\,y, \qquad \frac{dy}{dt} = aQ\,x + b(1-R)\,y .$$

We are particularly interested in the case where L is very large, so that a back mutation to the exact original string is extremely unlikely; we may therefore safely neglect R and instead consider:

$$\frac{dx}{dt} = a(1-Q)\,x, \qquad \frac{dy}{dt} = aQ\,x + b\,y .$$

Then setting z = x/y we have

$$\frac{dz}{dt} = \frac{1}{y}\frac{dx}{dt} - \frac{x}{y^2}\frac{dy}{dt} = z\left[\,a(1-Q) - aQ\,z - b\,\right].$$

Assuming z achieves a steady concentration over time, z settles down to satisfy

$$z_{\infty} = \frac{a(1-Q) - b}{aQ}$$

(which is deduced by setting the derivative of z with respect to time to zero).
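
As a sanity check, the reduced system can be integrated numerically. The following sketch uses a simple Euler scheme (the parameter values are illustrative choices, not taken from the article) and compares the simulated ratio z = x/y with the steady-state value above.

```python
# Euler integration of the reduced two-group model (back mutation R neglected):
#   dx/dt = a*(1 - Q)*x,   dy/dt = a*Q*x + b*y
a, b = 2.0, 1.0      # reproduction rates of the fittest strain and of the rest
Q = 0.3              # probability that a copy of the fittest strain mutates away
x, y = 1.0, 1.0      # initial concentrations
dt = 1e-3

for _ in range(20_000):
    dx = a * (1 - Q) * x
    dy = a * Q * x + b * y
    x, y = x + dx * dt, y + dy * dt

print("simulated z = x/y :", x / y)                        # ~0.667
print("predicted z       :", (a * (1 - Q) - b) / (a * Q))  # 0.666...
```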

So the important question is: for what parameter values does the original population persist (continue to exist)? The population persists if and only if the steady state value of z is strictly positive, i.e. if and only if

$$a(1-Q) - b > 0 .$$

This result is more popularly expressed in terms of the ratio a:b and the error rate q of individual digits: set b/a = (1 - s) and note that the probability of copying the entire string of length L without error is 1 - Q = (1 - q)^L; then the condition becomes

$$(1-q)^L > 1 - s .$$

Taking a logarithm on both sides and approximating for small q and s one gets

$$L \ln(1-q) > \ln(1-s), \qquad \text{i.e.}\qquad -Lq > -s ,$$

reducing the condition to:

$$Lq < s, \qquad\text{or equivalently}\qquad L < \frac{s}{q} .$$

RNA viruses which replicate close to the error threshold have a genome size of the order of 10^4 (10,000) bases. Human DNA is about 3.3 billion (3.3 × 10^9) base pairs long. This means that the replication mechanism for human DNA must be orders of magnitude more accurate than that for the RNA of RNA viruses.
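
A rough numeric reading of the Lq < s condition, taking s to be at most of order 1 (an assumption made only for this illustration), gives the per-digit error rates these genome lengths can tolerate:

```python
# With Lq < s and s at most of order 1, the per-digit error rate must be
# well below roughly 1/L for the population to persist.
for name, L in [("RNA virus genome", 1e4), ("human genome", 3.3e9)]:
    print(f"{name:16s}  L = {L:.1e}   tolerable q is below about {1 / L:.1e}")
```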

Information-theory based presentation

To avoid error catastrophe, the amount of information lost through mutation must be less than the amount gained through natural selection. This fact can be used to arrive at essentially the same equations as the more common differential presentation. [7]

The information lost can be quantified as the genome length L times the replication error rate q. The probability of survival, S, determines the amount of information contributed by natural selection; the information is the negative logarithm of the probability. Therefore, a genome can only survive unchanged when

$$Lq \le -\log_2 S .$$

For example, the very simple genome where L = 1 and q = 1 is a genome with one bit which always mutates. Since Lq is then 1, it follows that S has to be ½ or less. This corresponds to half the offspring surviving; namely the half with the correct genome.
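
Reading the inequality as Lq ≤ −log₂ S (the form implied by the one-bit example above), the largest survival probability compatible with preserving the genome is 2^(−Lq). A minimal sketch, with the second call using an illustrative genome of my own choosing:

```python
def max_survival_probability(L: float, q: float) -> float:
    """Largest S allowed by Lq <= -log2(S), i.e. S <= 2**(-L * q)."""
    return 2 ** (-L * q)

print(max_survival_probability(1, 1.0))        # 0.5: the one-bit example from the text
print(max_survival_probability(10_000, 2e-4))  # Lq = 2: selection must remove at least 3/4 of offspring
```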

Applications

Some viruses such as polio or hepatitis C operate very close to the critical mutation rate (i.e. the largest q that L will allow). Drugs have been created to increase the mutation rate of the viruses in order to push them over the critical boundary so that they lose self-identity. However, given the criticism of the basic assumption of the mathematical model, this approach is problematic. [8]

The result introduces a Catch-22 known as Eigen's paradox: in general, accurate replication requires specialised enzymes, which in turn require a large genome to encode them, yet a large genome can only persist if the per-digit error rate q is correspondingly low. Which came first, and how did it happen? An illustration of the difficulty: with a per-digit copying fidelity of 0.99 (i.e. q = 0.01), L can be at most of the order of 100, a very small string length in terms of genes.[citation needed]


References

  1. Eigen M (October 1971). "Selforganization of matter and the evolution of biological macromolecules". Die Naturwissenschaften. 58 (10): 465–523. Bibcode:1971NW.....58..465E. doi:10.1007/BF00623322. PMID 4942363. S2CID 38296619.
  2. Hizi, A; Kamath-Loeb, AS; Rose, KD; Loeb, LA (1997). "Mutagenesis by human immunodeficiency virus reverse transcriptase: incorporation of O6-methyldeoxyguanosine triphosphate". Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis. 374 (1): 41–50. doi:10.1016/S0027-5107(96)00217-5. PMID 9067414.
  3. Loeb, LA; Mullins, JI (2000). "Perspective-Lethal Mutagenesis of HIV by Mutagenic Ribonucleoside Analogs". AIDS Research and Human Retroviruses. 16 (1): 1–3. doi:10.1089/088922200309539. PMID 10628810.
  4. Orgel, Leslie E. (1963). "The maintenance of the accuracy of protein synthesis and its relevance to ageing". Proc. Natl. Acad. Sci. USA. 49 (4): 517–521. Bibcode:1963PNAS...49..517O. doi:10.1073/pnas.49.4.517. PMC 299893. PMID 13940312.
  5. Rose, Michael R. (1991). Evolutionary Biology of Aging. New York, NY: Oxford University Press. pp. 147–152.
  6. Pariente, N; Sierra, S; Airaksinen, A (2005). "Action of mutagenic agents and antiviral inhibitors on foot-and-mouth disease virus". Virus Res. 107 (2): 183–93. doi:10.1016/j.virusres.2004.11.008. PMID 15649564.
  7. Barbieri, M. The Organic Codes, p. 140.
  8. Summers; Litwin (2006). "Examining The Theory of Error Catastrophe". Journal of Virology. 80 (1): 20–26. doi:10.1128/JVI.80.1.20-26.2006. PMC 1317512. PMID 16352527.