Evolution by gene duplication

Evolution by gene duplication begins with an event in which a gene, or part of a gene, is copied, so that the genome carries two initially identical copies that cannot be distinguished from each other. This phenomenon is understood to be an important source of novelty in evolution, providing for an expanded repertoire of molecular activities. The underlying mutational event may be a conventional gene duplication mutation within a chromosome, or a larger-scale event involving whole chromosomes (aneuploidy) or whole genomes (polyploidy). In the classic view, owing to Susumu Ohno [1] and known as the Ohno model, duplication creates redundancy, and the redundant copy accumulates beneficial mutations that provide fuel for innovation. [2] Knowledge of evolution by gene duplication has advanced more rapidly in the past 15 years due to new genomic data, more powerful computational methods of comparative inference, and new evolutionary models.

Theoretical models

Several models try to explain how new cellular functions of genes and their encoded protein products evolve through duplication and divergence. Although each model can explain certain aspects of the evolutionary process, the relative importance of each aspect is still unclear. This page presents only the theoretical models currently discussed in the literature. Review articles on this topic are listed in the references.

In the following, a distinction will be made between explanations for the short-term effects (preservation) of a gene duplication and its long-term outcomes.

Preservation of gene duplicates

Since a gene duplication occurs in only one cell, either in a single-celled organism or in the germ cell of a multi-cellular organism, its carrier (i.e. the organism) usually has to compete against other organisms that do not carry the duplication. If the duplication disrupts the normal functioning of an organism, the organism has reduced reproductive success (low fitness) compared to its competitors, and its lineage will most likely die out rapidly. If the duplication has no effect on fitness, it might be maintained in some proportion of a population. In some cases, the duplication of a particular gene might be immediately beneficial, providing its carrier with a fitness advantage.
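
The three scenarios above (harmful, neutral, beneficial) can be illustrated with a minimal simulation sketch. It assumes a haploid Wright-Fisher population of constant size, a single initial carrier, and a selection coefficient s; none of these modelling choices comes from the sources cited here, and the function names and parameter values are hypothetical.

```python
import random


def next_generation(carriers, pop_size, s, rng):
    """Draw the number of duplication carriers in the next generation.

    Assumption: carriers have relative fitness 1 + s, non-carriers 1.
    """
    p = carriers / pop_size
    p_selected = p * (1 + s) / (1 + p * s)  # carrier frequency after selection
    # Binomial sampling of pop_size offspring (genetic drift).
    return sum(rng.random() < p_selected for _ in range(pop_size))


def fixation_fraction(pop_size, s, trials, seed=0):
    """Fraction of trials in which a single new duplication reaches fixation."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        carriers = 1  # the duplication arises in one individual
        while 0 < carriers < pop_size:
            carriers = next_generation(carriers, pop_size, s, rng)
        fixed += carriers == pop_size
    return fixed / trials


if __name__ == "__main__":
    # Hypothetical selection coefficients for the three scenarios.
    for label, s in [("harmful", -0.5), ("neutral", 0.0), ("beneficial", 0.5)]:
        print(label, fixation_fraction(pop_size=30, s=s, trials=500))
```

In this sketch most new duplications are lost by chance even when beneficial, which is why the text speaks of a duplication being "maintained in some proportion" rather than reliably spreading.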

Dosage effect or gene amplification

The so-called 'dosage' of a gene refers to the amount of mRNA transcripts, and subsequently of translated protein molecules, produced from a gene per unit time per cell. If the amount of gene product is below its optimal level, two kinds of mutations can increase dosage: increases in gene expression by promoter mutations and increases in gene copy number by gene duplication. [citation needed]

The more copies of the same (duplicated) gene a cell has in its genome, the more gene product can be produced simultaneously. Assuming that no regulatory feedback loops exist that automatically down-regulate gene expression, the amount of gene product (or gene dosage) will increase with each additional gene copy, until some upper limit is reached or sufficient gene product is available.

Furthermore, under positive selection for increased dosage, a duplicated gene could be immediately advantageous and quickly increase in frequency in a population. In this case, no further mutations would be necessary to preserve (or retain) the duplicates. However, at a later time, such mutations could still occur, leading to genes with different functions (see below).

Gene dosage effects after duplication can also be harmful to a cell, and the duplication might therefore be selected against. For instance, when the metabolic network within a cell is fine-tuned so that it can tolerate only a certain amount of a given gene product, gene duplication would upset this balance. [citation needed]
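
As a rough illustration of this section, the sketch below treats dosage as copy number times a fixed output per copy, capped at an upper limit, and scores fitness by how far the dosage lies from an optimum. The linear fitness penalty, the function names, and all numbers are assumptions made for illustration only.

```python
def gene_dosage(copies, output_per_copy, ceiling):
    """Total gene product, assuming no regulatory feedback and a hard upper limit."""
    return min(copies * output_per_copy, ceiling)


def relative_fitness(dosage, optimum):
    """Hypothetical fitness: 1.0 at the optimum, falling linearly with distance from it."""
    return max(0.0, 1.0 - abs(dosage - optimum) / optimum)


if __name__ == "__main__":
    optimum = 20  # hypothetical optimal amount of gene product
    for copies in (1, 2, 3):
        dosage = gene_dosage(copies, output_per_copy=10, ceiling=100)
        print(copies, dosage, relative_fitness(dosage, optimum))
```

With these made-up values a single copy is below the optimum, so the first duplication is favoured, while a third copy overshoots and would be selected against.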

Activity-reducing mutations

In cases of gene duplications that have no immediate fitness effect, retention of the duplicate copy could still be possible if both copies accumulate mutations that, for instance, reduce the functional efficiency of the encoded proteins without inhibiting this function altogether. In such a case, the molecular function (e.g. protein/enzyme activity) would still be available to the cell to at least the extent that was available before duplication (now provided by proteins expressed from two gene loci instead of one). However, the accidental loss of one gene copy might then be detrimental, since a single copy with reduced activity would almost certainly provide less activity than was available before duplication. [citation needed]
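
The argument can be checked with a small worked example. The activity values below are hypothetical, and summing the activities of the two loci is a simplifying assumption.

```python
def is_sufficient(copy_activities, required=1.0):
    """True if the summed activity of all gene copies meets the pre-duplication level."""
    return sum(copy_activities) >= required


if __name__ == "__main__":
    print(is_sufficient([1.0]))       # single ancestral gene
    print(is_sufficient([0.6, 0.6]))  # two copies, each weakened by mutation
    print(is_sufficient([0.6]))       # one weakened copy lost
```

Two weakened copies together still cover the original requirement, but neither can do so alone, so losing either one becomes detrimental.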

Long-term fate of duplicated genes

If a gene duplication is preserved, the most likely fate is that random mutations in one duplicate gene copy will eventually cause it to become non-functional. [3] Such non-functional remnants of genes, with detectable sequence homology, can sometimes still be found in genomes and are called pseudogenes.

Functional divergence between the duplicate genes is another possible fate. There are several theoretical models that try to explain the mechanisms leading to divergence:

Neofunctionalization

The term neofunctionalization was coined by Force et al. 1999, [4] but it refers to the general mechanism proposed by Ohno 1970. [1] The long-term outcome of neofunctionalization is that one copy retains the original (pre-duplication) function of the gene, while the second copy acquires a distinct function. It is also known as the MDN model, "mutation during non-functionality". The major criticism of this model is the high likelihood of non-functionalization, i.e. the loss of all functionality of the gene, due to the random accumulation of mutations. [5] [6]
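
The criticism can be made concrete with a toy calculation. It assumes that function-creating and function-destroying mutations hit the redundant copy independently at constant, small per-generation rates, so that the chance of the new function arising first is simply its share of the total rate. This is an illustrative simplification, not a model taken from the cited works, and the rates are hypothetical.

```python
def chance_new_function_first(beneficial_rate, degenerative_rate):
    """Probability that a function-creating mutation occurs before a disabling one.

    Assumes two independent, constant, small per-generation rates.
    """
    return beneficial_rate / (beneficial_rate + degenerative_rate)


if __name__ == "__main__":
    # Hypothetical rates: disabling mutations are 1000 times more frequent.
    print(chance_new_function_first(beneficial_rate=1e-9, degenerative_rate=1e-6))
```

When disabling mutations are far more common than function-creating ones, the redundant copy is much more likely to become a pseudogene than to gain a new function.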

IAD model

IAD stands for 'innovation, amplification, divergence' and aims to explain how new gene functions evolve while existing functions are preserved. [5] Innovation, i.e. the establishment of a new molecular function, can occur via side-activities of genes and thus of proteins; this is called enzyme promiscuity. [7] For example, enzymes can sometimes catalyse more than one reaction, even though they are usually optimised for catalysing just one. Such promiscuous protein functions, if they provide an advantage to the host organism, can then be amplified with additional copies of the gene. Such rapid amplification is best known from bacteria, which often carry certain genes on smaller non-chromosomal DNA molecules (called plasmids) that are capable of rapid replication. Any gene on such a plasmid is also replicated, and the additional copies amplify the expression of the encoded proteins, and with it any promiscuous function. After several such copies have been made and passed on to descendant bacterial cells, a few of these copies might accumulate mutations that eventually lead to a side-activity becoming the main activity.

The IAD model has been tested in the laboratory using a bacterial enzyme with dual function as the starting point. This enzyme is capable of catalysing not only its original reaction but also a side reaction that can be carried out by another enzyme. By allowing bacteria carrying this enzyme to evolve for several generations under selection to improve both activities (original and side), it was shown that one ancestral bifunctional gene with poor activities (innovation) evolved first by gene amplification, which increased expression of the poor enzyme (amplification), and later accumulated beneficial mutations that improved one or both of the activities and could be passed on to the next generation (divergence). [2]
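
The logic of the amplification and divergence steps can be sketched numerically. The example assumes that the side activity of all copies simply adds up and that the cell needs a fixed total amount of it; the function name and the activity units are hypothetical.

```python
import math


def copies_needed(side_activity_per_copy, required_activity):
    """Smallest copy number whose summed side activity meets the requirement."""
    return math.ceil(required_activity / side_activity_per_copy)


if __name__ == "__main__":
    required = 100  # hypothetical amount of the new activity the cell needs

    # Innovation: the ancestral gene has only a weak side activity.
    print(copies_needed(5, required))    # amplification must supply many copies

    # Divergence: a mutation in one copy greatly improves the side activity,
    # so far fewer copies are needed and the surplus can be lost.
    print(copies_needed(100, required))
```

Under these assumptions, amplification bridges the gap until an improved copy arises, after which selection no longer maintains the extra copies.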

Subfunctionalization

The term "subfunctionalization" was also coined by Force et al. 1999. [4] This model requires the ancestral (pre-duplication) gene to have several functions (sub-functions), in which the descendant (post-duplication) genes specialise in a complementary fashion. At least two different models are now labelled as subfunctionalization: "DDC" and "EAC".

DDC model

DDC stands for "duplication-degeneration-complementation". This model was introduced by Force et al. 1999. [4] The first step is gene duplication. The duplication in itself is neither advantageous nor deleterious, so it will initially remain at low frequency within a population of individuals that do not carry a duplication. According to DDC, this period of neutral drift may eventually lead to the complementary retention of sub-functions distributed over the two gene copies. This comes about through activity-reducing (degenerative) mutations in both duplicates, accumulating over many generations. Taken together, the two mutated genes provide the same set of functions as the ancestral gene (before duplication). However, if one of the genes were removed, the remaining gene would not be able to provide the full set of functions, and the host cell would likely suffer detrimental consequences. Therefore, at this later stage of the process, there is strong selection pressure against removing either of the two gene copies that arose by gene duplication. The duplication becomes permanently established in the genome of the host cell or organism.
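
The degeneration and complementation steps can be sketched as a small simulation. It assumes an ancestral gene with two sub-functions, labelled A and B here, and degenerative mutations that each remove one sub-function from one copy; a mutation is tolerated only if the other copy still provides that sub-function. These rules are a deliberate simplification for illustration (for instance, they ignore mutations that disable a whole copy at once) and are not the quantitative model of Force et al.

```python
import random

SUBFUNCTIONS = ("A", "B")  # hypothetical sub-functions of the ancestral gene


def evolve_duplicate_pair(rng):
    """Accumulate tolerated degenerative mutations until none remain possible."""
    copies = [set(SUBFUNCTIONS), set(SUBFUNCTIONS)]
    while True:
        # A loss is tolerated only if the other copy still has the sub-function.
        # Choosing uniformly among tolerated losses is equivalent to drawing
        # random mutations and discarding those removed by selection.
        tolerated = [
            (i, f) for i in (0, 1) for f in sorted(copies[i]) if f in copies[1 - i]
        ]
        if not tolerated:
            break
        i, f = rng.choice(tolerated)
        copies[i].remove(f)
    if not copies[0] or not copies[1]:
        return "nonfunctionalization"  # one copy lost everything (pseudogene)
    return "subfunctionalization"      # complementary sub-functions retained


def subfunctionalization_fraction(trials, seed=0):
    rng = random.Random(seed)
    outcomes = [evolve_duplicate_pair(rng) for _ in range(trials)]
    return outcomes.count("subfunctionalization") / trials


if __name__ == "__main__":
    print(subfunctionalization_fraction(trials=2000))
```

In every run the two copies end up jointly providing each sub-function exactly once, so in the subfunctionalized outcome neither copy can be lost without losing a sub-function, which matches the permanent preservation described above.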

EAC model

EAC stands for "Escape from Adaptive Conflict". This name first appeared in a publication by Hittinger and Carroll 2007. [8] The evolutionary process described by the EAC model actually begins before the gene duplication event. A singleton (not duplicated) gene evolves towards two beneficial functions simultaneously. This creates an "adaptive conflict" for the gene, since it is unlikely to execute each individual function with maximum efficiency. The intermediate evolutionary result could be a multi-functional gene and after a gene duplication its sub-functions could be carried out by specialised descendants of the gene. The result would be the same as under the DDC model, two functionally specialised genes (paralogs). In contrast to the DDC model, the EAC model puts more emphasis on the multi-functional pre-duplication state of the evolving genes and gives a slightly different explanation as to why the duplicated multi-functional genes would benefit from additional specialisation after duplication (because of the adaptive conflict of the multi-functional ancestor that needs to be resolved). Under EAC there is an assumption of a positive selection pressure driving evolution after gene duplication, whereas the DDC model only requires neutral ("undirected") evolution to take place, i.e. degeneration and complementation.

References

  1. Ohno, Susumu (1970). Evolution by gene duplication. Springer-Verlag. ISBN 0-04-575015-7.
  2. Andersson DI, Jerlström-Hultqvist J, Näsvall J (2015). "Evolution of new functions de novo and from preexisting genes". Cold Spring Harbor Perspectives in Biology. 7 (6): a017996.
  3. Lynch, M; et al. (2000). "The evolutionary fate and consequences of duplicate genes". Science. 290 (5494): 1151–1155. Bibcode:2000Sci...290.1151L. doi:10.1126/science.290.5494.1151. PMID 11073452.
  4. Force, A.; et al. (1999). "Preservation of duplicate genes by complementary, degenerative mutations". Genetics. 151 (4): 1531–1545. doi:10.1093/genetics/151.4.1531. PMC 1460548. PMID 10101175.
  5. Bergthorsson U, Andersson DI, Roth JR (2007). "Ohno's dilemma: Evolution of new genes under continuous selection". PNAS. 104 (43): 17004–17009. Bibcode:2007PNAS..10417004B. doi:10.1073/pnas.0707158104. PMC 2040452. PMID 17942681.
  6. Graur, Dan; Li, Wen-Hsiung (2000). Fundamentals of molecular evolution. Sunderland, MA: Sinauer. pp. 282–283. ISBN 0-87893-266-6.
  7. Bergthorsson U, Andersson DI, Roth JR (2007). "Ohno's dilemma: evolution of new genes under continuous selection". Proceedings of the National Academy of Sciences. 104 (43): 17004–17009.
  8. Hittinger CT, Carroll SB (2007). "Gene duplication and the adaptive evolution of a classic genetic switch". Nature. 449 (7163): 677–81. Bibcode:2007Natur.449..677H. doi:10.1038/nature06151. PMID 17928853. S2CID 4418250.