Genome evolution

Genome evolution is the process by which a genome changes in structure (sequence) or size over time. The study of genome evolution involves multiple fields such as structural analysis of the genome, the study of genomic parasites, gene and ancient genome duplications, polyploidy, and comparative genomics. Genome evolution is a constantly changing and evolving field due to the steadily growing number of sequenced genomes, both prokaryotic and eukaryotic, available to the scientific community and the public at large.

Circular representation of the Mycobacterium leprae genome created using JCVI online genome tools.

History

Since the first sequenced genomes became available in the late 1970s, [1] scientists have been using comparative genomics to study the differences and similarities between various genomes. Genome sequencing has progressed over time to include increasingly complex genomes, including the eventual sequencing of the entire human genome in 2001. [2] By comparing the genomes of both close and distant relatives, the stark differences and similarities between species began to emerge, as did the mechanisms by which genomes evolve over time.[ citation needed ]

Prokaryotic and eukaryotic genomes

Prokaryotes

The principal forces of evolution in prokaryotes and their effects on archaeal and bacterial genomes. The horizontal line shows archaeal and bacterial genome size on a logarithmic scale (in megabase pairs) and the approximate corresponding number of genes (in parentheses). The effects of the main forces of prokaryotic genome evolution are denoted by triangles that are positioned, roughly, over the ranges of genome size for which the corresponding effects are thought to be most pronounced.

Prokaryotic genomes have two main mechanisms of evolution: mutation and horizontal gene transfer. [3] A third mechanism, sexual reproduction, is prominent in eukaryotes and also occurs in bacteria. Prokaryotes can acquire novel genetic material through bacterial conjugation, in which both plasmids and whole chromosomes can be passed between organisms. An often cited example of this process is the transfer of antibiotic resistance via plasmid DNA. [4] Another mechanism of genome evolution is transduction, whereby bacteriophages introduce new DNA into a bacterial genome. The main mechanism of sexual interaction is natural genetic transformation, which involves the transfer of DNA from one prokaryotic cell to another through the intervening medium. Transformation is a common mode of DNA transfer, and at least 67 prokaryotic species are known to be competent for transformation. [5]

Genome evolution in bacteria is well understood because thousands of completely sequenced bacterial genomes are available. Genetic changes may lead to either increases or decreases of genomic complexity due to adaptive genome streamlining and purifying selection. [6] In general, free-living bacteria have evolved larger genomes with more genes so they can adapt more easily to changing environmental conditions. By contrast, most parasitic bacteria have reduced genomes because their hosts supply many if not most nutrients, so their genomes do not need to encode the enzymes that produce these nutrients. [7] [ page needed ]

Characteristic | E. coli genome | Human genome
Genome size (base pairs) | 4.6 Mb | 3.2 Gb
Genome structure | Circular | Linear
Number of chromosomes | 1 | 46
Presence of plasmids | Yes | No
Presence of histones | No | Yes
DNA segregated in the nucleus | No | Yes
Number of genes | 4,288 | 20,000
Presence of introns | No* | Yes
Average gene size | 700 bp | 27,000 bp
* E. coli genes largely contain only exons. However, the genome does contain a small number of self-splicing introns (Group II). [8]

Eukaryotes

Eukaryotic genomes are generally larger than those of prokaryotes. While the E. coli genome is roughly 4.6 Mb in length, [9] the human genome is much larger, at approximately 3.2 Gb. [10] The eukaryotic genome is linear and can be composed of multiple chromosomes, packaged in the nucleus of the cell. The non-coding portions of the gene, known as introns, which are largely absent from prokaryotes, are removed by RNA splicing before translation of the protein can occur. Eukaryotic genomes evolve over time through many mechanisms, including sexual reproduction, which introduces much greater genetic diversity to the offspring than the usual prokaryotic process of replication, in which the offspring are theoretically genetic clones of the parental cell.[ citation needed ]

Genome size

Genome size is usually measured in base pairs (or bases in single-stranded DNA or RNA). The C-value is another measure of genome size. Research on prokaryotic genomes shows that there is a significant positive correlation between the C-value of prokaryotes and the number of genes that compose the genome. [11] This indicates that gene number is the main factor influencing the size of the prokaryotic genome. In eukaryotic organisms, a paradox is observed: the number of genes that make up the genome does not correlate with genome size. In other words, the genome is much larger than would be expected given the total number of protein-coding genes. [12]
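The contrast between prokaryotic and eukaryotic genomes can be made concrete with the E. coli and human figures quoted earlier in this article. The following is a minimal sketch only: the helper name `genes_per_mb` is hypothetical, and the inputs are the rounded values from the comparison table rather than curated measurements.

```python
# Rounded figures from the E. coli / human comparison earlier in the article.
genomes = {
    "E. coli": {"size_bp": 4_600_000, "genes": 4_288},
    "Human": {"size_bp": 3_200_000_000, "genes": 20_000},
}


def genes_per_mb(size_bp, genes):
    """Return gene density as genes per megabase (hypothetical helper)."""
    return genes / (size_bp / 1_000_000)


for name, g in genomes.items():
    density = genes_per_mb(g["size_bp"], g["genes"])
    print(f"{name}: {density:.2f} genes per Mb")
```

With these inputs the E. coli genome comes out at roughly 930 genes per megabase against about 6 for the human genome, which is the kind of mismatch between gene number and genome size that the paragraph above describes.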

Genome size can increase by duplication, insertion, or polyploidization. Recombination can lead to either DNA loss or gain. Genomes can also shrink because of deletions. A famous example of such gene decay is the genome of Mycobacterium leprae, the causative agent of leprosy. M. leprae has lost many once-functional genes over time due to the formation of pseudogenes. [13] This is evident from comparison with its close relative Mycobacterium tuberculosis. [14] M. leprae lives and replicates inside a host, so it no longer needs many of the genes that once allowed it to live and prosper outside the host. Over time these genes have lost their function through mechanisms such as mutation, becoming pseudogenes. Shedding non-essential genes is thought to benefit an organism because it makes DNA replication faster and less energetically costly. [15]

An example of increasing genome size over time is seen in filamentous plant pathogens. The genomes of these pathogens have grown larger over time due to repeat-driven expansion. The repeat-rich regions contain genes coding for host interaction proteins. As more repeats are added to these regions, the pathogens increase the possibility of developing new virulence factors through mutation and other forms of genetic recombination. In this way it is beneficial for these plant pathogens to have larger genomes. [16]

Chromosomal evolution

Chromosome fusion, leading to a reduced number of chromosomes (here a fused human chromosome 2, with 2 separate chromosomes still present in chimpanzees and other apes).

The evolution of genomes is clearly visible in changes of chromosome number and structure over time. For instance, the ancestral chromosomes corresponding to chimpanzee chromosomes 2A and 2B fused to produce human chromosome 2. Similarly, comparisons between more distantly related species reveal chromosomes that have been broken up into more parts over the course of evolution. This can be demonstrated by fluorescence in situ hybridization. [17]

Mechanisms

Gene duplication

Gene duplication is the process by which a region of DNA coding for a gene is duplicated. This can occur as the result of an error in recombination or through a retrotransposition event. Duplicate genes are often freed from the selective pressure under which genes normally exist. As a result, a large number of mutations may accumulate in the duplicate copy. This may render the gene non-functional or, in some cases, confer some benefit on the organism. [18] [19]

Whole genome duplication

Similar to gene duplication, whole genome duplication is the process by which an organism's entire genetic information is copied once or multiple times, a condition known as polyploidy. [20] This may provide an evolutionary benefit to the organism by supplying it with multiple copies of a gene, thus creating a greater possibility of functional and selectively favored genes. However, tests for enhanced rate and innovation in teleost fishes with duplicated genomes, compared with their close relatives the holostean fishes (without duplicated genomes), found that there was little difference between them for the first 150 million years of their evolution. [21]

In 1997, Wolfe & Shields gave evidence for an ancient duplication of the Saccharomyces cerevisiae (yeast) genome. [22] It was initially noted that this yeast genome contained many individual gene duplications. Wolfe & Shields hypothesized that this was actually the result of an entire genome duplication in the yeast's distant evolutionary history. They found 32 pairs of homologous chromosomal regions, accounting for over half of the yeast's genome. They also noted that although homologs were present, they were often located on different chromosomes. Based on these observations, they determined that Saccharomyces cerevisiae underwent a whole genome duplication soon after its evolutionary split from Kluyveromyces, a genus of ascomycetous yeasts. Over time, many of the duplicate genes were deleted or rendered non-functional. A number of chromosomal rearrangements broke the original duplicate chromosomes into the homologous chromosomal regions seen today. This idea was further supported by the genome of yeast's close relative Ashbya gossypii. [23] Whole genome duplication is common in fungi as well as plant species. An example of extreme genome duplication is the common cordgrass (Spartina anglica), which is a dodecaploid, meaning that it contains 12 sets of chromosomes, [24] in stark contrast to the human diploid structure, in which each individual has only two sets of 23 chromosomes.

Transposable elements

Transposable elements are regions of DNA that can be inserted into the genome through one of two mechanisms. These mechanisms work similarly to the "cut-and-paste" and "copy-and-paste" functions in word processing programs. The "cut-and-paste" mechanism works by excising DNA from one place in the genome and inserting it at another location. The "copy-and-paste" mechanism works by making one or more copies of a specific region of DNA and inserting these copies elsewhere in the genome. [25] [26] The most common transposable element in the human genome is the Alu sequence, which is present in the genome over one million times. [27]
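The word-processing analogy can be sketched directly on strings. This is a toy illustration, not a model of the underlying biochemistry: the function names, the example sequence, and the index-based interface are all assumptions made here for clarity.

```python
def cut_and_paste(genome, start, end, target):
    """Excise genome[start:end] and reinsert it at index `target`
    of the sequence that remains after excision (toy model)."""
    element = genome[start:end]
    remainder = genome[:start] + genome[end:]
    return remainder[:target] + element + remainder[target:]


def copy_and_paste(genome, start, end, target):
    """Leave genome[start:end] in place and insert a copy at index
    `target` of the original sequence (toy model)."""
    element = genome[start:end]
    return genome[:target] + element + genome[target:]


# Hypothetical sequence; the "element" is the TGCA at positions 4-7.
genome = "AAAATGCACCCCGGGG"

moved = cut_and_paste(genome, 4, 8, 8)
copied = copy_and_paste(genome, 4, 8, 12)

print(moved, len(moved))
print(copied, len(copied))
```

In this sketch the cut-and-paste result has the same length as the input, while each copy-and-paste event lengthens the sequence by the size of the element, which is one way such elements can contribute to genome growth.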

Mutation

Spontaneous mutations occur frequently and can cause various changes in the genome. [28] Mutations can either change the identity of one or more nucleotides, or result in the addition or deletion of one or more nucleotide bases. Additions and deletions can lead to a frameshift mutation, causing the downstream sequence to be read in a different reading frame from the original, often resulting in a non-functional protein. [29] A mutation in a promoter region, enhancer region or transcription factor binding region can also result in either a loss of function or an up- or downregulation in the transcription of the gene targeted by these regulatory elements. Mutations are constantly occurring in an organism's genome and can have a negative effect, a positive effect, or a neutral effect (no effect at all). [30] [31]
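The difference between a substitution and a frameshift can be seen by splitting a sequence into codons before and after each change. The sketch below is illustrative only; the `codons` helper and the short example sequence are assumptions, not data from any real gene.

```python
def codons(seq):
    """Split a sequence into consecutive three-base codons,
    dropping any incomplete codon at the end."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical coding sequence of five codons.
original = "ATGGCTGAAGGTTAA"

# Substitution: replace the base at index 4 (C) with A.
substituted = original[:4] + "A" + original[5:]

# Deletion: remove the base at index 4, shifting the reading frame.
deleted = original[:4] + original[5:]

print(codons(original))
print(codons(substituted))
print(codons(deleted))
```

In this example the substitution alters only the second codon, whereas the single-base deletion changes every codon downstream of the deletion site.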

Pseudogenes

The proS loci in Mycobacterium leprae and M. tuberculosis, showing 3 pseudogenes (indicated by crosses) in M. leprae that still represent functional genes in M. tuberculosis. Homologous genes are indicated by identical colors and vertical, hatched bars. Modified after Cole et al. 2001.

Often a result of spontaneous mutation, pseudogenes are dysfunctional genes derived from previously functional gene relatives. There are many mechanisms by which a functional gene can become a pseudogene, including the deletion or insertion of one or more nucleotides. This can shift the reading frame, so that the gene no longer codes for the expected protein, or introduce a premature stop codon. A mutation in the promoter region can likewise inactivate a gene. [32]

Often cited examples of pseudogenes within the human genome include the once functional olfactory gene families. Over time, many olfactory genes in the human genome became pseudogenes and were no longer able to produce functional proteins, explaining the poor sense of smell humans possess in comparison to their mammalian relatives. [33] [34]

Similarly, bacterial pseudogenes commonly arise from adaptation of free-living bacteria to parasitic lifestyles, so that many metabolic genes become superfluous as these species become adapted to their host. Once a parasite obtains nutrients (such as amino acids or vitamins) from its host it has no need to produce these nutrients itself and often loses the genes to make them.[ citation needed ]

Exon shuffling

Exon shuffling is a mechanism by which new genes are created. This can occur when two or more exons from different genes are combined or when exons are duplicated. Exon shuffling results in new genes by altering the current intron-exon structure. It can occur by any of the following processes: transposon-mediated shuffling, sexual recombination, or non-homologous recombination (also called illegitimate recombination). Exon shuffling may introduce new genes into the genome that can be either selected against and deleted or selectively favored and conserved. [35] [36] [37]

Genome reduction and gene loss

Many species exhibit genome reduction when subsets of their genes are no longer needed. This typically happens when organisms adapt to a parasitic lifestyle, e.g. when their nutrients are supplied by a host. As a consequence, they lose the genes needed to produce these nutrients. In many cases, there are both free-living and parasitic species that can be compared and their lost genes identified. Good examples are the genomes of Mycobacterium tuberculosis and Mycobacterium leprae, the latter of which has a dramatically reduced genome (see figure under pseudogenes above).

Endosymbiont species provide another example. For instance, Polynucleobacter necessarius was first described as a cytoplasmic endosymbiont of the ciliate Euplotes aediculatus. The latter species dies soon after being cured of the endosymbiont. In the few cases in which P. necessarius is not present, a different and rarer bacterium apparently supplies the same function. No attempt to grow symbiotic P. necessarius outside their hosts has yet been successful, strongly suggesting that the relationship is obligate for both partners. Yet closely related free-living relatives of P. necessarius have been identified. The endosymbionts have a significantly reduced genome compared to their free-living relatives (1.56 Mbp vs. 2.16 Mbp). [38]

Speciation

Cichlids such as Tropheops tropheops from Lake Malawi provide models for genome evolution.

A major question of evolutionary biology is how genomes change to create new species. Speciation requires changes in behavior, morphology, physiology, or metabolism (or combinations thereof). The evolution of genomes during speciation has been studied only very recently, with the availability of next-generation sequencing technologies. For instance, cichlid fish in African lakes differ both morphologically and in their behavior. The genomes of 5 species have revealed that both the sequences and the expression patterns of many genes have changed quickly over a relatively short period of time (100,000 to several million years). Notably, 20% of duplicate gene pairs have gained a completely new tissue-specific expression pattern, indicating that these genes also obtained new functions. Given that gene expression is driven by short regulatory sequences, this suggests that relatively few mutations are required to drive speciation. The cichlid genomes also showed increased evolutionary rates in microRNAs, which are involved in the regulation of gene expression. [39] [40]

Gene expression

Mutations can lead to changed gene function or, probably more often, to changed gene expression patterns. In fact, a study on 12 animal species provided strong evidence that tissue-specific gene expression was largely conserved between orthologs in different species. However, paralogs within the same species often have a different expression pattern. That is, after duplication of genes they often change their expression pattern, for instance by getting expressed in another tissue and thereby adopting new roles. [41]

Composition of nucleotides (GC content)

The genetic code is made up of sequences of four nucleotide bases: adenine, guanine, cytosine and thymine, commonly referred to as A, G, C, and T. The GC-content is the percentage of G and C bases within a genome. GC-content varies greatly between different organisms. [42] Gene coding regions have been shown to have a higher GC-content, and the longer the gene is, the greater the percentage of G and C bases present. A higher GC-content confers a benefit because a guanine-cytosine base pair is held together by three hydrogen bonds while an adenine-thymine base pair is held together by only two. The three hydrogen bonds thus give greater stability to the DNA double helix. So it is not surprising that important genes often have a higher GC-content than other parts of an organism's genome. [43] For this reason, many species living at very high temperatures, such as in the ecosystems surrounding hydrothermal vents, have a very high GC-content. High GC-content is also seen in regulatory sequences such as promoters, which signal the start of a gene. Many promoters contain CpG islands, areas of the genome where a cytosine nucleotide occurs next to a guanine nucleotide at a higher frequency than elsewhere. It has also been shown that a broad distribution of GC-content between species within a genus indicates a more ancient ancestry: since the species have had more time to evolve, their GC-content has diverged further apart.[ citation needed ]
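GC-content as defined above is a simple percentage and can be computed directly from a sequence. The following is a minimal sketch; the function name `gc_content` and the two short example sequences are hypothetical, and ambiguous bases such as N are simply counted as non-GC here.

```python
def gc_content(seq):
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


# Hypothetical sequences chosen to contrast GC-rich and AT-rich DNA.
gc_rich = "ATGCGCGCTA"
at_rich = "ATATATATGC"

print(gc_content(gc_rich))
print(gc_content(at_rich))
```

Applied to whole genomes or to sliding windows along a chromosome, the same calculation is how GC-content differences between organisms or between genomic regions are usually summarized.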

Evolving translation of genetic code

Amino acids are encoded by three-base codons, and both glycine and alanine are characterized by codons with guanine or cytosine at the first two codon positions. These GC base pairs give more stability to the DNA structure. It has been hypothesized that, as the first organisms evolved in a high-heat and high-pressure environment, they needed the stability of these GC base pairs in their genetic code. [44]

De novo origin of genes

Novel genes can arise from non-coding DNA. De novo origin of (protein-coding) genes requires only two features, namely the generation of an open reading frame and the creation of a transcription factor binding site. For instance, Levine and colleagues reported the origin of five new genes in the D. melanogaster genome from noncoding DNA. [45] [46] Subsequently, de novo origin of genes has also been shown in other organisms such as yeast, [47] rice [48] and humans. [49] For instance, Wu et al. (2011) reported 60 putative de novo human-specific genes, all of which are short and, with one exception, consist of a single exon. [50] In bacteria, 'grounded' prophages (i.e. integrated phage that cannot produce new phage) are buffer zones that tolerate variation, thereby increasing the probability of de novo gene formation. [51] These grounded prophages and other such genetic elements are sites where genes could be acquired through horizontal gene transfer (HGT).

Origin of life and the first genomes

In order to understand how the genome arose, knowledge is required of the chemical pathways that permit formation of the key building blocks of the genome under plausible prebiotic conditions. According to the RNA world hypothesis free-floating ribonucleotides were present in the primitive soup. These were the fundamental molecules that combined in series to form the original RNA genome. Molecules as complex as RNA must have arisen from small molecules whose reactivity was governed by physico-chemical processes. RNA is composed of purine and pyrimidine nucleotides, both of which are necessary for reliable information transfer, and thus Darwinian natural selection and evolution. Nam et al. [52] demonstrated the direct condensation of nucleobases with ribose to give ribonucleosides in aqueous microdroplets, a key step leading to formation of the RNA genome. Also, a plausible prebiotic process for synthesizing pyrimidine and purine ribonucleotides leading to genome formation using wet-dry cycles was presented by Becker et al. [53]

See also

Related Research Articles

<span class="mw-page-title-main">Genome</span> All genetic material of an organism

In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA. The nuclear genome includes protein-coding genes and non-coding genes, other functional regions of the genome such as regulatory sequences, and often a substantial fraction of junk DNA with no evident function. Almost all eukaryotes have mitochondria and a small mitochondrial genome. Algae and plants also contain chloroplasts with a chloroplast genome.

An intron is any nucleotide sequence within a gene that is not expressed or operative in the final RNA product. The word intron is derived from the term intragenic region, i.e., a region inside a gene. The term intron refers to both the DNA sequence within a gene and the corresponding RNA sequence in RNA transcripts. The non-intron sequences that become joined by this RNA processing to form the mature RNA are called exons.

Microevolution is the change in allele frequencies that occurs over time within a population. This change is due to four different processes: mutation, selection, gene flow and genetic drift. This change happens over a relatively short amount of time compared to the changes termed macroevolution.

<span class="mw-page-title-main">Mutation</span> Alteration in the nucleotide sequence of a genome

In biology, a mutation is an alteration in the nucleic acid sequence of the genome of an organism, virus, or extrachromosomal DNA. Viral genomes contain either DNA or RNA. Mutations result from errors during DNA or viral replication, mitosis, or meiosis or other types of damage to DNA, which then may undergo error-prone repair, cause an error during other forms of repair, or cause an error during replication. Mutations may also result from insertion or deletion of segments of DNA due to mobile genetic elements.

<span class="mw-page-title-main">Transposable element</span> Semiparasitic DNA sequence

A transposable element is a nucleic acid sequence in DNA that can change its position within a genome, sometimes creating or reversing mutations and altering the cell's genetic identity and genome size. Transposition often results in duplication of the same genetic material. In the human genome, L1 and Alu elements are two examples. Barbara McClintock's discovery of them earned her a Nobel Prize in 1983. Its importance in personalized medicine is becoming increasingly relevant, as well as gaining more attention in data analytics given the difficulty of analysis in very high dimensional spaces.

<span class="mw-page-title-main">Human genome</span> Complete set of nucleic acid sequences for humans

The human genome is a complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. These are usually treated separately as the nuclear genome and the mitochondrial genome. Human genomes include both protein-coding DNA sequences and various types of DNA that does not encode proteins. The latter is a diverse category that includes DNA coding for non-translated RNA, such as that for ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and several types of regulatory RNAs. It also includes promoters and their associated gene-regulatory elements, DNA playing structural and replicatory roles, such as scaffolding regions, telomeres, centromeres, and origins of replication, plus large numbers of transposable elements, inserted viral DNA, non-functional pseudogenes and simple, highly repetitive sequences. Introns make up a large percentage of non-coding DNA. Some of this non-coding DNA is non-functional junk DNA, such as pseudogenes, but there is no firm consensus on the total amount of junk DNA.

Non-coding DNA (ncDNA) sequences are components of an organism's DNA that do not encode protein sequences. Some non-coding DNA is transcribed into functional non-coding RNA molecules. Other functional regions of the non-coding DNA fraction include regulatory sequences that control gene expression; scaffold attachment regions; origins of DNA replication; centromeres; and telomeres. Some non-coding regions appear to be mostly nonfunctional, such as introns, pseudogenes, intergenic DNA, and fragments of transposons and viruses. Regions that are completely nonfunctional are called junk DNA.

Molecular evolution is the process of change in the sequence composition of cellular molecules such as DNA, RNA, and proteins across generations. The field of molecular evolution uses principles of evolutionary biology and population genetics to explain patterns in these changes. Major topics in molecular evolution concern the rates and impacts of single nucleotide changes, neutral evolution vs. natural selection, origins of new genes, the genetic nature of complex traits, the genetic basis of speciation, the evolution of development, and ways that evolutionary forces influence genomic and phenotypic changes.

The coding region of a gene, also known as the coding sequence (CDS), is the portion of a gene's DNA or RNA that codes for a protein. Studying the length, composition, regulation, splicing, structures, and functions of coding regions compared to non-coding regions over different species and time periods can provide a significant amount of important information regarding gene organization and evolution of prokaryotes and eukaryotes. This can further assist in mapping the human genome and developing gene therapy.

<span class="mw-page-title-main">Pseudogene</span> Functionless relative of a gene

Pseudogenes are nonfunctional segments of DNA that resemble functional genes. Most arise as superfluous copies of functional genes, either directly by gene duplication or indirectly by reverse transcription of an mRNA transcript. Pseudogenes are usually identified when genome sequence analysis finds gene-like sequences that lack regulatory sequences needed for transcription or translation, or whose coding sequences are obviously defective due to frameshifts or premature stop codons. Pseudogenes are a type of junk DNA.

Gene duplication is a major mechanism through which new genetic material is generated during molecular evolution. It can be defined as any duplication of a region of DNA that contains a gene. Gene duplications can arise as products of several types of errors in DNA replication and repair machinery as well as through fortuitous capture by selfish genetic elements. Common sources of gene duplications include ectopic recombination, retrotransposition event, aneuploidy, polyploidy, and replication slippage.

<span class="mw-page-title-main">Gene family</span> Set of several similar genes

A gene family is a set of several similar genes, formed by duplication of a single original gene, and generally with similar biochemical functions. One such family are the genes for human hemoglobin subunits; the ten genes are in two clusters on different chromosomes, called the α-globin and β-globin loci. These two gene clusters are thought to have arisen as a result of a precursor gene being duplicated approximately 500 million years ago.

In computational biology, gene prediction or gene finding refers to the process of identifying the regions of genomic DNA that encode genes. This includes protein-coding genes as well as RNA genes, but may also include prediction of other functional elements such as regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced.

Gene conversion is the process by which one DNA sequence replaces a homologous sequence such that the sequences become identical after the conversion event. Gene conversion can be either allelic, meaning that one allele of the same gene replaces another allele, or ectopic, meaning that one paralogous DNA sequence converts another.

<span class="mw-page-title-main">Gene</span> Sequence of DNA or RNA that codes for an RNA or protein product

In biology, the word gene has two meanings. The Mendelian gene is a basic unit of heredity. The molecular gene is a sequence of nucleotides in DNA, that is transcribed to produce a functional RNA. There are two types of molecular genes: protein-coding genes and non-coding genes.

<span class="mw-page-title-main">Gene cluster</span>

A gene family is a set of homologous genes within one organism. A gene cluster is a group of two or more genes found within an organism's DNA that encode similar polypeptides, or proteins, which collectively share a generalized function and are often located within a few thousand base pairs of each other. The size of gene clusters can vary significantly, from a few genes to several hundred genes. Portions of the DNA sequence of each gene within a gene cluster are found to be identical; however, the resulting protein of each gene is distinctive from the resulting protein of another gene within the cluster. Genes found in a gene cluster may be observed near one another on the same chromosome or on different, but homologous chromosomes. An example of a gene cluster is the Hox gene, which is made up of eight genes and is part of the Homeobox gene family.

Nuclear mitochondrial DNA (NUMT) segments or genetic loci describe a transposition of any type of cytoplasmic mitochondrial DNA into the nuclear genome of eukaryotic organisms.

<span class="mw-page-title-main">Gene redundancy</span>

Gene redundancy is the existence of multiple genes in the genome of an organism that perform the same function. Gene redundancy can result from gene duplication. Such duplication events are responsible for many sets of paralogous genes. When an individual gene in such a set is disrupted by mutation or targeted knockout, there can be little effect on phenotype as a result of gene redundancy, whereas the effect is large for the knockout of a gene with only one copy. Gene knockout is a method utilized in some studies aiming to characterize the maintenance and fitness effects functional overlap.

DNA transposons are DNA sequences, sometimes referred to as "jumping genes", that can move and integrate into different locations within the genome. They are class II transposable elements (TEs) that move through a DNA intermediate, as opposed to class I TEs, retrotransposons, that move through an RNA intermediate. DNA transposons can move in the DNA of an organism via a single- or double-stranded DNA intermediate. DNA transposons have been found in both prokaryotic and eukaryotic organisms. They can make up a significant portion of an organism's genome, particularly in eukaryotes. In prokaryotes, TEs can facilitate the horizontal transfer of antibiotic resistance genes or other genes associated with virulence. After replicating and propagating in a host, all transposon copies become inactivated and are lost unless the transposon passes to a new genome, starting a new life cycle through horizontal transfer. DNA transposons do not insert themselves into the genome at random, but rather show a preference for specific sites.
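The movement of an element through a DNA intermediate can be sketched on a toy sequence. The example below assumes a simple cut-and-paste model, which the text above does not specify, and uses a hypothetical function name, sequence, and coordinates. The target position is passed in explicitly rather than chosen at random, since real transposons favor specific sites.

```python
def cut_and_paste(genome: str, start: int, length: int, target: int) -> str:
    """Excise genome[start:start + length] and reinsert it at index `target`.

    `target` is an index into the sequence that remains after excision.
    This is an illustrative model only: it ignores target-site
    duplications, inverted repeats, and the transposase enzyme.
    """
    if start < 0 or length <= 0 or start + length > len(genome):
        raise ValueError("element coordinates fall outside the genome")
    element = genome[start:start + length]                  # the excised element
    remainder = genome[:start] + genome[start + length:]    # donor site rejoined
    if not 0 <= target <= len(remainder):
        raise ValueError("target position falls outside the remaining sequence")
    return remainder[:target] + element + remainder[target:]


# Hypothetical 16-bp "genome" with the element TGCA at positions 4..7
genome = "AAAATGCACCCCGGGG"
moved = cut_and_paste(genome, start=4, length=4, target=8)
print(moved)
```

Because the element is cut rather than copied, the sequence length is unchanged; a copy-based mechanism would instead lengthen the genome with each transposition.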

Illegitimate recombination

Illegitimate recombination, or nonhomologous recombination, is the process by which two unrelated double-stranded segments of DNA are joined. Joining genetic material that is not normally adjacent tends to disrupt genes, so that the proteins they encode are not properly expressed. One of the primary pathways by which this occurs is the repair mechanism known as non-homologous end joining (NHEJ).

References

  1. Fiers W, Contreras R, Duerinck F, Haegeman G, Iserentant D, Merregaert J, et al. (April 1976). "Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene". Nature. 260 (5551): 500–7. Bibcode:1976Natur.260..500F. doi:10.1038/260500a0. PMID 1264203. S2CID 4289674.
  2. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, et al. (February 2001). "The sequence of the human genome". Science. 291 (5507): 1304–51. Bibcode:2001Sci...291.1304V. doi:10.1126/science.1058040. PMID 11181995.
  3. Toussaint A, Chandler M (2012). "Prokaryote Genome Fluidity: Toward a System Approach of the Mobilome". Bacterial Molecular Networks. Methods in Molecular Biology. Vol. 804. pp. 57–80. doi:10.1007/978-1-61779-361-5_4. ISBN 978-1-61779-360-8. PMID 22144148.
  4. Ruiz J, Pons MJ, Gomes C (September 2012). "Transferable mechanisms of quinolone resistance". International Journal of Antimicrobial Agents. 40 (3): 196–203. doi:10.1016/j.ijantimicag.2012.02.011. PMID 22831841.
  5. Johnsborg O, Eldholm V, Håvarstein LS (December 2007). "Natural genetic transformation: prevalence, mechanisms and function". Research in Microbiology. 158 (10): 767–78. doi:10.1016/j.resmic.2007.09.004. PMID 17997281.
  6. Koonin EV, Wolf YI (December 2008). "Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world". Nucleic Acids Research. 36 (21): 6688–719. doi:10.1093/nar/gkn668. PMC 2588523. PMID 18948295.
  7. Tortora GJ (2015). Microbiology: An Introduction. Pearson. ISBN 978-0321929150.
  8. Dai L, Zimmerly S (October 2002). "The dispersal of five group II introns among natural populations of Escherichia coli". RNA. 8 (10): 1294–307. doi:10.1017/S1355838202023014. PMC 1370338. PMID 12403467.
  9. Blattner FR, Plunkett G, Bloch CA, Perna NT, Burland V, Riley M, et al. (September 1997). "The complete genome sequence of Escherichia coli K-12". Science. 277 (5331): 1453–62. doi:10.1126/science.277.5331.1453. PMID 9278503.
  10. International Human Genome Sequencing Consortium (October 2004). "Finishing the euchromatic sequence of the human genome". Nature. 431 (7011): 931–45. Bibcode:2004Natur.431..931H. doi:10.1038/nature03001. PMID 15496913.
  11. Gregory TR (February 2001). "Coincidence, coevolution, or causation? DNA content, cell size, and the C-value enigma". Biological Reviews of the Cambridge Philosophical Society. 76 (1): 65–101. doi:10.1111/j.1469-185X.2000.tb00059.x. PMID 11325054.
  12. Gregory TR (January 2002). "A bird's-eye view of the C-value enigma: genome size, cell size, and metabolic rate in the class aves". Evolution; International Journal of Organic Evolution. 56 (1): 121–30. doi:10.1111/j.0014-3820.2002.tb00854.x. PMID 11913657.
  13. Singh P, Cole ST (January 2011). "Mycobacterium leprae: genes, pseudogenes and genetic diversity". Future Microbiology. 6 (1): 57–71. doi:10.2217/fmb.10.153. PMC 3076554. PMID 21162636.
  14. Eiglmeier K, Parkhill J, Honoré N, Garnier T, Tekaia F, Telenti A, et al. (December 2001). "The decaying genome of Mycobacterium leprae". Leprosy Review. 72 (4): 387–98. doi:10.5935/0305-7518.20010054. PMID 11826475.
  15. Rosengarten R, Citti C, Glew M, Lischewski A, Droesse M, Much P, et al. (March 2000). "Host-pathogen interactions in mycoplasma pathogenesis: virulence and survival strategies of minimalist prokaryotes". International Journal of Medical Microbiology. 290 (1): 15–25. doi:10.1016/S1438-4221(00)80099-5. PMID 11043978.
  16. Raffaele S, Kamoun S (May 2012). "Genome evolution in filamentous plant pathogens: why bigger can be better". Nature Reviews. Microbiology. 10 (6): 417–30. doi:10.1038/nrmicro2790. PMID 22565130. S2CID 6169712.
  17. Ferguson-Smith MA, Pereira JC, Borges A, Kasai F (October 2022). "Observations on chromosome-specific sequencing for the construction of cross-species chromosome homology maps and its resolution of human:alpaca homology". Molecular Cytogenetics. 15 (1): 44. doi:10.1186/s13039-022-00622-0. PMC 9547437. PMID 36207754.
  18. Zhang J (2003). "Evolution by gene duplication: an update". Trends in Ecology & Evolution. 18 (6): 292–298. doi:10.1016/S0169-5347(03)00033-8.
  19. Taylor JS, Raes J (2004). "Duplication and divergence: the evolution of new genes and old ideas". Annual Review of Genetics. 38: 615–43. doi:10.1146/annurev.genet.38.072902.092831. PMID 15568988.
  20. Song C, Liu S, Xiao J, He W, Zhou Y, Qin Q, Zhang C, Liu Y (April 2012). "Polyploid organisms". Science China Life Sciences. 55 (4): 301–11. doi:10.1007/s11427-012-4310-2. PMID 22566086.
  21. Clarke JT, Lloyd GT, Friedman M (October 2016). "Little evidence for enhanced phenotypic evolution in early teleosts relative to their living fossil sister group". Proceedings of the National Academy of Sciences of the United States of America. 113 (41): 11531–11536. Bibcode:2016PNAS..11311531C. doi:10.1073/pnas.1607237113. PMC 5068283. PMID 27671652.
  22. Wolfe KH, Shields DC (June 1997). "Molecular evidence for an ancient duplication of the entire yeast genome". Nature. 387 (6634): 708–13. Bibcode:1997Natur.387..708W. doi:10.1038/42711. PMID 9192896.
  23. Dietrich FS, Voegeli S, Brachat S, Lerch A, Gates K, Steiner S, Mohr C, Pöhlmann R, Luedi P, Choi S, Wing RA, Flavier A, Gaffney TD, Philippsen P (April 2004). "The Ashbya gossypii genome as a tool for mapping the ancient Saccharomyces cerevisiae genome". Science. 304 (5668): 304–7. Bibcode:2004Sci...304..304D. doi:10.1126/science.1095781. PMID 15001715. S2CID 26130646.
  24. Buggs RJ (November 2012). "Monkeying around with ploidy". Molecular Ecology. 21 (21): 5159–61. Bibcode:2012MolEc..21.5159B. doi:10.1111/mec.12005. PMID 23075066. S2CID 5799005.
  25. Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, Paux E, SanMiguel P, Schulman AH (December 2007). "A unified classification system for eukaryotic transposable elements". Nature Reviews Genetics. 8 (12): 973–82. doi:10.1038/nrg2165. PMID 17984973. S2CID 32132898.
  26. Ivics Z, Izsvák Z (January 2005). "A whole lotta jumpin' goin' on: new transposon tools for vertebrate functional genomics". Trends in Genetics. 21 (1): 8–11. doi:10.1016/j.tig.2004.11.008. PMID 15680506.
  27. Oler AJ, Traina-Dorge S, Derbes RS, Canella D, Cairns BR, Roy-Engel AM (June 2012). "Alu expression in human cell lines and their retrotranspositional potential". Mobile DNA. 3 (1): 11. doi:10.1186/1759-8753-3-11. PMC 3412727. PMID 22716230.
  28. Cole ST, Eiglmeier K, Parkhill J, James KD, Thomson NR, Wheeler PR, et al. (February 2001). "Massive gene decay in the leprosy bacillus". Nature. 409 (6823): 1007–1011. Bibcode:2001Natur.409.1007C. doi:10.1038/35059006. PMID 11234002. S2CID 4307207.
  29. Griffiths A (December 2011). "Slipping and sliding: frameshift mutations in herpes simplex virus thymidine kinase and drug-resistance". Drug Resistance Updates. 14 (6): 251–259. doi:10.1016/j.drup.2011.08.003. PMC 3195865. PMID 21940196.
  30. Eyre-Walker A, Keightley PD (August 2007). "The distribution of fitness effects of new mutations". Nature Reviews. Genetics. 8 (8): 610–618. doi:10.1038/nrg2146. PMID 17637733. S2CID 10868777.
  31. Gillespie JH (September 1984). "Molecular Evolution over the Mutational Landscape". Evolution; International Journal of Organic Evolution. 38 (5): 1116–1129. doi:10.2307/2408444. JSTOR 2408444. PMID 28555784.
  32. Pink RC, Wicks K, Caley DP, Punch EK, Jacobs L, Carter DR (May 2011). "Pseudogenes: pseudo-functional or key regulators in health and disease?". RNA. 17 (5): 792–8. doi:10.1261/rna.2658311. PMC 3078729. PMID 21398401.
  33. Sharon D, Glusman G, Pilpel Y, Horn-Saban S, Lancet D (November 1998). "Genome dynamics, evolution, and protein modeling in the olfactory receptor gene superfamily". Annals of the New York Academy of Sciences. 855 (1): 182–93. Bibcode:1998NYASA.855..182S. doi:10.1111/j.1749-6632.1998.tb10564.x. PMID 9929603. S2CID 29725250.
  34. Mombaerts P (2001). "The human repertoire of odorant receptor genes and pseudogenes". Annual Review of Genomics and Human Genetics. 2: 493–510. doi:10.1146/annurev.genom.2.1.493. PMID 11701659.
  35. Liu M, Grigoriev A (September 2004). "Protein domains correlate strongly with exons in multiple eukaryotic genomes--evidence of exon shuffling?". Trends in Genetics. 20 (9): 399–403. doi:10.1016/j.tig.2004.06.013. PMID 15313546.
  36. Froy O, Gurevitz M (December 2003). "Arthropod and mollusk defensins--evolution by exon-shuffling". Trends in Genetics. 19 (12): 684–7. doi:10.1016/j.tig.2003.10.010. PMID 14642747.
  37. Roy SW (July 2003). "Recent evidence for the exon theory of genes". Genetica. 118 (2–3): 251–66. doi:10.1023/A:1024190617462. PMID 12868614. S2CID 2266380.
  38. Boscaro V, Felletti M, Vannini C, Ackerman MS, Chain PS, Malfatti S, Vergez LM, Shin M, Doak TG, Lynch M, Petroni G (November 2013). "Polynucleobacter necessarius, a model for genome reduction in both free-living and symbiotic bacteria". Proceedings of the National Academy of Sciences of the United States of America. 110 (46): 18590–5. Bibcode:2013PNAS..11018590B. doi:10.1073/pnas.1316687110. PMC 3831957. PMID 24167248.
  39. Brawand D, Wagner CE, Li YI, Malinsky M, Keller I, Fan S, et al. (September 2014). "The genomic substrate for adaptive radiation in African cichlid fish". Nature. 513 (7518): 375–381. Bibcode:2014Natur.513..375B. doi:10.1038/nature13726. PMC 4353498. PMID 25186727.
  40. Jiggins CD (September 2014). "Evolutionary biology: Radiating genomes". Nature. 513 (7518): 318–9. Bibcode:2014Natur.513..318J. doi:10.1038/nature13742. PMID 25186726.
  41. Kryuchkova-Mostacci N, Robinson-Rechavi M (December 2016). "Tissue-Specificity of Gene Expression Diverges Slowly between Orthologs, and Rapidly between Paralogs". PLOS Computational Biology. 12 (12): e1005274. Bibcode:2016PLSCB..12E5274K. doi:10.1371/journal.pcbi.1005274. PMC 5193323. PMID 28030541.
  42. Li W (November 2011). "On parameters of the human genome". Journal of Theoretical Biology. 288: 92–104. Bibcode:2011JThBi.288...92L. doi:10.1016/j.jtbi.2011.07.021. PMID 21821053.
  43. Galtier N (February 2003). "Gene conversion drives GC content evolution in mammalian histones". Trends in Genetics. 19 (2): 65–8. doi:10.1016/S0168-9525(02)00002-1. PMID 12547511.
  44. Šmarda P, Bureš P, Horová L, Leitch IJ, Mucina L, Pacini E, et al. (September 2014). "Ecological and evolutionary significance of genomic GC content diversity in monocots". Proceedings of the National Academy of Sciences of the United States of America. 111 (39): E4096–102. Bibcode:2014PNAS..111E4096M. doi:10.1073/pnas.1321152111. PMC 4191780. PMID 25225383.
  45. Levine MT, Jones CD, Kern AD, Lindfors HA, Begun DJ (June 2006). "Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression". Proceedings of the National Academy of Sciences of the United States of America. 103 (26): 9935–9. Bibcode:2006PNAS..103.9935L. doi:10.1073/pnas.0509809103. PMC 1502557. PMID 16777968.
  46. Zhou Q, Zhang G, Zhang Y, Xu S, Zhao R, Zhan Z, Li X, Ding Y, Yang S, Wang W (September 2008). "On the origin of new genes in Drosophila". Genome Research. 18 (9): 1446–55. doi:10.1101/gr.076588.108. PMC 2527705. PMID 18550802.
  47. Cai J, Zhao R, Jiang H, Wang W (May 2008). "De novo origination of a new protein-coding gene in Saccharomyces cerevisiae". Genetics. 179 (1): 487–96. doi:10.1534/genetics.107.084491. PMC 2390625. PMID 18493065.
  48. Xiao W, Liu H, Li Y, Li X, Xu C, Long M, Wang S (2009). El-Shemy HA (ed.). "A rice gene of de novo origin negatively regulates pathogen-induced defense response". PLOS ONE. 4 (2): e4603. Bibcode:2009PLoSO...4.4603X. doi:10.1371/journal.pone.0004603. PMC 2643483. PMID 19240804.
  49. Knowles DG, McLysaght A (October 2009). "Recent de novo origin of human protein-coding genes". Genome Research. 19 (10): 1752–9. doi:10.1101/gr.095026.109. PMC 2765279. PMID 19726446.
  50. Wu DD, Irwin DM, Zhang YP (November 2011). "De novo origin of human protein-coding genes". PLOS Genetics. 7 (11): e1002379. doi:10.1371/journal.pgen.1002379. PMC 3213175. PMID 22102831.
  51. Ramisetty BC, Sudhakari PA (2019). "Bacterial 'Grounded' Prophages: Hotspots for Genetic Renovation and Innovation". Frontiers in Genetics. 10: 65. doi:10.3389/fgene.2019.00065. PMC 6379469. PMID 30809245.
  52. Nam I, Nam HG, Zare RN (January 2018). "Abiotic synthesis of purine and pyrimidine ribonucleosides in aqueous microdroplets". Proceedings of the National Academy of Sciences of the United States of America. 115 (1): 36–40. doi:10.1073/pnas.1718559115. PMC 5776833. PMID 29255025.
  53. Becker S, Feldmann J, Wiedemann S, Okamura H, Schneider C, Iwan K, Crisp A, Rossa M, Amatov T, Carell T (October 2019). "Unified prebiotically plausible synthesis of pyrimidine and purine RNA ribonucleotides". Science. 366 (6461): 76–82. doi:10.1126/science.aax2747. PMID 31604305.