G-value paradox

The G-value paradox arises from the lack of correlation between the number of protein-coding genes among eukaryotes and their relative biological complexity. The microscopic nematode Caenorhabditis elegans, for example, is composed of only a thousand cells but has about the same number of genes as a human. [1] [2] Researchers suggest resolution of the paradox may lie in mechanisms such as alternative splicing and complex gene regulation that make the genes of humans and other complex eukaryotes relatively more productive. [3]

DNA and biological complexity

The lack of correlation between the morphological complexity of eukaryotes and the amount of genetic information they carry has long puzzled researchers. [4] The sheer amount of DNA in an organism, measured by the mass of DNA present in the nucleus or the number of constituent nucleotide pairs, varies by several orders of magnitude among eukaryotes and often is unrelated to an organism's size or developmental complexity. [5] One amoeba has 200 times more DNA per cell than humans, [6] and even insects and plants within the same genus can vary dramatically in their quantity of DNA. [7] This C-value paradox troubled genome scientists for many years.

Eventually, researchers recognized that not all DNA contributes directly to the production of proteins and other biological functions. [8] Susumu Ohno coined the phrase "junk DNA" to describe these nonfunctional swaths of DNA. [9] They include introns, genetic sequences that are removed after transcription into mRNA and thus are not translated into proteins; [4] [10] transposable elements, mobile fragments of DNA, most of which are nonfunctional in humans; [8] [11] and pseudogenes, nonfunctional DNA sequences that originated from functional genes. [12] The share of the human genome that may be considered "junk" remains controversial. Estimates of the functional portion range from as low as 8% [13] to as high as 80%, [14] with one researcher arguing that there is a fixed ceiling of 15% imposed by the genome's genetic load. [15] (Prokaryotes, which have little "junk" DNA by comparison, exhibit a fairly close relationship between genome size and biological functionality.) [16]

In any case, the assumption was that once the C-value paradox was swept away and the focus shifted to the number of protein-coding genes, the anticipated correlation between genetic information and biological complexity in eukaryotes would emerge. [3] Instead, the G-value paradox simply picked up where the C-value paradox left off, because the discrepancy persisted when comparisons were narrowed to just protein-coding genes. [3] [17]

G-value paradox

Estimates of the number of coding genes in the human genome reached upwards of 100,000 prior to the Human Genome Project, [18] but have since dwindled to as low as 19,000 following completion of that massive sequencing effort and subsequent refinements. [1] By comparison, the microscopic water flea Daphnia pulex has about 31,000 genes; [19] the nematode C. elegans about 19,700; [2] the fruit fly (Drosophila melanogaster) about 14,000; [20] the zebrafish (Danio rerio), 26,000; [21] and the small flowering plant Arabidopsis thaliana, 27,000. [22] Plants in general tend to have more genes than other eukaryotes. [23] One explanation is their higher incidence of gene and whole genome duplication and retention of those additional genes, due in part to their development of a large collection of defensive secondary metabolites. [23]
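The comparison above can be made concrete with a short calculation. The following is a minimal Python sketch, not part of any cited analysis: it takes the approximate gene counts quoted in this section (using the low-end human estimate of 19,000) and expresses each as a multiple of the human count. The names `GENE_COUNTS` and `ratio_to_human` are illustrative choices.

```python
# Approximate protein-coding gene counts as quoted in the paragraph above.
# The human figure is the low-end estimate of 19,000.
GENE_COUNTS = {
    "Homo sapiens": 19_000,
    "Daphnia pulex": 31_000,
    "Caenorhabditis elegans": 19_700,
    "Drosophila melanogaster": 14_000,
    "Danio rerio": 26_000,
    "Arabidopsis thaliana": 27_000,
}


def ratio_to_human(counts, reference="Homo sapiens"):
    """Return each species' gene count divided by the reference species' count."""
    ref = counts[reference]
    return {species: n / ref for species, n in counts.items()}


if __name__ == "__main__":
    ratios = ratio_to_human(GENE_COUNTS)
    # Print species from the largest gene count to the smallest.
    for species, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{species:<26} {GENE_COUNTS[species]:>6,}  {ratio:.2f}x human")
```

On these figures the water flea carries well over one and a half times as many genes as a human and the nematode slightly more, which is the mismatch the paradox describes.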

The apparent disconnect between the number of genes in a species and its biological complexity was dubbed the G-value paradox. [3] While the C-value paradox unraveled with the discovery of massive sequences of noncoding DNA, resolution of the G-value paradox appears to rest on differences in genome productivity. Humans and other complex eukaryotes simply may be able to do more with what they have, genetically speaking.

Among the mechanisms cited for this greater productivity are more sophisticated transcriptional controls, [24] multifunctional proteins, more interaction between protein products, alternative splicing [25] and post-translational modifications that may produce several protein products from the same genetic raw material. [3] [24] In addition, thousands of non-coding RNAs that are transcribed from DNA but not translated into protein have emerged as important regulators of gene expression and development in humans and other eukaryotes. [26] They include short RNA sequences, such as microRNAs (miRNAs), small interfering RNAs (siRNAs) and Piwi-interacting RNAs (piRNAs), [26] and long non-coding RNAs (lncRNAs) that may regulate gene expression at different stages of development. [27] Some researchers suggest that, instead of the number of genes, the focus should now shift to gene interactions and the network of genetic regulatory mechanisms that allow them to support a variety of biological activities. [28] [24] These transitions have taken analysis of genetic complexity from the C-value to the G-value to what some refer to as the I-value, a measure of the total information contained in a genome. [3]
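To illustrate how alternative splicing can yield several products from one gene, here is a toy Python sketch. It rests on a simplifying assumption that is not drawn from the cited sources: each "cassette" exon is included or skipped independently, so a gene with k such exons could in principle give up to 2^k distinct transcripts. Real splicing is far more constrained, and the exon labels and the function name `isoforms` are hypothetical.

```python
from itertools import product


def isoforms(constitutive_first, cassette_exons, constitutive_last):
    """Enumerate transcripts in which each cassette exon is independently kept or skipped.

    This is a toy model: it ignores splicing constraints and exon order changes.
    """
    results = []
    for choices in product([True, False], repeat=len(cassette_exons)):
        kept = [exon for exon, keep in zip(cassette_exons, choices) if keep]
        results.append([constitutive_first, *kept, constitutive_last])
    return results


if __name__ == "__main__":
    # Hypothetical gene with two constitutive exons (E1, E5) and three cassette exons.
    for transcript in isoforms("E1", ["E2", "E3", "E4"], "E5"):
        print("-".join(transcript))
```

Under this assumption a single gene with three optional exons gives eight transcript variants, which shows how protein diversity can grow without any increase in gene number.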

Defining complexity

One of the challenges in the long debate over the mismatch between genome size and biological complexity has been ambiguity in defining complexity. Is it the number of cell types in an organism, the sophistication of its nervous system or the number of different proteins it produces? [17] By some definitions, the greater complexity of humans compared to other organisms may be illusory. [29] Even once complexity is defined, some researchers argue complexity in function does not necessarily require the same complexity in process. Evolution is not a paragon of efficiency but travels a crooked path that leads to a more cumbersome genome than is necessary in some species. [30]

References

  1. Ezkurdia I, Juan D, Rodriguez JM, Frankish A, Diekhans M, Harrow J, et al. (November 2014). "Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes". Human Molecular Genetics. 23 (22): 5866–78. doi:10.1093/hmg/ddu309. PMC 4204768. PMID 24939910.
  2. Hillier LW, Coulson A, Murray JI, Bao Z, Sulston JE, Waterston RH (December 2005). "Genomics in C. elegans: so many genes, such a little worm". Genome Research. 15 (12): 1651–60. doi:10.1101/gr.3729105. PMID 16339362.
  3. Hahn MW, Wray GA (2002). "The g-value paradox". Evolution & Development. 4 (2): 73–5. doi:10.1046/j.1525-142X.2002.01069.x. PMID 12004964. S2CID 2810069.
  4. Gall JG (December 1981). "Chromosome structure and the C-value paradox". The Journal of Cell Biology. 91 (3 Pt 2): 3s–14s. doi:10.1083/jcb.91.3.3s. PMC 2112778. PMID 7033242.
  5. Cavalier-Smith T (December 1978). "Nuclear volume control by nucleoskeletal DNA, selection for cell volume and cell growth rate, and the solution of the DNA C-value paradox". Journal of Cell Science. 34: 247–78. doi:10.1242/jcs.34.1.247. PMID 372199.
  6. Holm-Hansen O (January 1969). "Algae: amounts of DNA and organic carbon in single cells". Science. 163 (3862): 87–8. Bibcode:1969Sci...163...87H. doi:10.1126/science.163.3862.87. PMID 5812598. S2CID 44975843.
  7. Thomas CA (1971). "The genetic organization of chromosomes". Annual Review of Genetics. 5 (1): 237–56. doi:10.1146/annurev.ge.05.120171.001321. PMID 16097657.
  8. Gregory TR (September 2005). "Synergy between sequence and size in large-scale genomics". Nature Reviews. Genetics. 6 (9): 699–708. doi:10.1038/nrg1674. PMID 16151375. S2CID 24237594.
  9. Ohno S (1972). "So much "junk" DNA in our genome". Brookhaven Symp. Biol. 23: 366–370. PMID 5065367.
  10. Gilbert W (May 1985). "Genes-in-pieces revisited". Science. 228 (4701): 823–4. Bibcode:1985Sci...228..823G. doi:10.1126/science.4001923. PMID 4001923.
  11. Orgel LE, Crick FH (April 1980). "Selfish DNA: the ultimate parasite". Nature. 284 (5757): 604–7. Bibcode:1980Natur.284..604O. doi:10.1038/284604a0. PMID 7366731. S2CID 4233826.
  12. Balakirev ES, Ayala FJ (2003). "Pseudogenes: are they "junk" or functional DNA?". Annual Review of Genetics. 37 (1): 123–51. doi:10.1146/annurev.genet.37.040103.103949. PMID 14616058. S2CID 24683075.
  13. Rands CM, Meader S, Ponting CP, Lunter G (July 2014). Schierup MH (ed.). "8.2% of the Human genome is constrained: variation in rates of turnover across functional element classes in the human lineage". PLOS Genetics. 10 (7): e1004525. doi:10.1371/journal.pgen.1004525. PMC 4109858. PMID 25057982.
  14. Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, et al. (ENCODE Project Consortium) (September 2012). "An integrated encyclopedia of DNA elements in the human genome". Nature. 489 (7414): 57–74. Bibcode:2012Natur.489...57T. doi:10.1038/nature11247. PMC 3439153. PMID 22955616.
  15. Graur D (July 2017). Martin B (ed.). "An Upper Limit on the Functional Fraction of the Human Genome". Genome Biology and Evolution. 9 (7): 1880–1885. doi:10.1093/gbe/evx121. PMC 5570035. PMID 28854598.
  16. Taft RJ, Pheasant M, Mattick JS (March 2007). "The relationship between non-protein-coding DNA and eukaryotic complexity". BioEssays. 29 (3): 288–99. doi:10.1002/bies.20544. PMID 17295292. S2CID 16226307.
  17. Claverie JM (February 2001). "Gene number. What if there are only 30,000 human genes?". Science. 291 (5507): 1255–7. doi:10.1126/science.1058969. PMID 11233450. S2CID 11444318.
  18. Fields C, Adams MD, White O, Venter JC (July 1994). "How many genes in the human genome?". Nature Genetics. 7 (3): 345–6. doi:10.1038/ng0794-345. PMID 7920649. S2CID 26164550.
  19. Colbourne JK, Pfrender ME, Gilbert D, Thomas WK, Tucker A, Oakley TH, et al. (February 2011). "The ecoresponsive genome of Daphnia pulex". Science. 331 (6017): 555–61. Bibcode:2011Sci...331..555C. doi:10.1126/science.1197761. PMC 3529199. PMID 21292972.
  20. Hales KG, Korey CA, Larracuente AM, Roberts DM (November 2015). "Genetics on the Fly: A Primer on the Drosophila Model System". Genetics. 201 (3): 815–42. doi:10.1534/genetics.115.183392. PMC 4649653. PMID 26564900.
  21. Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, et al. (April 2013). "The zebrafish reference genome sequence and its relationship to the human genome". Nature. 496 (7446): 498–503. Bibcode:2013Natur.496..498H. doi:10.1038/nature12111. PMC 3703927. PMID 23594743.
  22. Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, et al. (January 2008). "The Arabidopsis Information Resource (TAIR): gene structure and function annotation". Nucleic Acids Research. 36 (Database issue): D1009-14. doi:10.1093/nar/gkm965. PMC 2238962. PMID 17986450.
  23. Sterck L, Rombauts S, Vandepoele K, Rouzé P, Van de Peer Y (April 2007). "How many genes are there in plants (... and why are they there)?". Current Opinion in Plant Biology. 10 (2): 199–203. doi:10.1016/j.pbi.2007.01.004. PMID 17289424.
  24. Wray GA, Hahn MW, Abouheif E, Balhoff JP, Pizer M, Rockman MV, Romano LA (September 2003). "The evolution of transcriptional regulation in eukaryotes". Molecular Biology and Evolution. 20 (9): 1377–419. doi:10.1093/molbev/msg140. PMID 12777501.
  25. Nagasaki H, Arita M, Nishizawa T, Suwa M, Gotoh O (December 2005). "Species-specific variation of alternative splicing and transcriptional initiation in six eukaryotes". Gene. 364: 53–62. doi:10.1016/j.gene.2005.07.027. PMID 16219431.
  26. Gaiti F, Calcino AD, Tanurdžić M, Degnan BM (July 2017). "Origin and evolution of the metazoan non-coding regulatory genome". Developmental Biology. 427 (2): 193–202. doi:10.1016/j.ydbio.2016.11.013. PMID 27880868.
  27. Leone S, Santoro R (August 2016). "Challenges in the analysis of long noncoding RNA functionality". FEBS Letters. 590 (15): 2342–53. doi:10.1002/1873-3468.12308. PMID 27417130. S2CID 19766152.
  28. Szathmáry E, Jordán F, Pál C (May 2001). "Molecular biology and evolution. Can genes explain biological complexity?". Science. 292 (5520): 1315–6. doi:10.1126/science.1060852. PMID 11360989. S2CID 86104866.
  29. McShea DW (April 1996). "Perspective Metazoan Complexity and Evolution: Is There a Trend?". Evolution; International Journal of Organic Evolution. 50 (2): 477–492. doi:10.1111/j.1558-5646.1996.tb03861.x. PMID 28568940. S2CID 29590466.
  30. Jacob F (June 1977). "Evolution and tinkering". Science. 196 (4295): 1161–6. Bibcode:1977Sci...196.1161J. doi:10.1126/science.860134. PMID 860134. S2CID 29756896.