C-value is the amount, in picograms, of DNA contained within a haploid nucleus (e.g. a gamete) or one half the amount in a diploid somatic cell of a eukaryotic organism. In some cases (notably among diploid organisms), the terms C-value and genome size are used interchangeably; however, in polyploids the C-value may represent two or more genomes contained within the same nucleus. Greilhuber et al. [1] have suggested some new layers of terminology and associated abbreviations to clarify this issue, but these somewhat complex additions are yet to be used by other authors.
Many authors have incorrectly assumed that the 'C' in "C-value" refers to "characteristic", "content", or "complement". Even among authors who have attempted to trace the origin of the term, there has been some confusion because Hewson Swift did not define it explicitly when he coined it in 1950. [2] In his original paper, Swift appeared to use the designations "1C value", "2C value", etc., in reference to "classes" of DNA content (e.g., Gregory 2001, [3] 2002 [4]); however, Swift explained in personal correspondence to Prof. Michael D. Bennett in 1975 that "I am afraid the letter C stood for nothing more glamorous than 'constant', i.e., the amount of DNA that was characteristic of a particular genotype" (quoted in Bennett and Leitch 2005 [5]). This is in reference to the report in 1948 by Vendrely and Vendrely of a "remarkable constancy in the nuclear DNA content of all the cells in all the individuals within a given animal species" (translated from the original French). [6] Swift's study of this topic related specifically to variation (or lack thereof) among chromosome sets in different cell types within individuals, but his notation evolved into "C-value" in reference to the haploid DNA content of individual species and retains this usage today.
C-values vary enormously among species. In animals they range more than 3,300-fold, and in land plants they differ by a factor of about 1,000. [5] [7] Protist genomes have been reported to vary more than 300,000-fold in size, but the high end of this range (Amoeba) has been called into question. Variation in C-values bears no relationship to the complexity of the organism or the number of genes contained in its genome; for example, some single-celled protists have genomes much larger than that of humans. This observation was deemed counterintuitive before the discovery of repetitive DNA and became known as the C-value paradox as a result. Although there is no longer any paradoxical aspect to the discrepancy between C-value and gene number, the term remains in common usage. For conceptual clarity, it has been suggested that the various puzzles that remain with regard to genome size variation are more accurately described as a complex but clearly defined puzzle known as the C-value enigma. C-values correlate with a range of features at the cell and organism levels, including cell size, cell division rate, and, depending on the taxon, body size, metabolic rate, developmental rate, organ complexity, geographical distribution, or extinction risk (for recent reviews, see Bennett and Leitch 2005; [5] Gregory 2005 [7]).
The C-value enigma or C-value paradox is the complex puzzle surrounding the extensive variation in nuclear genome size among eukaryotic species. At the center of the C-value enigma is the observation that genome size does not correlate with organismal complexity; for example, some single-celled protists have genomes much larger than that of humans.
Some prefer the term C-value enigma because it explicitly includes all of the questions that will need to be answered if a complete understanding of genome size evolution is to be achieved (Gregory 2005). Moreover, the term paradox implies a lack of understanding of one of the most basic features of eukaryotic genomes: namely that they are composed primarily of non-coding DNA. Some have claimed that the term paradox also has the unfortunate tendency to lead authors to seek simple one-dimensional solutions to what is, in actuality, a multi-faceted puzzle. [8] For these reasons, in 2003 the term "C-value enigma" was endorsed in preference to "C-value paradox" at the Second Plant Genome Size Discussion Meeting and Workshop at the Royal Botanic Gardens, Kew, UK, [8] and an increasing number of authors have begun adopting this term.
In 1948, Roger and Colette Vendrely reported a "remarkable constancy in the nuclear DNA content of all the cells in all the individuals within a given animal species", [9] which they took as evidence that DNA, rather than protein, was the substance of which genes are composed. The term C-value reflects this observed constancy. However, it was soon found that C-values (genome sizes) vary enormously among species and that this bears no relationship to the presumed number of genes (as reflected by the complexity of the organism). [10] For example, the cells of some salamanders may contain 40 times more DNA than those of humans. [11] Given that C-values were assumed to be constant because genetic information is encoded by DNA, and yet bore no relationship to presumed gene number, this was understandably considered paradoxical; the term "C-value paradox" was used to describe this situation by C.A. Thomas Jr. in 1971.
The discovery of repetitive DNA in the late 1960s resolved the main question of the C-value paradox: genome size does not reflect gene number in eukaryotes, since most of the excess DNA in many species appears to be junk DNA. The human genome, for example, contains about 10% functional elements, and the remaining 90% is thought to be junk. Species with larger genomes are thought to contain a higher proportion of junk DNA.
The term "C-value enigma" represents an update of the more common but outdated term "C-value paradox" (Thomas 1971), being ultimately derived from the term "C-value" (Swift 1950) in reference to haploid nuclear DNA contents. The term was coined by Canadian biologist Dr. T. Ryan Gregory of the University of Guelph in 2000/2001. In general terms, the C-value enigma relates to the issue of variation in the amount of non-coding DNA found within the genomes of different eukaryotes.
The C-value enigma, unlike the older C-value paradox, is explicitly defined as a series of independent but equally important component questions.
Nucleotide | Chemical formula | Relative molecular mass (Da) |
---|---|---|
2′-deoxyadenosine 5′-monophosphate | C₁₀H₁₄N₅O₆P | 331.2213 |
2′-deoxythymidine 5′-monophosphate | C₁₀H₁₅N₂O₈P | 322.2079 |
2′-deoxyguanosine 5′-monophosphate | C₁₀H₁₄N₅O₇P | 347.2207 |
2′-deoxycytidine 5′-monophosphate | C₉H₁₄N₃O₇P | 307.1966 |
Table 1. Relative molecular masses of nucleotides. Source: Doležel et al., 2003 [12]
The formulas for converting the number of nucleotide pairs (or base pairs) to picograms of DNA and vice versa are: [12]
$$\text{genome size (bp)} = (0.978 \times 10^{9}) \times \text{DNA content (pg)}$$

$$\text{DNA content (pg)} = \frac{\text{genome size (bp)}}{0.978 \times 10^{9}}$$

$$1\ \text{pg} = 978\ \text{Mbp}$$
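As a minimal sketch, the two conversions can be written as a pair of Python functions. The names `BP_PER_PG`, `pg_to_bp`, and `bp_to_pg` are illustrative choices, not part of the cited source; the only input taken from the text is the factor of 0.978 × 10⁹ bp per pg.

```python
# Conversion factor quoted in the text: base pairs per picogram of DNA.
BP_PER_PG = 0.978e9


def pg_to_bp(dna_content_pg: float) -> float:
    """Convert a DNA content in picograms to a genome size in base pairs."""
    return dna_content_pg * BP_PER_PG


def bp_to_pg(genome_size_bp: float) -> float:
    """Convert a genome size in base pairs to a DNA content in picograms."""
    return genome_size_bp / BP_PER_PG


if __name__ == "__main__":
    # One picogram expressed in megabase pairs.
    print(pg_to_bp(1.0) / 1e6, "Mbp per pg")
```

Because both functions use the same rounded constant, they are exact inverses of each other up to floating-point error, but they inherit the assumptions behind that constant (in particular a GC content of 50%).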
By using the data in Table 1, relative masses of nucleotide pairs can be calculated as follows: A/T = 615.383 and G/C = 616.3711, bearing in mind that formation of one phosphodiester linkage involves a loss of one H₂O molecule. Further, phosphates of nucleotides in the DNA chain are acidic, so at physiologic pH the H⁺ ion is dissociated. Provided the ratio of A/T to G/C pairs is 1:1 (the GC-content is 50%), the mean relative mass of one nucleotide pair is 615.8771.
The relative molecular mass may be converted to an absolute value by multiplying it by the atomic mass unit (1 u) in picograms. Thus, 615.8771 is multiplied by $1.660539 \times 10^{-12}$ pg. Consequently, the mean mass per nucleotide pair would be $1.023 \times 10^{-9}$ pg, and 1 pg of DNA would represent $0.978 \times 10^{9}$ base pairs (978 Mbp). [12]
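The arithmetic above can be retraced in a few lines of Python. This is a sketch under stated assumptions: the nucleotide masses come from Table 1 and the atomic mass unit from the text, while the masses used for water and hydrogen (`H2O`, `H`) are standard reference values that the text does not give, so the results may differ from the quoted figures in the last decimal place.

```python
# Relative molecular masses of the free nucleotides (Da), from Table 1.
DAMP, DTMP, DGMP, DCMP = 331.2213, 322.2079, 347.2207, 307.1966

# Assumed standard values, not given in the text.
H2O = 18.0153  # relative molecular mass of water
H = 1.00794    # relative atomic mass of hydrogen

# One atomic mass unit in picograms, as quoted in the text.
AMU_PG = 1.660539e-12


def pair_mass(nucleotide_1: float, nucleotide_2: float) -> float:
    """Relative mass of a nucleotide pair inside a DNA chain.

    Each nucleotide loses one H2O when its phosphodiester linkage forms
    and one H+ from its dissociated phosphate group.
    """
    return nucleotide_1 + nucleotide_2 - 2 * H2O - 2 * H


at_pair = pair_mass(DAMP, DTMP)
gc_pair = pair_mass(DGMP, DCMP)

# Mean pair mass at 50% GC content (A/T : G/C = 1 : 1).
mean_pair = (at_pair + gc_pair) / 2

mass_per_pair_pg = mean_pair * AMU_PG  # mass of one base pair in pg
bp_per_pg = 1 / mass_per_pair_pg       # base pairs in one pg of DNA

if __name__ == "__main__":
    print(f"A/T pair: {at_pair:.4f}")
    print(f"G/C pair: {gc_pair:.4f}")
    print(f"mass per pair: {mass_per_pair_pg:.4e} pg")
    print(f"bp per pg: {bp_per_pg / 1e6:.1f} Mbp")
```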
No species has a GC-content of exactly 50% (equal amounts of A/T and G/C nucleotide bases) as assumed by Doležel et al. However, as a G/C pair is only heavier than an A/T pair by about 1/6 of 1%, the effect of variations in GC content is small. The actual GC content varies between species, between chromosomes, and between isochores (sections of a chromosome with like GC content). Adjusting Doležel's calculation for GC content, the theoretical variation in base pairs per picogram ranges from 977.0317 Mbp/pg for 100% GC content to 978.6005 Mbp/pg for 0% GC content (A/T being lighter, has more Mbp/pg), with a midpoint of 977.8155 Mbp/pg for 50% GC content.
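The dependence on GC content can be sketched by weighting the two pair masses quoted above. The function name `mbp_per_pg` and its `gc_fraction` parameter are illustrative; the pair masses and the atomic mass unit are the values stated in the text.

```python
# Relative masses of nucleotide pairs, as quoted in the text.
AT_PAIR = 615.383
GC_PAIR = 616.3711

# One atomic mass unit in picograms, as quoted in the text.
AMU_PG = 1.660539e-12


def mbp_per_pg(gc_fraction: float) -> float:
    """Megabase pairs per picogram of DNA for a GC content given as a fraction (0 to 1)."""
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must lie between 0 and 1")
    # Mean relative mass of one pair, weighted by GC content.
    mean_pair = (1.0 - gc_fraction) * AT_PAIR + gc_fraction * GC_PAIR
    # Convert to picograms per pair, invert, and scale to Mbp.
    return 1.0 / (mean_pair * AMU_PG) / 1e6


if __name__ == "__main__":
    for gc in (0.0, 0.5, 1.0):
        print(f"GC = {gc:.0%}: {mbp_per_pg(gc):.4f} Mbp/pg")
```

Across the full theoretical range the result changes by under 0.2%, which is why the single rounded factor of 978 Mbp/pg is usually adequate.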
The human genome [13] varies in size; however, the current estimate of the nuclear haploid size of the reference human genome [14] is 3,031,042,417 bp for the X gamete and 2,932,228,937 bp for the Y gamete. The X gamete and Y gamete both contain 22 autosomes whose combined lengths comprise the majority of the genome in both gametes. The X gamete contains an X chromosome, while the Y gamete contains a Y chromosome. The larger size of the X chromosome is responsible for the difference in the size of the two gametes. When the gametes are combined, the XX female zygote has a size of 6,062,084,834 bp while the XY male zygote has a size of 5,963,271,354 bp. However, the base pairs of the XX female zygote are distributed among 2 homologous groups of 23 heterologous chromosomes each, while the base pairs of the XY male zygote are distributed among 2 homologous groups of 22 heterologous chromosomes each plus 2 heterologous chromosomes. Although each zygote has 46 chromosomes, 23 chromosomes of the XX female zygote are heterologous while 24 chromosomes of the XY male zygote are heterologous. As a result, the C-value for the XX female zygote is 3.099361 pg while the C-value for the XY male zygote is 3.157877 pg.
The human genome's GC content is about 41%. [15] Accounting for the autosomal, X, and Y chromosomes, [16] human haploid GC contents are 40.97460% for X gametes, and 41.01724% for Y gametes.
Summarizing these numbers:
Cell | Chromosome description | Type | Ploidy | Base pairs (bp) | GC content | Density (Mbp/pg) | Mass (pg) | C-value (pg) |
---|---|---|---|---|---|---|---|---|
Sperm or egg | 23 heterologous chromosomes | X Gamete | Haploid | 3,031,042,417 | 40.97460% | 977.9571 | 3.099361 | 3.099361 |
Sperm only | 23 heterologous chromosomes | Y Gamete | Haploid | 2,932,228,937 | 41.01724% | 977.9564 | 2.998323 | 2.998323 |
Zygote | 46 chromosomes consisting of 2 homologous sets of 23 heterologous chromosomes each | XX Female | Diploid | 6,062,084,834 | 40.97460% | 977.9571 | 6.198723 | 3.099361 |
Zygote | 46 chromosomes consisting of 2 homologous sets of 22 heterologous chromosomes each plus 2 heterologous chromosomes | XY Male | Mostly diploid | 5,963,271,354 | 40.99557% | 977.9567 | 6.097684 | 3.157877 |
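The zygote sizes and the mass column of the table can be recomputed from the gamete sizes and the tabulated densities. The sketch below only retraces that arithmetic; the names `GAMETE_BP`, `DENSITY_MBP_PER_PG`, and `mass_pg` are illustrative, and the densities are copied from the table rather than derived.

```python
# Haploid sizes of the reference human genome in base pairs, from the text.
GAMETE_BP = {"X": 3_031_042_417, "Y": 2_932_228_937}

# Zygote sizes are the sums of the two contributing gametes.
ZYGOTE_BP = {
    "XX": GAMETE_BP["X"] + GAMETE_BP["X"],
    "XY": GAMETE_BP["X"] + GAMETE_BP["Y"],
}

# GC-adjusted densities in Mbp/pg, copied from the table.
DENSITY_MBP_PER_PG = {"X": 977.9571, "Y": 977.9564, "XX": 977.9571, "XY": 977.9567}


def mass_pg(base_pairs: int, density_mbp_per_pg: float) -> float:
    """DNA mass in picograms for a given size in bp and density in Mbp/pg."""
    return base_pairs / (density_mbp_per_pg * 1e6)


if __name__ == "__main__":
    sizes = {**GAMETE_BP, **ZYGOTE_BP}
    for cell, bp in sizes.items():
        print(f"{cell}: {bp:,} bp, {mass_pg(bp, DENSITY_MBP_PER_PG[cell]):.6f} pg")
```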
A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. Base pairs form the building blocks of the DNA double helix and contribute to the folded structure of both DNA and RNA. Dictated by specific hydrogen bonding patterns, "Watson–Crick" base pairs allow the DNA helix to maintain a regular helical structure that is subtly dependent on its nucleotide sequence. The complementary nature of this base-paired structure provides a redundant copy of the genetic information encoded within each strand of DNA. The regular structure and data redundancy provided by the DNA double helix make DNA well suited to the storage of genetic information, while base-pairing between DNA and incoming nucleotides provides the mechanism through which DNA polymerase replicates DNA and RNA polymerase transcribes DNA into RNA. Many DNA-binding proteins can recognize specific base-pairing patterns that identify particular regulatory regions of genes.
A gamete is a haploid cell that fuses with another haploid cell during fertilization in organisms that reproduce sexually. Gametes are an organism's reproductive cells, also referred to as sex cells. The name gamete was introduced by the German cytologist Eduard Strasburger in 1878.
In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA. The nuclear genome includes protein-coding genes and non-coding genes, other functional regions of the genome such as regulatory sequences, and often a substantial fraction of junk DNA with no evident function. Almost all eukaryotes have mitochondria and a small mitochondrial genome. Algae and plants also contain chloroplasts with a chloroplast genome.
Meiosis (from Ancient Greek μείωσις 'lessening') is a special type of cell division of germ cells in sexually-reproducing organisms that produces the gametes, the sperm or egg cells. It involves two rounds of division that ultimately result in four cells, each with only one copy of each chromosome. Additionally, prior to the division, genetic material from the paternal and maternal copies of each chromosome is crossed over, creating new combinations of code on each chromosome. Later on, during fertilisation, the haploid cells produced by meiosis from a male and a female will fuse to create a zygote, a cell with two copies of each chromosome again.
Ploidy is the number of complete sets of chromosomes in a cell, and hence the number of possible alleles for autosomal and pseudoautosomal genes. Sets of chromosomes refer to the number of maternal and paternal chromosome copies, respectively, in each homologous chromosome pair, the form in which chromosomes naturally exist. Somatic cells, tissues, and individual organisms can be described according to the number of sets of chromosomes present: monoploid, diploid, triploid, tetraploid, pentaploid, hexaploid, heptaploid or septaploid, etc. The generic term polyploid is often used to describe cells with three or more sets of chromosomes.
A zygote is a eukaryotic cell formed by a fertilization event between two gametes. The zygote's genome is a combination of the DNA in each gamete, and contains all of the genetic information of a new individual organism. The sexual fusion of haploid cells is called karyogamy, the result of which is the formation of a diploid cell called the zygote or zygospore.
Non-coding DNA (ncDNA) sequences are components of an organism's DNA that do not encode protein sequences. Some non-coding DNA is transcribed into functional non-coding RNA molecules. Other functional regions of the non-coding DNA fraction include regulatory sequences that control gene expression; scaffold attachment regions; origins of DNA replication; centromeres; and telomeres. Some non-coding regions appear to be mostly nonfunctional, such as introns, pseudogenes, intergenic DNA, and fragments of transposons and viruses. Regions that are completely nonfunctional are called junk DNA.
Alternation of generations is the predominant type of life cycle in plants and algae. In plants both phases are multicellular: the haploid sexual phase – the gametophyte – alternates with a diploid asexual phase – the sporophyte.
A karyotype is the general appearance of the complete set of chromosomes in the cells of a species or in an individual organism, mainly including their sizes, numbers, and shapes. Karyotyping is the process by which a karyotype is discerned by determining the chromosome complement of an individual, including the number of chromosomes and any abnormalities.
Molecular evolution describes how inherited DNA and/or RNA change over evolutionary time, and the consequences of this for proteins and other components of cells and organisms. Molecular evolution is the basis of phylogenetic approaches to describing the tree of life. Molecular evolution overlaps with population genetics, especially on shorter timescales. Topics in molecular evolution include the origins of new genes, the genetic nature of complex traits, the genetic basis of adaptation and speciation, the evolution of development, and patterns and processes underlying genomic changes during evolution.
In cellular biology, a somatic cell, or vegetal cell, is any biological cell forming the body of a multicellular organism other than a gamete, germ cell, gametocyte or undifferentiated stem cell. Somatic cells compose the body of an organism and divide through mitosis.
Gametogenesis is a biological process by which diploid or haploid precursor cells undergo cell division and differentiation to form mature haploid gametes. Depending on the biological life cycle of the organism, gametogenesis occurs by meiotic division of diploid gametocytes into various gametes, or by mitosis. For example, plants produce gametes through mitosis in gametophytes. The gametophytes grow from haploid spores after sporic meiosis. The existence of a multicellular, haploid phase in the life cycle between meiosis and gametogenesis is also referred to as alternation of generations.
In biology, a biological life cycle is a series of stages of the life of an organism, that begins as a zygote, often in an egg, and concludes as an adult that reproduces, producing an offspring in the form of a new zygote which then itself goes through the same series of stages, the process repeating in a cyclic fashion.
Nuclear DNA (nDNA), or nuclear deoxyribonucleic acid, is the DNA contained within each cell nucleus of a eukaryotic organism. It encodes the majority of the genome in eukaryotes, with mitochondrial DNA and plastid DNA coding for the rest. It adheres to Mendelian inheritance, with information coming from two parents, one male and one female, rather than matrilineally as in mitochondrial DNA.
Genome size is the total amount of DNA contained within one copy of a single complete genome. It is typically measured in terms of mass in picograms or less frequently in daltons, or as the total number of nucleotide base pairs, usually in megabases. One picogram is equal to 978 megabases. In diploid organisms, genome size is often used interchangeably with the term C-value.
A postzygotic mutation is a change in an organism's genome that is acquired during its lifespan, instead of being inherited from its parent(s) through fusion of two haploid gametes. Mutations that occur after the zygote has formed can be caused by a variety of sources that fall under two classes: spontaneous mutations and induced mutations. How detrimental a mutation is to an organism is dependent on what the mutation is, where it occurred in the genome and when it occurred.
Sexual reproduction is a type of reproduction that involves a complex life cycle in which a gamete with a single set of chromosomes combines with another gamete to produce a zygote that develops into an organism composed of cells with two sets of chromosomes (diploid). This is typical in animals, though the number of chromosome sets and how that number changes in sexual reproduction varies, especially among plants, fungi, and other eukaryotes.
Genome evolution is the process by which a genome changes in structure (sequence) or size over time. The study of genome evolution involves multiple fields such as structural analysis of the genome, the study of genomic parasites, gene and ancient genome duplications, polyploidy, and comparative genomics. Genome evolution is a constantly changing and evolving field due to the steadily growing number of sequenced genomes, both prokaryotic and eukaryotic, available to the scientific community and the public at large.
The onion test is a way of assessing the validity of an argument for a functional role for junk DNA. It relates to the paradox that would emerge if the majority of eukaryotic non-coding DNA were assumed to be functional and the difficulty of reconciling that assumption with the diversity in genome sizes among species. The term "onion test" was originally proposed informally in a blog post by T. Ryan Gregory in order to help clarify the debate about junk DNA. The term has been mentioned in newspapers and online media, scientific journal articles, and a textbook. The test is defined as:
The onion test is a simple reality check for anyone who thinks they have come up with a universal function for junk DNA. Whatever your proposed function, ask yourself this question: Can I explain why an onion needs about five times more non-coding DNA for this function than a human?