Compositional domain

Example of a hypothetical genomic sequence composed of 9 compositionally homogeneous domains used to demonstrate the model. The segmentation algorithm partitioned the sequence and correctly identified 4 domains as compositionally homogeneous and 2 as compositionally nonhomogeneous.

A compositional domain in genetics is a region of DNA with a distinct content of guanine (G) and cytosine (C) bases, that is, of G-C and C-G base pairs (collectively, GC content). [1] The homogeneity of a compositional domain is assessed relative to that of the chromosome on which it resides, so a domain can be either homogeneous or nonhomogeneous. Compositionally homogeneous domains that are sufficiently long (≥ 300 kb) are termed isochores or isochoric domains.
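
The definitions above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the published method: the function names, the fixed window size, and the plain comparison of variances (used here in place of a formal statistical test) are all hypothetical choices. Only the 300 kb length threshold is taken from the text.

```python
from statistics import pvariance

ISOCHORE_MIN_LENGTH = 300_000  # bp; the 300 kb threshold quoted in the text


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def window_gc(seq: str, window: int) -> list[float]:
    """GC content of consecutive non-overlapping windows.

    A trailing partial window is dropped.
    """
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


def classify_domain(domain: str, chromosome: str, window: int = 2048) -> str:
    """Label a domain relative to the chromosome it lies on.

    Assumption: a domain counts as homogeneous when its windowed GC content
    varies less than that of the whole chromosome. The window size of 2048 bp
    is a hypothetical default chosen for illustration.
    """
    dom = window_gc(domain, window)
    chrom = window_gc(chromosome, window)
    if len(dom) < 2 or len(chrom) < 2:
        raise ValueError("need at least two full windows in each sequence")
    if pvariance(dom) >= pvariance(chrom):
        return "nonhomogeneous"
    # Homogeneous domains that are long enough are termed isochores.
    return "isochore" if len(domain) >= ISOCHORE_MIN_LENGTH else "homogeneous"
```

For example, `classify_domain("GCAT" * 4, "GGGGAAAA" * 2, window=4)` compares a domain whose windows all have the same GC content against a chromosome whose windows alternate between all-GC and all-AT, so the domain is labelled homogeneous but is far too short to be an isochore.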

The compositional domain model was proposed as an alternative to the isochoric model. The isochore model was proposed by Bernardi and colleagues to explain the observed non-uniformity of genomic fragments in the genome. [2] However, the sequencing of complete genomes refuted the main predictions of the isochoric model.

The compositional domain model describes the genome as a mosaic of short and long domains, both homogeneous and nonhomogeneous. The composition and organization of the domains were shaped by different evolutionary processes that either fused domains or broke them down. This model of genomic organization was confirmed in genomic studies of the cow, [14] honeybee, [15] sea urchin, [16] body louse, [17] Nasonia, [18] beetle, [19] and ant genomes. [20] [21] [22] The human genome was described as a mixture of compositionally nonhomogeneous domains with numerous short compositionally homogeneous domains and relatively few long ones. [1]

References

  1. Elhaik, Eran; Graur, Dan; Josić, Krešimir; Landan, Giddy (2010). "Identifying compositionally homogeneous and nonhomogeneous domains within the human genome using a novel segmentation algorithm". Nucleic Acids Research. 38 (15): e158. doi:10.1093/nar/gkq532. PMC 2926622. PMID 20571085.
  2. Bernardi, G; Olofsson, B; Filipski, J; Zerial, M; Salinas, J; Cuny, G; Meunier-Rotival, M; Rodier, F (1985). "The mosaic genome of warm-blooded vertebrates". Science. 228 (4702): 953–8. Bibcode:1985Sci...228..953B. doi:10.1126/science.4001930. PMID 4001930.
  3. Bernardi, Giorgio (2001). "Misunderstandings about isochores. Part 1". Gene. 276 (1–2): 3–13. doi:10.1016/S0378-1119(01)00644-8. PMID 11591466.
  4. Elhaik, E.; Landan, G.; Graur, D. (2009). "Can GC Content at Third-Codon Positions Be Used as a Proxy for Isochore Composition?". Molecular Biology and Evolution. 26 (8): 1829–33. doi:10.1093/molbev/msp100. PMID 19443854.
  5. Tatarinova, Tatiana V; Alexandrov, Nickolai N; Bouck, John B; Feldmann, Kenneth A (2010). "GC3 biology in corn, rice, sorghum and other grasses". BMC Genomics. 11: 308. doi:10.1186/1471-2164-11-308. PMC 2895627. PMID 20470436.
  6. Bernardi, Giorgio (2000). "The compositional evolution of vertebrate genomes". Gene. 259 (1–2): 31–43. doi:10.1016/S0378-1119(00)00441-8. PMID 11163959.
  7. Lander, Eric S.; Linton, Lauren M.; Birren, Bruce; Nusbaum, Chad; Zody, Michael C.; Baldwin, Jennifer; Devon, Keri; Dewar, Ken; et al. (2001). "Initial sequencing and analysis of the human genome". Nature. 409 (6822): 860–921. Bibcode:2001Natur.409..860L. doi:10.1038/35057062. PMID 11237011.
  8. Belle, Elise M. S.; Duret, Laurent; Galtier, Nicolas; Eyre-Walker, Adam (2004). "The Decline of Isochores in Mammals: An Assessment of the GC Content Variation Along the Mammalian Phylogeny". Journal of Molecular Evolution. 58 (6): 653–60. Bibcode:2004JMolE..58..653B. CiteSeerX 10.1.1.333.2159. doi:10.1007/s00239-004-2587-x. PMID 15461422. S2CID 18281444.
  9. Cohen, N.; Dagan, T; Stone, L; Graur, D (2005). "GC Composition of the Human Genome: In Search of Isochores". Molecular Biology and Evolution. 22 (5): 1260–72. doi:10.1093/molbev/msi115. PMID 15728737.
  10. Bernardi, Giorgio (2000). "Isochores and the evolutionary genomics of vertebrates". Gene. 241 (1): 3–17. doi:10.1016/S0378-1119(99)00485-0. PMID 10607893.
  11. Hamada, Kazuo; Horiike, Tokumasa; Ota, Hidetoshi; Mizuno, Keiko; Shinozawa, Takao (2003). "Presence of isochore structures in reptile genomes suggested by the relationship between GC contents of intron regions and those of coding regions". Genes & Genetic Systems. 78 (2): 195–8. doi:10.1266/ggs.78.195. PMID 12773820.
  12. Chojnowski, J. L.; Braun, E. L. (2008). "Turtle isochore structure is intermediate between amphibians and other amniotes". Integrative and Comparative Biology. 48 (4): 454–62. doi:10.1093/icb/icn062. PMID 21669806.
  13. Costantini, Maria; Clay, Oliver; Federico, Concetta; Saccone, Salvatore; Auletta, Fabio; Bernardi, Giorgio (2006). "Human chromosomal bands: Nested structure, high-definition map and molecular basis". Chromosoma. 116 (1): 29–40. doi:10.1007/s00412-006-0078-0. PMID 17072634. S2CID 22571376.
  14. Elsik, C. G.; Tellam, R. L.; Worley, K. C.; Gibbs, R. A.; Muzny, D. M.; Weinstock, G. M.; Adelson, D. L.; Eichler, E. E.; et al. (2009). "The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution". Science. 324 (5926): 522–8. Bibcode:2009Sci...324..522A. doi:10.1126/science.1169588. PMC 2943200. PMID 19390049.
  15. Weinstock, George M.; Robinson, Gene E.; Gibbs, Richard A.; Worley, Kim C.; Evans, Jay D.; et al. (2006). "Insights into social insects from the genome of the honeybee Apis mellifera". Nature. 443 (7114): 931–49. Bibcode:2006Natur.443..931T. doi:10.1038/nature05260. PMC 2048586. PMID 17073008.
  16. Sodergren, E.; Weinstock, G. M.; Davidson, E. H; Cameron, R. A.; Gibbs, R. A.; Angerer, R. C.; Angerer, L. M.; Arnone, M. I.; et al. (2006). "The Genome of the Sea Urchin Strongylocentrotus purpuratus". Science. 314 (5801): 941–52. Bibcode:2006Sci...314..941S. doi:10.1126/science.1133609. PMC 3159423. PMID 17095691.
  17. Kirkness, Ewen F.; Haas, Brian J.; Sun, Weilin; Braig, Henk R.; Perotti, M. Alejandra; Clark, John M.; Lee, Si Hyeock; Robertson, Hugh M.; et al. (2010). "Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle". Proceedings of the National Academy of Sciences. 107 (27): 12168–73. Bibcode:2010PNAS..10712168K. doi:10.1073/pnas.1003379107. PMC 2901460. PMID 20566863.
  18. Werren, J. H.; Richards, S.; Desjardins, C. A.; Niehuis, O.; Gadau, J.; Colbourne, J. K.; Beukeboom, L. W.; Desplan, C.; et al. (2010). "Functional and Evolutionary Insights from the Genomes of Three Parasitoid Nasonia Species". Science. 327 (5963): 343–8. Bibcode:2010Sci...327..343.. doi:10.1126/science.1178028. PMC 2849982. PMID 20075255.
  19. Richards, Stephen; Gibbs, Richard A.; Weinstock, George M.; Brown, Susan J.; Denell, Robin; Beeman, Richard W.; et al. (2008). "The genome of the model beetle and pest Tribolium castaneum". Nature. 452 (7190): 949–55. Bibcode:2008Natur.452..949R. doi:10.1038/nature06784. PMID 18362917.
  20. Smith, Christopher D.; Zimin, Aleksey; Holt, Carson; Abouheif, Ehab; Benton, Richard; Cash, Elizabeth; Croset, Vincent; Currie, Cameron R.; et al. (2011). "Draft genome of the globally widespread and invasive Argentine ant (Linepithema humile)". Proceedings of the National Academy of Sciences. 108 (14): 5673–8. Bibcode:2011PNAS..108.5673S. doi:10.1073/pnas.1008617108. PMC 3078359. PMID 21282631.
  21. Smith, Chris R.; Smith, Christopher D.; Robertson, Hugh M.; Helmkampf, Martin; Zimin, Aleksey; Yandell, Mark; Holt, Carson; Hu, Hao; et al. (2011). "Draft genome of the red harvester ant Pogonomyrmex barbatus". Proceedings of the National Academy of Sciences. 108 (14): 5667–72. Bibcode:2011PNAS..108.5667S. doi:10.1073/pnas.1007901108. PMC 3078412. PMID 21282651.
  22. Suen, Garret; Teiling, Clotilde; Li, Lewyn; Holt, Carson; Abouheif, Ehab; Bornberg-Bauer, Erich; Bouffard, Pascal; Caldera, Eric J.; et al. (2011). Copenhaver, Gregory (ed.). "The Genome Sequence of the Leaf-Cutter Ant Atta cephalotes Reveals Insights into Its Obligate Symbiotic Lifestyle". PLOS Genetics. 7 (2): e1002007. doi:10.1371/journal.pgen.1002007. PMC 3037820. PMID 21347285.