Compositional domain

Example of a hypothetical genomic sequence composed of 9 compositionally homogeneous domains used to demonstrate the model. The segmentation algorithm partitioned the sequence and correctly identified 4 compositionally homogeneous domains and 2 compositionally nonhomogeneous domains.

A compositional domain in genetics is a region of DNA with a distinct guanine (G) and cytosine (C) content (collectively, GC content). [1] The homogeneity of a compositional domain is assessed relative to that of the chromosome on which it resides. As such, compositional domains can be homogeneous or nonhomogeneous. Compositionally homogeneous domains that are sufficiently long (≥ 300 kb) are termed isochores or isochoric domains.
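The definitions above can be illustrated with a minimal Python sketch. It computes GC content, compares a domain with its chromosome, and applies the 300 kb length figure. Several parts are assumptions made for illustration and are not taken from the cited work: the 300 kb figure is treated as a minimum length, the homogeneity criterion (lower variance of windowed GC content than the chromosome) is only a stand-in for the published statistical test, and the names `gc_content`, `window_gc`, `is_homogeneous`, and `is_isochore` are hypothetical.

```python
from statistics import pvariance

# Assumption: the 300 kb figure in the text is a minimum length.
ISOCHORE_MIN_LENGTH = 300_000


def gc_content(seq: str) -> float:
    """Fraction of A/C/G/T bases in seq that are G or C (case-insensitive)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def window_gc(seq: str, window: int) -> list[float]:
    """GC content of consecutive, non-overlapping windows of a fixed size."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


def is_homogeneous(domain: str, chromosome: str, window: int) -> bool:
    """Illustrative criterion only: the domain counts as homogeneous if its
    windowed GC content varies less than that of the whole chromosome.
    Both sequences must be at least one window long."""
    return (pvariance(window_gc(domain, window))
            < pvariance(window_gc(chromosome, window)))


def is_isochore(domain: str, chromosome: str, window: int) -> bool:
    """A homogeneous domain that also meets the assumed length threshold."""
    return (len(domain) >= ISOCHORE_MIN_LENGTH
            and is_homogeneous(domain, chromosome, window))
```

For example, `is_isochore(domain, chromosome, window=100_000)` returns `True` only for a domain of at least 300,000 bases whose windowed GC content is less variable than that of `chromosome`. The window size is a free parameter of the sketch.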

The compositional domain model was proposed as an alternative to the isochore model. The isochore model was proposed by Bernardi and colleagues to explain the observed non-uniformity of genomic fragments in the genome. [2] However, analyses of completely sequenced genomes refuted the isochore model. Its main predictions were:

The compositional domain model describes the genome as a mosaic of short and long homogeneous and nonhomogeneous domains. The composition and organization of the domains were shaped by different evolutionary processes that either fused or broke down the domains. This genomic organization model was confirmed in many new genomic studies of cow, [14] honeybee, [15] sea urchin, [16] body louse, [17] Nasonia, [18] beetle, [19] and ant genomes. [20] [21] [22] The human genome was described as consisting of a mixture of compositionally nonhomogeneous domains with numerous short compositionally homogeneous domains and relatively few long ones. [1]

References

  1. Elhaik, Eran; Graur, Dan; Josić, Krešimir; Landan, Giddy (2010). "Identifying compositionally homogeneous and nonhomogeneous domains within the human genome using a novel segmentation algorithm". Nucleic Acids Research. 38 (15): e158. doi:10.1093/nar/gkq532. PMC 2926622. PMID 20571085.
  2. Bernardi, G; Olofsson, B; Filipski, J; Zerial, M; Salinas, J; Cuny, G; Meunier-Rotival, M; Rodier, F (1985). "The mosaic genome of warm-blooded vertebrates". Science. 228 (4702): 953–8. Bibcode:1985Sci...228..953B. doi:10.1126/science.4001930. PMID 4001930.
  3. Bernardi, Giorgio (2001). "Misunderstandings about isochores. Part 1". Gene. 276 (1–2): 3–13. doi:10.1016/S0378-1119(01)00644-8. PMID 11591466.
  4. Elhaik, E.; Landan, G.; Graur, D. (2009). "Can GC Content at Third-Codon Positions Be Used as a Proxy for Isochore Composition?". Molecular Biology and Evolution. 26 (8): 1829–33. doi:10.1093/molbev/msp100. PMID 19443854.
  5. Tatarinova, Tatiana V; Alexandrov, Nickolai N; Bouck, John B; Feldmann, Kenneth A (2010). "GC3 biology in corn, rice, sorghum and other grasses". BMC Genomics. 11: 308. doi:10.1186/1471-2164-11-308. PMC 2895627. PMID 20470436.
  6. Bernardi, Giorgio (2000). "The compositional evolution of vertebrate genomes". Gene. 259 (1–2): 31–43. doi:10.1016/S0378-1119(00)00441-8. PMID 11163959.
  7. Lander, Eric S.; Linton, Lauren M.; Birren, Bruce; Nusbaum, Chad; Zody, Michael C.; Baldwin, Jennifer; Devon, Keri; Dewar, Ken; et al. (2001). "Initial sequencing and analysis of the human genome". Nature. 409 (6822): 860–921. Bibcode:2001Natur.409..860L. doi:10.1038/35057062. PMID 11237011.
  8. Belle, Elise M. S.; Duret, Laurent; Galtier, Nicolas; Eyre-Walker, Adam (2004). "The Decline of Isochores in Mammals: An Assessment of the GC Content Variation Along the Mammalian Phylogeny". Journal of Molecular Evolution. 58 (6): 653–60. Bibcode:2004JMolE..58..653B. CiteSeerX 10.1.1.333.2159. doi:10.1007/s00239-004-2587-x. PMID 15461422. S2CID 18281444.
  9. Cohen, N.; Dagan, T; Stone, L; Graur, D (2005). "GC Composition of the Human Genome: In Search of Isochores". Molecular Biology and Evolution. 22 (5): 1260–72. doi:10.1093/molbev/msi115. PMID 15728737.
  10. Bernardi, Giorgio (2000). "Isochores and the evolutionary genomics of vertebrates". Gene. 241 (1): 3–17. doi:10.1016/S0378-1119(99)00485-0. PMID 10607893.
  11. Hamada, Kazuo; Horiike, Tokumasa; Ota, Hidetoshi; Mizuno, Keiko; Shinozawa, Takao (2003). "Presence of isochore structures in reptile genomes suggested by the relationship between GC contents of intron regions and those of coding regions". Genes & Genetic Systems. 78 (2): 195–8. doi:10.1266/ggs.78.195. PMID 12773820.
  12. Chojnowski, J. L.; Braun, E. L. (2008). "Turtle isochore structure is intermediate between amphibians and other amniotes". Integrative and Comparative Biology. 48 (4): 454–62. doi:10.1093/icb/icn062. PMID 21669806.
  13. Costantini, Maria; Clay, Oliver; Federico, Concetta; Saccone, Salvatore; Auletta, Fabio; Bernardi, Giorgio (2006). "Human chromosomal bands: Nested structure, high-definition map and molecular basis". Chromosoma. 116 (1): 29–40. doi:10.1007/s00412-006-0078-0. PMID 17072634. S2CID 22571376.
  14. Elsik, C. G.; Tellam, R. L.; Worley, K. C.; Gibbs, R. A.; Muzny, D. M.; Weinstock, G. M.; Adelson, D. L.; Eichler, E. E.; et al. (2009). "The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution". Science. 324 (5926): 522–8. Bibcode:2009Sci...324..522A. doi:10.1126/science.1169588. PMC 2943200. PMID 19390049.
  15. Weinstock, George M.; Robinson, Gene E.; Gibbs, Richard A.; Worley, Kim C.; Evans, Jay D.; et al. (2006). "Insights into social insects from the genome of the honeybee Apis mellifera". Nature. 443 (7114): 931–49. Bibcode:2006Natur.443..931T. doi:10.1038/nature05260. PMC 2048586. PMID 17073008.
  16. Sodergren, E.; Weinstock, G. M.; Davidson, E. H; Cameron, R. A.; Gibbs, R. A.; Angerer, R. C.; Angerer, L. M.; Arnone, M. I.; et al. (2006). "The Genome of the Sea Urchin Strongylocentrotus purpuratus". Science. 314 (5801): 941–52. Bibcode:2006Sci...314..941S. doi:10.1126/science.1133609. PMC 3159423. PMID 17095691.
  17. Kirkness, Ewen F.; Haas, Brian J.; Sun, Weilin; Braig, Henk R.; Perotti, M. Alejandra; Clark, John M.; Lee, Si Hyeock; Robertson, Hugh M.; et al. (2010). "Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle". Proceedings of the National Academy of Sciences. 107 (27): 12168–73. Bibcode:2010PNAS..10712168K. doi:10.1073/pnas.1003379107. PMC 2901460. PMID 20566863.
  18. Werren, J. H.; Richards, S.; Desjardins, C. A.; Niehuis, O.; Gadau, J.; Colbourne, J. K.; Beukeboom, L. W.; Desplan, C.; et al. (2010). "Functional and Evolutionary Insights from the Genomes of Three Parasitoid Nasonia Species". Science. 327 (5963): 343–8. Bibcode:2010Sci...327..343.. doi:10.1126/science.1178028. PMC 2849982. PMID 20075255.
  19. Richards, Stephen; Gibbs, Richard A.; Weinstock, George M.; Brown, Susan J.; Denell, Robin; Beeman, Richard W.; et al. (2008). "The genome of the model beetle and pest Tribolium castaneum". Nature. 452 (7190): 949–55. Bibcode:2008Natur.452..949R. doi:10.1038/nature06784. PMID 18362917.
  20. Smith, Christopher D.; Zimin, Aleksey; Holt, Carson; Abouheif, Ehab; Benton, Richard; Cash, Elizabeth; Croset, Vincent; Currie, Cameron R.; et al. (2011). "Draft genome of the globally widespread and invasive Argentine ant (Linepithema humile)". Proceedings of the National Academy of Sciences. 108 (14): 5673–8. Bibcode:2011PNAS..108.5673S. doi:10.1073/pnas.1008617108. PMC 3078359. PMID 21282631.
  21. Smith, Chris R.; Smith, Christopher D.; Robertson, Hugh M.; Helmkampf, Martin; Zimin, Aleksey; Yandell, Mark; Holt, Carson; Hu, Hao; et al. (2011). "Draft genome of the red harvester ant Pogonomyrmex barbatus". Proceedings of the National Academy of Sciences. 108 (14): 5667–72. Bibcode:2011PNAS..108.5667S. doi:10.1073/pnas.1007901108. PMC 3078412. PMID 21282651.
  22. Suen, Garret; Teiling, Clotilde; Li, Lewyn; Holt, Carson; Abouheif, Ehab; Bornberg-Bauer, Erich; Bouffard, Pascal; Caldera, Eric J.; et al. (2011). Copenhaver, Gregory (ed.). "The Genome Sequence of the Leaf-Cutter Ant Atta cephalotes Reveals Insights into Its Obligate Symbiotic Lifestyle". PLOS Genetics. 7 (2): e1002007. doi:10.1371/journal.pgen.1002007. PMC 3037820. PMID 21347285.