In genetics, an isochore is a large region of genomic DNA (greater than 300 kilobases) with a high degree of uniformity in GC content, that is, in the proportion of guanine (G) and cytosine (C) bases. The distribution of bases within a genome is non-random: different regions of the genome contain different proportions of G-C base pairs, so regions can be classified and identified by their GC content.
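As a rough illustration of how regional GC content can be measured, the following minimal sketch computes the GC fraction in consecutive, non-overlapping windows along a sequence. The function names and the choice of fixed windows are assumptions made for illustration; they are not taken from the isochore literature.

```python
def gc_fraction(seq):
    """Return the fraction of G and C among unambiguous bases, or None if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Ambiguous symbols such as N are ignored (an assumption of this sketch).
    return gc / total if total else None


def windowed_gc(seq, window):
    """GC fraction of each consecutive, non-overlapping window of length `window`."""
    # A trailing partial window is dropped.
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]
```

For real genomes the window would be on the order of kilobases rather than a handful of bases; the profile returned by `windowed_gc` is the kind of curve on which regional differences in GC content become visible.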
Bernardi and colleagues first noticed the compositional non-uniformity of vertebrate genomes using thermal melting and density gradient centrifugation. [1] [2] [3] The DNA fragments extracted by the gradient centrifugation were later termed "isochores", [4] which were subsequently defined as "very long (much greater than 200 KB) DNA segments" that "are fairly homogeneous in base composition and belong to a small number of major classes distinguished by differences in guanine-cytosine (GC) content". [3] Later, the isochores "grew" and were claimed to be ">300 kb in size." [5] [6] The theory proposed that the isochore composition of genomes varies markedly between "warm-blooded" (homeotherm) vertebrates and "cold-blooded" (poikilotherm) vertebrates [3] and later became known as the isochore theory.
The isochore theory purported that the genomes of "warm-blooded" vertebrates (mammals and birds) are mosaics of long isochoric regions of alternating GC-poor and GC-rich composition, as opposed to the genomes of "cold-blooded" vertebrates (fishes and amphibians), which were supposed to lack GC-rich isochores. [3] [7] [8] [9] [10] [11] These findings were explained by the thermodynamic stability hypothesis, which attributes genomic structure to body temperature. GC-rich isochores were purported to be a form of adaptation to environmental pressures, as an increase in genomic GC-content could protect DNA, RNA, and proteins from degradation by heat. [3] [4] Despite its attractive simplicity, the thermodynamic stability hypothesis has been repeatedly shown to be in error. [12] [13] [14] [15] [16] [17] [18] [19] Many authors showed the absence of a relationship between temperature and GC-content in vertebrates, [17] [18] while others showed the existence of GC-rich domains in "cold-blooded" vertebrates such as crocodiles, amphibians, and fish. [14] [20] [21] [22]
The isochore theory was the first to identify the nonuniformity of nucleotide composition within vertebrate genomes and to predict that the genomes of "warm-blooded" vertebrates such as mammals and birds are mosaics of isochores (Bernardi et al. 1985). The human genome, for example, was described as a mosaic of alternating low and high GC content isochores belonging to five compositional families, L1, L2, H1, H2, and H3, whose corresponding ranges of GC content were said to be <38%, 38%-42%, 42%-47%, 47%-52%, and >52%, respectively. [23]
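The five-family scheme can be expressed as a simple lookup from GC percentage to family name. The sketch below is illustrative only: the function name is hypothetical, and because the published ranges do not say which family a value lying exactly on a boundary belongs to, the code assumes that boundary values (38%, 42%, 47%, 52%) fall into the higher family.

```python
from bisect import bisect_right

# Boundaries (in percent GC) separating the five families described in the text.
FAMILY_BOUNDS = (38.0, 42.0, 47.0, 52.0)
FAMILY_NAMES = ("L1", "L2", "H1", "H2", "H3")


def isochore_family(gc_percent):
    """Map a GC percentage (0-100) to one of the five compositional families."""
    if not 0.0 <= gc_percent <= 100.0:
        raise ValueError("gc_percent must be between 0 and 100")
    # Assumption of this sketch: a value exactly on a boundary is assigned
    # to the higher family (for example, 38.0 is treated as L2).
    return FAMILY_NAMES[bisect_right(FAMILY_BOUNDS, gc_percent)]
```

For example, `isochore_family(40.0)` returns `"L2"` and `isochore_family(55.0)` returns `"H3"`.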
The main predictions of the isochore theory are that:
Two opposing explanations for the formation of isochores were vigorously debated as part of the neutralist-selectionist controversy. The first view was that isochores reflect variable mutation processes among genomic regions, consistent with the neutral model. [26] [27] Alternatively, isochores were posited to result from natural selection for a certain compositional environment required by certain genes. [28] Several hypotheses derive from the selectionist view, such as the thermodynamic stability hypothesis [6] [29] and the biased gene conversion hypothesis. [27] Thus far, none of the theories provides a comprehensive explanation of the genome structure, and the topic is still under debate.
The isochore theory was one of the most useful theories in molecular evolution for many years. It was the first and most comprehensive attempt to explain the long-range compositional heterogeneity of vertebrate genomes within an evolutionary framework. Despite the early interest in the isochore model, the theory’s methodology, terminology, and predictions have been challenged in recent years.
Because this theory was proposed in the 20th century, before complete genomes were sequenced, it could not be fully tested for nearly 30 years. At the beginning of the 21st century, when the first genomes were made available, it became clear that isochores do not exist in the human genome [30] nor in other mammalian genomes. [31] Having failed to find isochores, many researchers questioned their very existence. [30] [32] [33] [34] [35] The most important predictor of isochores, GC3, was shown to have no predictive power [36] [37] for the GC content of nearby genomic regions, refuting findings from over 30 years of research that were the basis for many isochore studies. The originators of the isochore concept replied that the term had been misinterpreted, [23] [38] [39] as isochores are not "homogeneous" but rather fairly homogeneous regions with a heterogeneous nature (especially in GC-rich regions) at the 5 kb scale, [40] which only added to the already growing confusion. The reason for this ongoing frustration was the ambiguous definition of isochores as long and homogeneous, which allowed some researchers to discover "isochores" and others to dismiss them, although both camps used the same data.
The unfortunate side effect of this controversy was an "arms race" in which isochores were frequently redefined and relabeled following conflicting findings that failed to reveal a "mosaic of isochores." [23] [32] [34] The unfortunate outcome of this controversy and the terminological and methodological confusion that followed was the loss of interest in isochores by the scientific community. When the most important core concept in the isochore literature, the thermodynamic stability hypothesis, was rejected, the theory lost its appeal. Even today, there is no clear definition of isochores, nor is there an algorithm that detects isochores. [41] Isochores are detected manually by visual inspection of GC content curves; [42] however, because this approach lacks scientific merit and is difficult for independent groups to replicate, the findings remain disputed.
As the study of isochores was de facto abandoned by most scientists, an alternative theory was proposed to describe the compositional organization of genomes in accordance with the most recent genomic studies. The Compositional Domain Model depicts genomes as a medley of short and long homogeneous and nonhomogeneous domains. [35] The theory defines "compositional domains" as genomic regions with distinct GC-contents as determined by a computational segmentation algorithm. [35] The homogeneity of each compositional domain is compared to that of the chromosome on which it resides using the F-test, which separates domains into compositionally homogeneous and compositionally nonhomogeneous domains based on the outcome of the test. Compositionally homogeneous domains that are sufficiently long (≥ 300 kb) are termed isochores or isochoric domains. These terms are in accordance with the literature, as they provide a clear distinction between isochoric and nonisochoric domains.
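The classification step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published method: the segmentation algorithm is not shown, the function name is hypothetical, GC content is assumed to have been measured in windows beforehand, and a domain is assumed to count as homogeneous when the ratio of its GC variance to the chromosome's GC variance falls below a one-sided critical value of the F distribution, which the caller supplies.

```python
from statistics import variance

MIN_ISOCHORE_LENGTH = 300_000  # base pairs; the 300 kb threshold given in the text


def classify_domain(domain_gc, chromosome_gc, length_bp, f_critical):
    """Classify one compositional domain.

    domain_gc     -- windowed GC values inside the domain (at least two)
    chromosome_gc -- windowed GC values across the whole chromosome (at least two)
    length_bp     -- domain length in base pairs
    f_critical    -- lower-tail critical value of the F distribution for the
                     chosen significance level (supplied by the caller)
    """
    chromosome_var = variance(chromosome_gc)
    if chromosome_var == 0:
        raise ValueError("chromosome GC values have zero variance")
    # F statistic: ratio of the two sample variances.
    f_stat = variance(domain_gc) / chromosome_var
    # Assumption of this sketch: a ratio below the critical value means the
    # domain is significantly less variable than its chromosome.
    if f_stat >= f_critical:
        return "nonhomogeneous"
    if length_bp >= MIN_ISOCHORE_LENGTH:
        return "isochore"
    return "homogeneous"
```

Passing the critical value in as a parameter keeps the sketch free of third-party dependencies; in practice it would be derived from the F distribution with the appropriate degrees of freedom.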
A comprehensive study of the human genome revealed a genomic organization in which two-thirds of the genome is a mixture of many short compositionally homogeneous domains and relatively few long ones. The remaining portion of the genome is composed of nonhomogeneous domains. Only 1% of the compositionally homogeneous domains could be considered "isochores", and these covered less than 20% of the genome. [35]
Since its inception, the theory has received wide attention and has been used extensively to explain findings emerging from over a dozen new genome sequencing studies. [31] [43] [44] [45] [46] [47] [48] [49] [50] However, many important questions remain open, such as which evolutionary forces shaped the structure of compositional domains and how these domains differ between species.