Intergenic region

An intergenic region is a stretch of DNA sequence located between genes. [1] Intergenic regions may contain both functional elements and junk DNA.

Properties and functions

Intergenic regions may contain a number of functional DNA sequences such as promoters and regulatory elements, enhancers, spacers, and (in eukaryotes) centromeres. [2] They may also contain origins of replication, scaffold attachment regions, and transposons and fragments of viruses. [2]

Non-functional DNA elements such as pseudogenes and repetitive DNA, both of which are types of junk DNA, can also be found in intergenic regions, although they may also be located within genes, in introns. [2] These regions may also contain as yet unidentified functional elements, such as non-coding genes or regulatory sequences. [3] Such elements are occasionally discovered, but the functional DNA found usually constitutes only a tiny fraction of the overall amount of intergenic or intronic DNA. [3]

Intergenic regions in different organisms

In humans, intergenic regions make up about 50% of the genome, whereas the proportion is much lower in bacteria (15%) and yeast (30%). [4]
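Percentages like these are essentially the share of genome length not covered by any annotated gene. The Python sketch below shows one way such a figure could be computed for a single chromosome, assuming genes are given as half-open, 0-based (start, end) coordinates from some annotation, with strand ignored. The function names are hypothetical, and counting the stretches before the first gene and after the last gene as intergenic is a simplifying assumption; published estimates also depend on how gene boundaries (for example, UTRs) are defined.

```python
from typing import List, Tuple

# Hypothetical representation: half-open [start, end) intervals, 0-based.
Interval = Tuple[int, int]


def merge_intervals(genes: List[Interval]) -> List[Interval]:
    """Merge overlapping or abutting gene intervals so shared bases count once."""
    merged: List[Interval] = []
    for start, end in sorted(genes):
        if merged and start <= merged[-1][1]:
            # This gene overlaps or touches the previous block: extend the block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intergenic_intervals(genes: List[Interval], chrom_length: int) -> List[Interval]:
    """Return the gaps not covered by any gene, including both chromosome ends."""
    gaps: List[Interval] = []
    cursor = 0  # first base not yet assigned to a gene block or a gap
    for start, end in merge_intervals(genes):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = end
    if cursor < chrom_length:
        gaps.append((cursor, chrom_length))
    return gaps


def intergenic_fraction(genes: List[Interval], chrom_length: int) -> float:
    """Fraction of the chromosome that lies outside every annotated gene."""
    gaps = intergenic_intervals(genes, chrom_length)
    return sum(end - start for start, end in gaps) / chrom_length


genes = [(100, 400), (350, 600), (900, 1000)]
print(intergenic_intervals(genes, 1200))  # [(0, 100), (600, 900), (1000, 1200)]
print(intergenic_fraction(genes, 1200))   # 0.5
```

Overlapping genes are merged first so that bases shared by two genes are not subtracted twice from the intergenic total.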

As with most other non-coding DNA, the GC-content of intergenic regions varies considerably among species. For example, in Plasmodium falciparum, many intergenic regions have an AT content of 90%. [5]
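GC-content is the fraction of bases that are G or C. When only A, C, G and T are counted, AT content is its complement, so an AT content of 90% corresponds to a GC-content of 10%. A minimal sketch, assuming the input is a plain DNA string and that ambiguous symbols such as N should simply be skipped (the function name is hypothetical):

```python
def base_composition(seq: str) -> dict:
    """Return GC and AT fractions, counting only unambiguous A, C, G, T bases."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())  # ambiguous symbols such as N are not counted
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    gc = (counts["G"] + counts["C"]) / total
    return {"GC": gc, "AT": 1 - gc}


print(base_composition("AATTATGCNN"))  # {'GC': 0.25, 'AT': 0.75}
```

Applied to each intergenic interval of a genome in turn, the same calculation would show how composition varies from region to region.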

Molecular evolution of intergenic regions

Functional elements in intergenic regions evolve slowly because their sequence is maintained by negative (purifying) selection. In species with very large genomes, a large percentage of intergenic DNA is probably junk DNA, which evolves at the neutral rate. [6] [7] [verification needed] Junk DNA sequences are not maintained by purifying selection, but mutations in them can occasionally create new functions that have deleterious fitness effects. [8]
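Evolutionary rates of this kind are commonly estimated by aligning orthologous sequences from two species and measuring their divergence. The sketch below uses the Jukes-Cantor substitution model, one standard choice and not necessarily the method of the cited studies, to correct the observed proportion of differing sites p for multiple substitutions at the same site: d = -(3/4) ln(1 - 4p/3). On this reasoning, a region under purifying selection would be expected to show a smaller d than a neutrally evolving reference region compared between the same two species. The function names, and the choice to skip gaps and ambiguous bases, are assumptions for illustration.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Skip alignment gaps ('-') and ambiguous bases such as N.
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor estimate of substitutions per site from a p-distance."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


print(p_distance("ACGTACGTAC", "ACGTACGTAA"))  # 0.1
```

Dividing jukes_cantor for a candidate region by jukes_cantor for a putatively neutral region gives a rough relative rate; values well below 1 would hint at constraint, though real analyses need many more sites and a better-justified neutral reference.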

Phylostratigraphic inference and bioinformatic methods have shown that, over geological timescales, intergenic regions can transiently give rise to open reading frames (ORFs) that resemble those of protein-coding genes, and can therefore lead to the evolution of novel protein-coding genes in a process known as de novo gene birth. [9]
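Studies of de novo gene birth typically begin by scanning non-coding sequence for ORFs. A simplified sketch, assuming only the forward strand, ATG as the sole start codon, the standard stop codons TAA, TAG and TGA, and an arbitrary length threshold (min_codons is a hypothetical parameter; real pipelines also scan the reverse complement and apply further filters):

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int, int]]:
    """Return (start, end, frame) for ATG-initiated ORFs on the forward strand.

    Coordinates are 0-based and half-open; end includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # ORFs without a stop codon are never reported
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC", min_codons=1))  # [(2, 14, 2)]
```

Using the first ATG in each frame reports the longest ORF ending at each stop codon; ATGs nested inside it are not reported separately.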

References

  1. Tropp BE (2008). Molecular Biology: Genes to Proteins. Jones & Bartlett Learning. ISBN 9780763709167.
  2. Alberts B (2014). Essential Cell Biology (4th ed.). Garland Pub. pp. 172–209. ISBN 978-0815345251.
  3. Palazzo AF, Lee ES (January 2015). "Non-coding RNA: what is functional and what is junk?". Frontiers in Genetics. 6: 2. doi:10.3389/fgene.2015.00002. PMC 4306305. PMID 25674102.
  4. Francis WR, Wörheide G (June 2017). "Similar Ratios of Introns to Intergenic Sequence across Animal Genomes". Genome Biology and Evolution. 9 (6): 1582–1598. doi:10.1093/gbe/evx103. PMC 5534336. PMID 28633296.
  5. Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, et al. (October 2002). "Genome sequence of the human malaria parasite Plasmodium falciparum". Nature. 419 (6906): 498–511. doi:10.1038/nature01097. PMID 12368864.
  6. Lynch M (February 2006). "The Origins of Eukaryotic Gene Structure". Molecular Biology and Evolution. 23 (2): 450–468. doi:10.1093/molbev/msj050. PMID 16280547.
  7. Papadopoulos C, Callebaut I, Gelly JC, Hatin I, Namy O, Renard M, Lespinet O, Lopes A (December 2021). "Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution". Genome Research. 31 (12): 2303–2315. doi:10.1101/gr.275638.121. PMC 8647833. PMID 34810219.
  8. Palazzo AF, Gregory TR (May 2014). "The Case for Junk DNA". PLOS Genetics. 10 (5): e1004351. doi:10.1371/journal.pgen.1004351. PMC 4014423. PMID 24809441.
  9. Papadopoulos C, Callebaut I, Gelly JC, Hatin I, Namy O, Renard M, Lespinet O, Lopes A (December 2021). "Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution". Genome Research. 31 (12): 2303–2315. doi:10.1101/gr.275638.121. PMC 8647833. PMID 34810219.