Intergenic region

An intergenic region is a stretch of DNA sequence located between genes. [1] Intergenic regions may contain both functional elements and junk DNA.

Properties and functions

Intergenic regions may contain a number of functional DNA sequences such as promoters and regulatory elements, enhancers, spacers, and (in eukaryotes) centromeres. [2] They may also contain origins of replication, scaffold attachment regions, and transposons and viruses. [2]

Non-functional DNA elements such as pseudogenes and repetitive DNA, both of which are types of junk DNA, can also be found in intergenic regions, although they may also be located within genes, in introns. [2] It is possible that these regions contain as yet unidentified functional elements, such as non-coding genes or regulatory sequences. [3] Such discoveries do occur occasionally, but the functional DNA identified usually constitutes only a tiny fraction of the total intergenic or intronic DNA. [3]

Intergenic regions in different organisms

In humans, intergenic regions comprise about 50% of the genome, whereas the proportion is much lower in bacteria (15%) and yeast (30%). [4]
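As a rough illustration of how such a percentage can be derived from a genome annotation, the sketch below computes the gaps between annotated genes on a single chromosome and the share of the chromosome they cover. The function names, the toy coordinates, and the choice to count the chromosome ends as intergenic are assumptions made for this example; published estimates depend on the annotation used and on how overlapping genes and both strands are handled.

```python
def intergenic_intervals(genes, chrom_length):
    """Return the gaps between genes as half-open (start, end) pairs.

    `genes` is a list of half-open (start, end) coordinates on one
    chromosome; overlapping genes are merged implicitly.
    Assumption: sequence before the first gene and after the last gene
    is counted as intergenic.
    """
    gaps = []
    cursor = 0  # rightmost gene end seen so far
    for start, end in sorted(genes):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        gaps.append((cursor, chrom_length))
    return gaps


def intergenic_fraction(genes, chrom_length):
    """Fraction of the chromosome that lies outside every annotated gene."""
    gaps = intergenic_intervals(genes, chrom_length)
    return sum(end - start for start, end in gaps) / chrom_length


# Hypothetical 1,000 bp chromosome with three genes, two of which overlap.
toy_genes = [(100, 300), (250, 400), (700, 900)]
print(intergenic_intervals(toy_genes, 1000))  # [(0, 100), (400, 700), (900, 1000)]
print(intergenic_fraction(toy_genes, 1000))   # 0.5
```

Sorting the genes and tracking a single cursor keeps the computation linear after the sort and handles overlapping or nested genes without a separate merging step.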

As with most other non-coding DNA, the GC-content of intergenic regions varies considerably among species. For example, in Plasmodium falciparum, many intergenic regions have an AT content of 90%. [5]
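AT content is simply the fraction of bases in a sequence that are A or T (GC-content is its complement). A minimal sketch of the calculation follows; the function name and the example sequences are made up for illustration, and skipping ambiguous bases such as N is an assumption of this sketch rather than a universal convention.

```python
def at_content(sequence):
    """Return the fraction of unambiguous bases (A, C, G, T) that are A or T."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    at_bases = sum(1 for base in counted if base in "AT")
    return at_bases / len(counted)


# Hypothetical AT-rich fragment: nine A/T bases and one G.
print(at_content("ATTATAAATG"))  # 0.9
# Ambiguous bases (N) are ignored rather than counted as either class.
print(at_content("ATNNGC"))      # 0.5
```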

Molecular evolution of intergenic regions

Functional elements in intergenic regions evolve slowly because their sequence is maintained by negative (purifying) selection. In species with very large genomes, a large percentage of intergenic DNA is probably junk DNA, which evolves at the neutral rate. [6] Junk DNA sequences are not maintained by purifying selection, but gain-of-function mutations with deleterious fitness effects can still occur in them. [7]

Phylostratigraphic inference and bioinformatics methods have shown that intergenic regions can—on geological timescales—transiently evolve into open reading frame sequences that mimic those of protein coding genes, and can therefore lead to the evolution of novel protein-coding genes in a process known as de novo gene birth. [8]
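To make the idea of an open reading frame arising in intergenic sequence concrete, the sketch below scans a DNA string for stretches that begin with ATG and run in frame to a stop codon. It is a deliberately simplified illustration, not the method of the cited study: it searches the forward strand only, assumes the standard stop codons, and the function name and the `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def find_orfs(sequence, min_codons=3):
    """Return half-open (start, end) coordinates of forward-strand ORFs.

    Each ORF runs from an ATG to the first in-frame stop codon, inclusive.
    `min_codons` (an assumption of this sketch) is the minimum length in
    codons, counting both the start and the stop codon.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return sorted(orfs)


# Hypothetical intergenic fragment containing one short ORF in the third frame.
fragment = "CCATGAAATTTTAGGC"
for start, end in find_orfs(fragment):
    print(start, end, fragment[start:end])  # 2 14 ATGAAATTTTAG
```

A realistic search would also scan the reverse complement and apply much stricter length and expression criteria before treating an ORF as a candidate de novo gene.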

Related Research Articles

Exon: A region of a transcribed gene present in the final functional mRNA molecule

An exon is any part of a gene that will form a part of the final mature RNA produced by that gene after introns have been removed by RNA splicing. The term exon refers to both the DNA sequence within a gene and to the corresponding sequence in RNA transcripts. In RNA splicing, introns are removed and exons are covalently joined to one another as part of generating the mature RNA. Just as the entire set of genes for a species constitutes the genome, the entire set of exons constitutes the exome.

An intron is any nucleotide sequence within a gene that is not expressed or operative in the final RNA product. The word intron is derived from the term intragenic region, i.e., a region inside a gene. The term intron refers to both the DNA sequence within a gene and the corresponding RNA sequence in RNA transcripts. The non-intron sequences that become joined by this RNA processing to form the mature RNA are called exons.

Human genome: Complete set of nucleic acid sequences for humans

The human genome is a complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. These are usually treated separately as the nuclear genome and the mitochondrial genome. Human genomes include both protein-coding DNA sequences and various types of DNA that does not encode proteins. The latter is a diverse category that includes DNA coding for non-translated RNA, such as that for ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and several types of regulatory RNAs. It also includes promoters and their associated gene-regulatory elements, DNA playing structural and replicatory roles, such as scaffolding regions, telomeres, centromeres, and origins of replication, plus large numbers of transposable elements, inserted viral DNA, non-functional pseudogenes and simple, highly repetitive sequences. Introns make up a large percentage of non-coding DNA. Some of this non-coding DNA is non-functional junk DNA, such as pseudogenes, but there is no firm consensus on the total amount of junk DNA.

Non-coding DNA (ncDNA) sequences are components of an organism's DNA that do not encode protein sequences. Some non-coding DNA is transcribed into functional non-coding RNA molecules. Other functional regions of the non-coding DNA fraction include regulatory sequences that control gene expression; scaffold attachment regions; origins of DNA replication; centromeres; and telomeres. Some non-coding regions appear to be mostly nonfunctional such as introns, pseudogenes, intergenic DNA, and fragments of transposons and viruses.

Junk DNA is a DNA sequence that has no relevant biological function. Most organisms have some junk DNA in their genomes - mostly pseudogenes and fragments of transposons and viruses - but it is possible that some organisms have substantial amounts of junk DNA.

Molecular evolution is the process of change in the sequence composition of cellular molecules such as DNA, RNA, and proteins across generations. The field of molecular evolution uses principles of evolutionary biology and population genetics to explain patterns in these changes. Major topics in molecular evolution concern the rates and impacts of single nucleotide changes, neutral evolution vs. natural selection, origins of new genes, the genetic nature of complex traits, the genetic basis of speciation, the evolution of development, and ways that evolutionary forces influence genomic and phenotypic changes.

The coding region of a gene, also known as the coding sequence (CDS), is the portion of a gene's DNA or RNA that codes for protein. Studying the length, composition, regulation, splicing, structures, and functions of coding regions compared to non-coding regions across different species and time periods can provide important information about gene organization and the evolution of prokaryotes and eukaryotes. This can further assist in mapping the human genome and developing gene therapy.

Pseudogene: Functionless relative of a gene

Pseudogenes are nonfunctional segments of DNA that resemble functional genes. Most arise as superfluous copies of functional genes, either directly by gene duplication or indirectly by reverse transcription of an mRNA transcript. Pseudogenes are usually identified when genome sequence analysis finds gene-like sequences that lack regulatory sequences needed for transcription or translation, or whose coding sequences are obviously defective due to frameshifts or premature stop codons. Pseudogenes are a type of junk DNA.

An Alu element is a short stretch of DNA originally characterized by the action of the Arthrobacter luteus (Alu) restriction endonuclease. Alu elements are the most abundant transposable elements, containing over one million copies dispersed throughout the human genome. Alu elements were thought to be selfish or parasitic DNA, because their sole known function is self reproduction. However, they are likely to play a role in evolution and have been used as genetic markers. They are derived from the small cytoplasmic 7SL RNA, a component of the signal recognition particle. Alu elements are highly conserved within primate genomes and originated in the genome of an ancestor of Supraprimates.

Repeated sequences are short or long patterns of nucleic acids that occur in multiple copies throughout the genome. In many organisms, a significant fraction of the genomic DNA is repetitive, with over two-thirds of the sequence consisting of repetitive elements in humans. Some of these repeated sequences are necessary for maintaining important genome structures such as telomeres or centromeres.

The transcriptome is the set of all RNA transcripts, including coding and non-coding, in an individual or a population of cells. The term can also sometimes be used to refer to all RNAs, or just mRNA, depending on the particular experiment. The term transcriptome is a portmanteau of the words transcript and genome; it is associated with the process of transcript production during the biological process of transcription.

Gene: Sequence of DNA or RNA that codes for an RNA or protein product

In biology, the word gene can have several different meanings. The Mendelian gene is a basic unit of heredity and the molecular gene is a sequence of nucleotides in DNA that is transcribed to produce a functional RNA. There are two types of molecular genes: protein-coding genes and non-coding genes.

Gene structure is the organisation of specialised sequence elements within a gene. Genes contain most of the information necessary for living cells to survive and reproduce. In most organisms, genes are made of DNA, where the particular DNA sequence determines the function of the gene. A gene is transcribed (copied) from DNA into RNA, which can either be non-coding (ncRNA) with a direct function, or an intermediate messenger (mRNA) that is then translated into protein. Each of these steps is controlled by specific sequence elements, or regions, within the gene. Every gene, therefore, requires multiple sequence elements to be functional. This includes the sequence that actually encodes the functional protein or ncRNA, as well as multiple regulatory sequence regions. These regions may be as short as a few base pairs, up to many thousands of base pairs long.

Exon shuffling is a molecular mechanism for the formation of new genes. It is a process through which two or more exons from different genes can be brought together ectopically, or the same exon can be duplicated, to create a new exon-intron structure. There are different mechanisms through which exon shuffling occurs: transposon mediated exon shuffling, crossover during sexual recombination of parental genomes and illegitimate recombination.

Orphan genes, ORFans, or taxonomically restricted genes (TRGs) are genes that lack a detectable homologue outside of a given species or lineage. Most genes have known homologues. Two genes are homologous when they share an evolutionary history, and the study of groups of homologous genes allows for an understanding of their evolutionary history and divergence. Common mechanisms that have been uncovered as sources for new genes through studies of homologues include gene duplication, exon shuffling, gene fusion and fission, etc. Studying the origins of a gene becomes more difficult when there is no evident homologue. The discovery that orphan genes make up about 10% or more of the genes of the average microbial species raises questions about the evolutionary origins of different species as well as how to study and uncover the evolutionary origins of orphan genes.

A conserved non-coding sequence (CNS) is a DNA sequence of noncoding DNA that is evolutionarily conserved. These sequences are of interest for their potential to regulate gene production.

STARR-seq

STARR-seq is a method to assay enhancer activity for millions of candidates from arbitrary sources of DNA. It is used to identify the sequences that act as transcriptional enhancers in a direct, quantitative, and genome-wide manner.

Short interspersed nuclear element

Short interspersed nuclear elements (SINEs) are non-autonomous, non-coding transposable elements (TEs) that are about 100 to 700 base pairs in length. They are a class of retrotransposons, DNA elements that amplify themselves throughout eukaryotic genomes, often through RNA intermediates. SINEs compose about 13% of the mammalian genome.

The split gene theory is a theory of the origin of introns, long non-coding sequences in eukaryotic genes between the exons. The theory holds that the randomness of primordial DNA sequences would only permit small (< 600bp) open reading frames (ORFs), and that important intron structures and regulatory sequences are derived from stop codons. In this introns-first framework, the spliceosomal machinery and the nucleus evolved due to the necessity to join these ORFs into larger proteins, and that intronless bacterial genes are less ancestral than the split eukaryotic genes. The theory originated with Periannan Senapathy.

The G-value paradox arises from the lack of correlation between the number of protein-coding genes among eukaryotes and their relative biological complexity. The microscopic nematode Caenorhabditis elegans, for example, is composed of only a thousand cells but has about the same number of genes as a human. Researchers suggest resolution of the paradox may lie in mechanisms such as alternative splicing and complex gene regulation that make the genes of humans and other complex eukaryotes relatively more productive.

References

  1. Tropp BE (2008). Molecular Biology: Genes to Proteins. Jones & Bartlett Learning. ISBN 9780763709167.
  2. Alberts, Bruce (2014). Essential Cell Biology (4th ed.). Garland Pub. pp. 172–209. ISBN 978-0815345251.
  3. Palazzo AF, Lee ES (January 2015). "Non-coding RNA: what is functional and what is junk?". Frontiers in Genetics. 6: 2. doi:10.3389/fgene.2015.00002. PMC 4306305. PMID 25674102.
  4. Francis WR, Wörheide G (June 2017). "Similar Ratios of Introns to Intergenic Sequence across Animal Genomes". Genome Biology and Evolution. 9 (6): 1582–1598. doi:10.1093/gbe/evx103. PMC 5534336. PMID 28633296.
  5. Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, et al. (October 2002). "Genome sequence of the human malaria parasite Plasmodium falciparum". Nature. 419 (6906): 498–511. doi:10.1038/nature01097. PMID 12368864.
  6. Lynch, Michael (February 2006). "The origins of eukaryotic gene structure". Molecular Biology and Evolution. 23 (2): 450–468. doi:10.1093/molbev/msj050. PMID 16280547.
  7. Palazzo AF, Gregory TR (May 2014). "The Case for Junk DNA". PLOS Genetics. 10 (5): e1004351. doi:10.1371/journal.pgen.1004351. PMC 4014423. PMID 24809441.
  8. Papadopoulos C, Callebaut I, Gelly JC, Hatin I, Namy O, Renard M, Lespinet O, Lopes A (December 2021). "Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution". Genome Research. 31 (12): 2303–2315. doi:10.1101/gr.275638.121. PMC 8647833. PMID 34810219.