An Alu element is a short stretch of DNA originally characterized by the action of the Arthrobacter luteus (Alu) restriction endonuclease. [1] Alu elements are the most abundant transposable elements in the human genome, present in excess of one million copies. [2] Alu elements were thought to be selfish or parasitic DNA, because their sole known function is self-reproduction. However, they are likely to play a role in evolution and have been used as genetic markers. [3] [4] They are derived from the small cytoplasmic 7SL RNA, a component of the signal recognition particle. Alu elements are highly conserved within primate genomes and originated in the genome of an ancestor of Supraprimates. [5]
Alu insertions have been implicated in several inherited human diseases and in various forms of cancer.
The study of Alu elements has also been important in elucidating human population genetics and the evolution of primates, including the evolution of humans.
The Alu family is a family of repetitive elements in primate genomes, including the human genome. [6] Modern Alu elements are about 300 base pairs long and are therefore classified as short interspersed nuclear elements (SINEs) among the class of repetitive RNA elements. The typical structure is 5' - Part A - A5TACA6 - Part B - PolyA Tail - 3', where Part A and Part B (also known as "left arm" and "right arm") are similar nucleotide sequences. Expressed another way, modern Alu elements are believed to have emerged from a head-to-tail fusion of two distinct FAMs (fossil antique monomers) over 100 million years ago, hence their dimeric structure of two similar but distinct monomers (left and right arms) joined by an A-rich linker. Both monomers are thought to have evolved from 7SL, also known as SRP RNA. [7] The length of the polyA tail varies between Alu families.
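The compact linker notation in the structure above can be spelled out mechanically. The following is a minimal sketch, assuming that a digit after a base means "repeat that base this many times"; the function name `expand_run_length` is hypothetical and introduced only for illustration.

```python
import re

def expand_run_length(notation: str) -> str:
    """Expand compact notation such as 'A5TACA6' into a full nucleotide string.

    Assumption: a base followed by a number is repeated that many times,
    and a base with no number occurs once.
    """
    parts = []
    for base, count in re.findall(r"([ACGT])(\d*)", notation):
        parts.append(base * (int(count) if count else 1))
    return "".join(parts)

linker = expand_run_length("A5TACA6")
print(linker, len(linker))  # AAAAATACAAAAAA 14
```

Read this way, the A-rich linker between the left and right arms is 14 nucleotides long.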
There are over one million Alu elements interspersed throughout the human genome, and it is estimated that about 10.7% of the human genome consists of Alu sequences. However, less than 0.5% are polymorphic (i.e., occurring in more than one form or morph). [8] In 1988, Jerzy Jurka and Temple Smith discovered that Alu elements fall into two major subfamilies known as AluJ (named after Jurka) and AluS (named after Smith), and other Alu subfamilies were also independently discovered by several groups. [9] Later on, a sub-subfamily of AluS which included active Alu elements was given the separate name AluY. Dating back 65 million years, the AluJ lineage is the oldest and least active in the human genome. The younger AluS lineage is about 30 million years old and still contains some active elements. Finally, the AluY elements are the youngest of the three and have the greatest disposition to move along the human genome. [10] The discovery of Alu subfamilies led to the hypothesis of master/source genes, and provided the definitive link between transposable elements (active elements) and interspersed repetitive DNA (mutated copies of active elements). [11]
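As a rough consistency check, the stated copy number and genome fraction can be compared with the approximate element length of 300 base pairs. This is a back-of-the-envelope sketch only: the genome size used below is an assumption that does not come from the text, and many genomic Alu copies are not full length.

```python
ALU_LENGTH_BP = 300          # approximate element length stated in the text
ALU_GENOME_FRACTION = 0.107  # share of the genome stated in the text
GENOME_SIZE_BP = 3.1e9       # ASSUMPTION: approximate haploid human genome size

# Total Alu-derived sequence implied by the stated fraction.
alu_bp = ALU_GENOME_FRACTION * GENOME_SIZE_BP

# Number of full-length elements that would account for that much sequence.
full_length_equivalents = alu_bp / ALU_LENGTH_BP

print(f"{alu_bp:.3g} bp of Alu sequence")
print(f"about {full_length_equivalents:,.0f} full-length equivalents")
```

Under these assumptions the result is on the order of one million copies, in line with the copy number quoted above.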
B1 elements in rats and mice are similar to Alus in that they also evolved from 7SL RNA, but they only have one left monomer arm. 95% of human Alus are also found in chimpanzees, and 50% of B elements in mice are also found in rats. These elements are mostly found in introns and upstream regulatory elements of genes. [12]
The ancestral form of Alu and B1 is the fossil Alu monomer (FAM). Free-floating forms of the left and right arms exist, termed free left Alu monomers (FLAMs) and free right Alu monomers (FRAMs) respectively. [13] A notable FLAM in primates is the BC200 lncRNA.
Two main promoter "boxes" are found in Alu: a 5' A box with the consensus TGGCTCACGCC, and a 3' B box with the consensus GTTCGAGAC (IUPAC nucleic acid notation). tRNAs, which are transcribed by RNA polymerase III, have a similar but stronger promoter structure. [14] Both boxes are located in the left arm. [7]
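One way to look for the two promoter boxes in a candidate sequence is to slide each consensus along it and count mismatches. The sketch below is illustrative only: the helper names `mismatches` and `best_match` are hypothetical, the toy sequence is made up, and the choice of a simple mismatch count (rather than a position weight matrix) is an assumption. The lookup table uses the standard IUPAC nucleotide codes so that degenerate consensus symbols are also handled.

```python
# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

A_BOX = "TGGCTCACGCC"  # 5' A box consensus as given in the text
B_BOX = "GTTCGAGAC"    # 3' B box consensus as given in the text

def mismatches(window: str, consensus: str) -> int:
    """Count positions where the window base is not allowed by the consensus code."""
    return sum(base not in IUPAC[code] for base, code in zip(window, consensus))

def best_match(sequence: str, consensus: str):
    """Return (start, mismatch_count) of the closest window; leftmost wins ties.

    Returns None when the sequence is shorter than the consensus.
    """
    sequence = sequence.upper()
    k = len(consensus)
    if len(sequence) < k:
        return None
    scored = [
        (mismatches(sequence[i:i + k], consensus), i)
        for i in range(len(sequence) - k + 1)
    ]
    count, start = min(scored)
    return start, count

# HYPOTHETICAL toy sequence containing one exact copy of each box.
toy = "AAA" + A_BOX + "CCCC" + B_BOX + "TT"
print(best_match(toy, A_BOX))  # (3, 0)
print(best_match(toy, B_BOX))  # (18, 0)
```

A real Alu copy will usually differ from the consensus at a few positions, so in practice the mismatch count would be compared against a tolerance chosen by the user.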
Alu elements contain four or fewer Retinoic Acid response element hexamer sites in their internal promoter, with the last one overlapping with the "B box". [15] In this 7SL (SRP) RNA example below, functional hexamers are underlined using a solid line, with the non-functional third hexamer denoted using a dotted line:
GCCGGGCGCGGTGGCGCGTGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGCTGGAGGATCGCTTGAGTCCAGGAGTTCTGGGCTGTAGTGCGCTATGCCGATCGGAATAGCCACTGCACTCCAGCCTGGGCAACATAGCGAGACCCCGTCTC.
The recognition sequence of the Alu I endonuclease is 5' ag/ct 3'; that is, the enzyme cuts the DNA segment between the guanine and cytosine residues (in lowercase above). [16]
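The cutting rule can be sketched as a simple in-silico digest. This is a simplified single-strand model, assuming only the rule stated above (cut between G and C within AGCT); the function names are hypothetical, and the example fragment is a short stretch copied from the 7SL sequence shown earlier.

```python
SITE = "AGCT"   # Alu I recognition sequence, read 5' to 3'
CUT_OFFSET = 2  # the cut falls between the G and the C

def alu_i_cut_positions(sequence: str) -> list:
    """Return the indices at which the sequence would be cut (0-based)."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(SITE)
    while start != -1:
        positions.append(start + CUT_OFFSET)
        start = sequence.find(SITE, start + 1)
    return positions

def alu_i_digest(sequence: str) -> list:
    """Split the sequence into the fragments produced by cutting at every site."""
    sequence = sequence.upper()
    fragments = []
    previous = 0
    for cut in alu_i_cut_positions(sequence):
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return fragments

fragment = "GTAGTCCCAGCTACTCGGGAGGCTGAGG"
print(alu_i_cut_positions(fragment))  # [10]
print(alu_i_digest(fragment))         # ['GTAGTCCCAG', 'CTACTCGGGAGGCTGAGG']
```

Because AGCT is its own reverse complement, the same positions are recognized on the opposite strand, which is why a single-strand scan is enough for this sketch.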
Alu elements are responsible for regulation of tissue-specific genes. They are also involved in the transcription of nearby genes and can sometimes change the way a gene is expressed. [17]
Alu elements are retrotransposons and look like DNA copies made from RNA polymerase III-transcribed RNAs. Alu elements do not encode protein products. They are replicated like any other DNA sequence, but depend on LINE retrotransposons for the generation of new elements. [18]
Alu element replication and mobilization begins by interactions with signal recognition particles (SRPs), which aid newly translated proteins to reach their final destinations. [19] Alu RNA forms a specific RNA:protein complex with a protein heterodimer consisting of SRP9 and SRP14. [19] SRP9/14 facilitates Alu's attachment to ribosomes that capture nascent L1 proteins. Thus, an Alu element can take control of the L1 protein's reverse transcriptase, ensuring that the Alu's RNA sequence gets copied into the genome rather than the L1's mRNA. [10]
Alu elements in primates form a fossil record that is relatively easy to decipher because Alu element insertion events have a characteristic signature that is both easy to read and faithfully recorded in the genome from generation to generation. The study of AluY elements (the more recently evolved) thus reveals details of ancestry because individuals will most likely only share a particular Alu element insertion if they have a common ancestor. This is because insertion of an Alu element occurs only 100 to 200 times per million years, and no known mechanism of deletion of one has been found. Therefore, individuals with an element likely descended from an ancestor with one, and those without it likely descended from an ancestor without one. In genetics, the presence or absence of a recently inserted Alu element may be a good property to consider when studying human evolution. [20] Most human Alu element insertions can be found in the corresponding positions in the genomes of other primates, but about 7,000 Alu insertions are unique to humans. [21]
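The presence/absence reasoning above can be illustrated with set operations. This is a minimal sketch on made-up data: the individuals, locus names, and function names are all hypothetical, and counting shared insertions is only a crude stand-in for a real phylogenetic analysis.

```python
from itertools import combinations

# HYPOTHETICAL genotypes: the set of loci at which each individual
# carries a recent Alu insertion.
genotypes = {
    "individual_1": {"locus_a", "locus_b", "locus_c"},
    "individual_2": {"locus_a", "locus_b"},
    "individual_3": {"locus_d"},
}

def pairwise_shared(samples: dict) -> dict:
    """Map each pair of individuals to the sorted list of insertions they share."""
    return {
        (first, second): sorted(samples[first] & samples[second])
        for first, second in combinations(samples, 2)
    }

def expected_insertions(million_years: float) -> tuple:
    """Range of new insertions expected over a time span.

    Uses the rate of 100 to 200 insertions per million years quoted in the text.
    """
    return 100 * million_years, 200 * million_years

for pair, shared in pairwise_shared(genotypes).items():
    print(pair, shared)
print(expected_insertions(0.5))  # (50.0, 100.0)
```

In this toy data set, individuals 1 and 2 share two insertions and are therefore the more likely pair to have a recent common ancestor, while individual 3 shares none with either.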
Alu elements have been proposed to affect gene expression and have been found to contain functional promoter regions for steroid hormone receptors. [15] [22] Due to the abundant content of CpG dinucleotides found in Alu elements, these regions serve as a site of methylation, contributing to up to 30% of the methylation sites in the human genome. [23] Alu elements are also a common source of mutations in humans; however, such mutations are often confined to non-coding regions of pre-mRNA (introns), where they have little discernible impact on the bearer. [24] Mutations in the introns (or non-coding regions of RNA) have little or no effect on the phenotype of an individual if the coding portion of the individual's genome does not contain mutations. The Alu insertions that can be detrimental to the human body are inserted into coding regions (exons) or into mRNA after the process of splicing. [25]
However, the variation generated can be used in studies of the movement and ancestry of human populations, [26] and the mutagenic effect of Alu [27] and retrotransposons in general [28] has played a major role in the evolution of the human genome. There are also a number of cases where Alu insertions or deletions are associated with specific effects in humans:
Alu insertions are sometimes disruptive and can result in inherited disorders. However, most Alu variants act as markers that segregate with the disease, so the presence of a particular Alu allele does not mean that the carrier will definitely get the disease. The first report of Alu-mediated recombination causing a prevalent inherited predisposition to cancer was a 1995 report about hereditary nonpolyposis colorectal cancer. [29] In the human genome, the most recently active elements belong to 22 AluY and 6 AluS transposable element subfamilies, whose heritable activity has been linked to various cancers. Because of the heritable damage they can cause, it is important to understand the factors that affect their transpositional activity. [30]
The following human diseases have been linked with Alu insertions: [26] [31]
And the following diseases have been associated with single-nucleotide DNA variations in Alu elements affecting transcription levels: [33]
The following disease has been associated with expansion of an AAGGG pentamer repeat in an Alu element:
In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA. The nuclear genome includes protein-coding genes and non-coding genes, other functional regions of the genome such as regulatory sequences, and often a substantial fraction of junk DNA with no evident function. Almost all eukaryotes have mitochondria and a small mitochondrial genome. Algae and plants also contain chloroplasts with a chloroplast genome.
A transposable element is a nucleic acid sequence in DNA that can change its position within a genome, sometimes creating or reversing mutations and altering the cell's genetic identity and genome size. Transposition often results in duplication of the same genetic material. In the human genome, L1 and Alu elements are two examples. Barbara McClintock's discovery of transposable elements earned her a Nobel Prize in 1983. Their importance in personalized medicine is growing, and they are receiving more attention in data analytics given the difficulty of analysis in very high dimensional spaces.
The human genome is a complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. These are usually treated separately as the nuclear genome and the mitochondrial genome. Human genomes include both protein-coding DNA sequences and various types of DNA that does not encode proteins. The latter is a diverse category that includes DNA coding for non-translated RNA, such as that for ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and several types of regulatory RNAs. It also includes promoters and their associated gene-regulatory elements, DNA playing structural and replicatory roles, such as scaffolding regions, telomeres, centromeres, and origins of replication, plus large numbers of transposable elements, inserted viral DNA, non-functional pseudogenes and simple, highly repetitive sequences. Introns make up a large percentage of non-coding DNA. Some of this non-coding DNA is non-functional junk DNA, such as pseudogenes, but there is no firm consensus on the total amount of junk DNA.
Non-coding DNA (ncDNA) sequences are components of an organism's DNA that do not encode protein sequences. Some non-coding DNA is transcribed into functional non-coding RNA molecules. Other functional regions of the non-coding DNA fraction include regulatory sequences that control gene expression; scaffold attachment regions; origins of DNA replication; centromeres; and telomeres. Some non-coding regions appear to be mostly nonfunctional, such as introns, pseudogenes, intergenic DNA, and fragments of transposons and viruses. Regions that are completely nonfunctional are called junk DNA.
Pseudogenes are nonfunctional segments of DNA that resemble functional genes. Most arise as superfluous copies of functional genes, either directly by gene duplication or indirectly by reverse transcription of an mRNA transcript. Pseudogenes are usually identified when genome sequence analysis finds gene-like sequences that lack regulatory sequences needed for transcription or translation, or whose coding sequences are obviously defective due to frameshifts or premature stop codons. Pseudogenes are a type of junk DNA.
Repeated sequences are short or long patterns of nucleic acids that occur in multiple copies throughout the genome. In many organisms, a significant fraction of the genomic DNA is repetitive, with over two-thirds of the sequence consisting of repetitive elements in humans. Some of these repeated sequences are necessary for maintaining important genome structures such as telomeres or centromeres.
Retrotransposons are mobile elements which move in the host genome by converting their transcribed RNA into DNA through the reverse transcription. Thus, they differ from Class II transposable elements, or DNA transposons, in utilizing an RNA intermediate for the transposition and leaving the transposition donor site unchanged.
Interspersed repetitive DNA is found in all eukaryotic genomes. It differs from tandem repeat DNA in that the repeat sequences do not come right after one another but are dispersed throughout the genome and nonadjacent. The sequence that repeats can vary depending on the type of organism and many other factors. Certain classes of interspersed repeat sequences propagate themselves by RNA-mediated transposition; they have been called retrotransposons, and they constitute 25–40% of most mammalian genomes. Some types of interspersed repetitive DNA elements allow new genes to evolve by uncoupling similar DNA sequences from gene conversion during meiosis.
Endogenous retroviruses (ERVs) are endogenous viral elements in the genome that closely resemble and can be derived from retroviruses. They are abundant in the genomes of jawed vertebrates, and they comprise up to 5–8% of the human genome.
Retrotransposon markers are components of DNA which are used as cladistic markers. They assist in determining the common ancestry, or not, of related taxa. The "presence" of a given retrotransposon in related taxa suggests their orthologous integration, a derived condition acquired via a common ancestry, while the "absence" of particular elements indicates the plesiomorphic condition prior to integration in more distant taxa. The use of presence/absence analyses to reconstruct the systematic biology of mammals depends on the availability of retrotransposons that were actively integrating before the divergence of a particular species.
Exon shuffling is a molecular mechanism for the formation of new genes. It is a process through which two or more exons from different genes can be brought together ectopically, or the same exon can be duplicated, to create a new exon-intron structure. There are different mechanisms through which exon shuffling occurs: transposon mediated exon shuffling, crossover during sexual recombination of parental genomes and illegitimate recombination.
Mobile genetic elements (MGEs), sometimes called selfish genetic elements, are a type of genetic material that can move around within a genome, or that can be transferred from one species or replicon to another. MGEs are found in all organisms. In humans, approximately 50% of the genome is thought to be MGEs. MGEs play a distinct role in evolution. Gene duplication events can also happen through the mechanism of MGEs. MGEs can also cause mutations in protein-coding regions, which alter protein function. These mechanisms can also rearrange genes in the host genome, generating variation, and can increase fitness by providing new or additional functions. An example of MGEs in an evolutionary context is the transfer of virulence factors and antibiotic resistance genes carried on MGEs to neighboring bacteria. However, MGEs can also decrease fitness by introducing disease-causing alleles or mutations. The set of MGEs in an organism is called a mobilome, which is composed of a large number of plasmids, transposons and viruses.
In the fields of bioinformatics and computational biology, genome survey sequences (GSS) are nucleotide sequences similar to expressed sequence tags (ESTs), the only difference being that most of them are genomic in origin rather than derived from mRNA.
LTR retrotransposons are class I transposable elements (TEs) characterized by the presence of long terminal repeats (LTRs) directly flanking an internal coding region. As retrotransposons, they mobilize through reverse transcription of their mRNA and integration of the newly created cDNA into another genomic location. Their mechanism of retrotransposition is shared with retroviruses, with the difference that the rate of horizontal transfer in LTR-retrotransposons is much lower than the vertical transfer by passing active TE insertions to the progeny. LTR retrotransposons that form virus-like particles are classified under Ortervirales.
A conserved non-coding sequence (CNS) is a DNA sequence of noncoding DNA that is evolutionarily conserved. These sequences are of interest for their potential to regulate gene production.
Long interspersed nuclear elements (LINEs) are a group of non-LTR retrotransposons that are widespread in the genome of many eukaryotes. LINEs contain an internal Pol II promoter to initiate transcription into mRNA, and encode one or two proteins, ORF1 and ORF2. The functional domains present within ORF1 vary greatly among LINEs, but often exhibit RNA/DNA binding activity. ORF2 is essential to successful retrotransposition, and encodes a protein with both reverse transcriptase and endonuclease activity.
LINE1 is a family of related class I transposable elements in the DNA of some organisms, classified with the long interspersed elements (LINEs). L1 transposons comprise approximately 17% of the human genome. Active L1s can interrupt the genome through insertions, deletions, rearrangements, and copy number variations. L1 activity has contributed to the instability and evolution of genomes and is tightly regulated in the germline by DNA methylation, histone modifications, and piRNA. L1s can further impact genome variation through mispairing and unequal crossing over during meiosis due to their repetitive DNA sequences.
Short interspersed nuclear elements (SINEs) are non-autonomous, non-coding transposable elements (TEs) that are about 100 to 700 base pairs in length. They are a class of retrotransposons, DNA elements that amplify themselves throughout eukaryotic genomes, often through RNA intermediates. SINEs compose about 13% of the mammalian genome.
DNA transposons are DNA sequences, sometimes referred to as "jumping genes", that can move and integrate to different locations within the genome. They are class II transposable elements (TEs) that move through a DNA intermediate, as opposed to class I TEs, retrotransposons, that move through an RNA intermediate. DNA transposons can move in the DNA of an organism via a single- or double-stranded DNA intermediate. DNA transposons have been found in both prokaryotic and eukaryotic organisms. They can make up a significant portion of an organism's genome, particularly in eukaryotes. In prokaryotes, TEs can facilitate the horizontal transfer of antibiotic resistance or other genes associated with virulence. After replicating and propagating in a host, all transposon copies become inactivated and are lost unless the transposon passes to a genome by starting a new life cycle with horizontal transfer. DNA transposons do not insert themselves into the genome at random, but rather show a preference for specific sites.
Haig Hagop Kazazian Jr. was an American professor in the Department of Genetic Medicine at Johns Hopkins University School of Medicine in Baltimore, Maryland. Kazazian was an elected member of the National Academy of Sciences and the American Academy of Arts and Sciences.