LTR retrotransposons are class I transposable elements (TEs) characterized by long terminal repeats (LTRs) that directly flank an internal coding region. As retrotransposons, they mobilize through reverse transcription of their mRNA and integration of the newly created cDNA into another genomic location. They share this mechanism of retrotransposition with retroviruses, with the difference that in LTR retrotransposons horizontal transfer is much rarer than vertical transfer, in which active TE insertions are passed on to the progeny. LTR retrotransposons that form virus-like particles are classified under Ortervirales.
Their size ranges from a few hundred base pairs to 30 kb; the largest reported to date are members of the Burro retrotransposon family in Schmidtea mediterranea. [1]
In plant genomes, LTR retrotransposons are the major class of repetitive sequence; for example, they constitute more than 75% of the maize genome. [2] LTR retrotransposons make up about 8% of the human genome and approximately 10% of the mouse genome. [3]
LTR retrotransposons have direct long terminal repeats that range from ~100 bp to over 5 kb in size. LTR retrotransposons are further sub-classified into the Ty1-copia-like (Pseudoviridae), Ty3-like (Metaviridae, also known as Gypsy-like, a name that is being considered for retirement [4] ), and BEL-Pao-like (Belpaoviridae) groups based on both their degree of sequence similarity and the order of encoded gene products. The Ty1-copia and Ty3-Metaviridae groups of retrotransposons are commonly found in high copy number (up to a few million copies per haploid nucleus) in the genomes of animals, fungi, protists, and plants. BEL-Pao-like elements have so far only been found in animals. [5] [6]
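The direct terminal repeats described above can be illustrated with a minimal Python sketch. It assumes a toy element whose two LTRs are still identical and looks for the longest exact repeat shared by both ends of a sequence. The function name `terminal_repeat_length` and the `min_len` parameter are hypothetical, and exact matching is a simplification: the LTRs of real elements accumulate mutations after insertion, so this is not how genome annotation tools work.

```python
def terminal_repeat_length(seq, min_len=3):
    """Return the length of the longest exact repeat shared by both ends of seq.

    Illustrative only: assumes the two terminal repeats are identical and
    do not overlap. Returns 0 if no repeat of at least min_len is found.
    """
    seq = seq.upper()
    # Try the longest non-overlapping candidate first, then shorten.
    for k in range(len(seq) // 2, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


# Toy element: 5 bp "LTR" + internal region + the same 5 bp "LTR".
toy_element = "TGACA" + "GGGCCC" + "TGACA"
print(terminal_repeat_length(toy_element))
```

Real LTRs are hundreds to thousands of base pairs long, so the 5 bp repeat in the toy element only stands in for the layout.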
All functional LTR-retrotransposons encode a minimum of two genes, gag and pol, that are sufficient for their replication. Gag encodes a polyprotein with a capsid and a nucleocapsid domain. [7] Gag proteins form virus-like particles in the cytoplasm, inside which reverse transcription occurs. The Pol gene produces three proteins: a protease (PR), a reverse transcriptase with both a reverse-transcriptase (RT) domain and an RNase H domain, and an integrase (IN). [8]
Typically, LTR-retrotransposon mRNAs are produced by the host RNA pol II acting on a promoter located in their 5’ LTR. The Gag and Pol genes are encoded in the same mRNA. Depending on the host species, two different strategies can be used to express the two polyproteins: a fusion into a single open reading frame (ORF) that is then cleaved or the introduction of a frameshift between the two ORFs. [9] Occasional ribosomal frameshifting allows the production of both proteins, while ensuring that much more Gag protein is produced to form virus-like particles.
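The effect of occasional frameshifting on protein ratios can be sketched with simple arithmetic. The sketch assumes that each ribosome translating the mRNA shifts frame independently with some probability, yielding a Gag-Pol product, and otherwise terminates after Gag. The function name and the example probability of 0.05 are hypothetical illustrations, not measured values.

```python
def gag_to_gagpol_ratio(frameshift_prob):
    """Expected ratio of Gag-only to Gag-Pol products.

    Assumes each translating ribosome frameshifts independently with
    probability frameshift_prob (a hypothetical parameter).
    """
    if not 0 < frameshift_prob < 1:
        raise ValueError("frameshift_prob must be strictly between 0 and 1")
    return (1 - frameshift_prob) / frameshift_prob


# With an assumed 5% frameshift rate, Gag outnumbers Gag-Pol roughly 19 to 1.
print(round(gag_to_gagpol_ratio(0.05), 2))
```

Under this simplified model, a low frameshift probability is enough to keep the structural Gag protein in large excess over the Pol enzymes.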
Reverse transcription usually initiates at a short sequence located immediately downstream of the 5’-LTR, termed the primer binding site (PBS). Specific host tRNAs bind to the PBS and act as primers for reverse transcription, a complex, multi-step process that ultimately produces a double-stranded cDNA molecule. The cDNA is finally integrated into a new location, creating short target site duplications (TSDs) [10] and adding a new copy to the host genome.
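The way integration leaves a target site duplication on each side of the new copy can be sketched on plain strings. The function `integrate` and its parameters are hypothetical names, the default TSD length of 5 bp is an assumption chosen for illustration, and the model ignores strand orientation and the enzymology of the integrase.

```python
def integrate(genome, element, pos, tsd_len=5):
    """Insert element at pos, duplicating the tsd_len-bp target site.

    Models the outcome of a staggered cut followed by gap repair: the
    target site ends up on both sides of the inserted element.
    """
    if pos < 0 or pos + tsd_len > len(genome):
        raise ValueError("target site must lie within the genome")
    target = genome[pos:pos + tsd_len]
    return genome[:pos] + target + element + target + genome[pos + tsd_len:]


# Lowercase marks the inserted element; the flanking CCGGT copies are the TSD.
print(integrate("AACCGGTTAA", "tgca", pos=2))
```

After the insertion the genome grows by the length of the element plus one TSD, which is why TSDs are used as a signature when annotating insertions.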
Ty1-copia retrotransposons are abundant in species ranging from single-cell algae to bryophytes, gymnosperms, and angiosperms. They encode four protein domains in the following order: protease, integrase, reverse transcriptase, and ribonuclease H.
At least two classification systems exist for the subdivision of Ty1-copia retrotransposons into five lineages: [11] [12] Sireviruses/Maximus, Oryco/Ivana, Retrofit/Ale, TORK (subdivided into Angela/Sto, TAR/Fourf, GMR/Tork), and Bianca.
Sireviruses/Maximus retrotransposons contain an additional putative envelope gene. This lineage is named for the founder element SIRE1 in the Glycine max genome, [13] and was later described in many species such as Zea mays, [14] Arabidopsis thaliana, [15] Beta vulgaris, [16] and Pinus pinaster. [17] Sireviruses from many sequenced plant genomes are summarized in the MASIVEdb Sirevirus database. [18]
Ty3-retrotransposons are widely distributed in the plant kingdom, including both gymnosperms and angiosperms. They encode at least four protein domains in the order: protease, reverse transcriptase, ribonuclease H, and integrase. Based on structure, presence/absence of specific protein domains, and conserved protein sequence motifs, they can be subdivided into several lineages:
Errantiviruses contain an additional defective envelope ORF with similarities to the retroviral envelope gene. First described as Athila elements in Arabidopsis thaliana, [19] [20] they were later identified in many species, such as Glycine max [21] and Beta vulgaris. [22]
Chromoviruses contain an additional chromodomain (chromatin organization modifier domain) at the C-terminus of their integrase protein. [23] [24] They are widespread in plants and fungi, probably retaining protein domains during evolution of these two kingdoms. [25] It is thought that the chromodomain directs retrotransposon integration to specific target sites. [26] According to sequence and structure of the chromodomain, chromoviruses are subdivided into the four clades CRM, Tekay, Reina and Galadriel. Chromoviruses from each clade show distinctive integration patterns, e.g. into centromeres or into the rRNA genes. [27] [28]
Ogre elements are gigantic Ty3-retrotransposons reaching lengths of up to 25 kb. [29] They were first described in Pisum sativum. [30]
Metaviruses describe conventional Ty3-gypsy retrotransposons that do not contain additional domains or ORFs.
The Sushi family of Ty3 long terminal repeat retrotransposons was first identified in teleost fish, and Sushi-like neogenes were subsequently identified in mammals. [31] Mammalian retrotransposon-derived transcripts (MARTs) cannot transpose but have retained open reading frames, demonstrate high levels of evolutionary conservation and are subject to selective pressures, which suggests some have become neofunctionalized genes with new cellular functions. [31] Retrotransposon gag-like-3 (RTL3/ZCCHC5/MART3) is one of eleven Sushi-like neogenes identified in the human genome. [31]
The BEL/pao family is found in animals. [32]
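The domain orders given above for Ty1-copia (protease, integrase, reverse transcriptase, ribonuclease H) and Ty3 (protease, reverse transcriptase, ribonuclease H, integrase) elements can be turned into a small lookup, shown here as a minimal sketch. The abbreviations, the function name, and the `"CHD"` label for a chromodomain are illustrative choices. The sketch covers only these two orders, so it cannot separate BEL/pao elements, and real classification also relies on sequence similarity.

```python
# Orders of the four core pol domains as stated in the text.
DOMAIN_ORDERS = {
    ("PR", "IN", "RT", "RH"): "Ty1-copia",
    ("PR", "RT", "RH", "IN"): "Ty3",
}
CORE_DOMAINS = {"PR", "IN", "RT", "RH"}


def classify_by_domain_order(domains):
    """Classify an element from the 5'-to-3' order of its annotated domains.

    Non-core labels (e.g. "GAG" or a chromodomain) are ignored; anything
    that does not match a known order is returned as "unclassified".
    """
    core = tuple(d for d in domains if d in CORE_DOMAINS)
    return DOMAIN_ORDERS.get(core, "unclassified")


print(classify_by_domain_order(["GAG", "PR", "IN", "RT", "RH"]))
print(classify_by_domain_order(["GAG", "PR", "RT", "RH", "IN", "CHD"]))
```

The only difference between the two groups in this scheme is where the integrase sits relative to the reverse transcriptase and ribonuclease H domains.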
Although retroviruses are often classified separately, they share many features with LTR retrotransposons. A major difference with Ty1-copia and Ty3-gypsy retrotransposons is that retroviruses have an envelope protein (ENV). A retrovirus can be transformed into an LTR retrotransposon through inactivation or deletion of the domains that enable extracellular mobility. If such a retrovirus infects and subsequently inserts itself in the genome in germ line cells, it may become transmitted vertically and become an Endogenous Retrovirus. [6]
Some LTR retrotransposons lack all of their coding domains. Due to their short size, they are referred to as terminal repeat retrotransposons in miniature (TRIMs). [33] [34] Nevertheless, TRIMs may still be able to retrotranspose, as they can rely on the coding domains of autonomous Ty1-copia or Ty3-gypsy retrotransposons. Among the TRIMs, the Cassandra family plays an exceptional role, as the family is unusually widespread among higher plants. [35] In contrast to all other characterized TRIMs, Cassandra elements harbor a 5S rRNA promoter in their LTR sequence. [36] Due to their short overall length and the relatively high contribution of the flanking LTRs, TRIMs are prone to rearrangements by recombination. [37]
In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA. The nuclear genome includes protein-coding genes and non-coding genes, other functional regions of the genome such as regulatory sequences, and often a substantial fraction of junk DNA with no evident function. Almost all eukaryotes have mitochondria and a small mitochondrial genome. Algae and plants also contain chloroplasts with a chloroplast genome.
Retroposons are repetitive DNA fragments which are inserted into chromosomes after being reverse transcribed from any RNA molecule.
A transposable element is a nucleic acid sequence in DNA that can change its position within a genome, sometimes creating or reversing mutations and altering the cell's genetic identity and genome size. Transposition often results in duplication of the same genetic material. In the human genome, L1 and Alu elements are two examples. Barbara McClintock's discovery of transposable elements earned her a Nobel Prize in 1983. Their importance in personalized medicine is becoming increasingly relevant, and they are gaining more attention in data analytics given the difficulty of analysis in very high dimensional spaces.
Retrotransposons are mobile elements which move in the host genome by converting their transcribed RNA into DNA through reverse transcription. Thus, they differ from Class II transposable elements, or DNA transposons, in utilizing an RNA intermediate for transposition and leaving the transposition donor site unchanged.
Metaviridae is a family of viruses which exist as Ty3-gypsy LTR retrotransposons in a eukaryotic host's genome. They are closely related to retroviruses: members of the family Metaviridae share many genomic elements with retroviruses, including length, organization, and genes themselves. This includes genes that encode reverse transcriptase, integrase, and capsid proteins. The reverse transcriptase and integrase proteins are needed for the retrotransposon activity of the virus. In some cases, virus-like particles can be formed from capsid proteins.
Pseudoviridae is a family of viruses, which includes three genera.
Lentivirus is a genus of retroviruses that cause chronic and deadly diseases characterized by long incubation periods, in humans and other mammalian species. The genus includes the human immunodeficiency virus (HIV), which causes AIDS. Lentiviruses are distributed worldwide, and are known to be hosted in apes, cows, goats, horses, cats, and sheep as well as several other mammals.
Endogenous retroviruses (ERVs) are endogenous viral elements in the genome that closely resemble and can be derived from retroviruses. They are abundant in the genomes of jawed vertebrates, and they comprise up to 5–8% of the human genome.
The genome and proteins of HIV (human immunodeficiency virus) have been the subject of extensive research since the discovery of the virus in 1983. "In the search for the causative agent, it was initially believed that the virus was a form of the Human T-cell leukemia virus (HTLV), which was known at the time to affect the human immune system and cause certain leukemias. However, researchers at the Pasteur Institute in Paris isolated a previously unknown and genetically distinct retrovirus in patients with AIDS which was later named HIV." Each virion comprises a viral envelope and associated matrix enclosing a capsid, which itself encloses two copies of the single-stranded RNA genome and several enzymes. The discovery of the virus itself occurred two years following the report of the first major cases of AIDS-associated illnesses.
Simian foamy virus (SFV) is a species of the genus Spumavirus that belongs to the family of Retroviridae. It has been identified in a wide variety of primates, including prosimians, New World and Old World monkeys, as well as apes, and each species has been shown to harbor a unique (species-specific) strain of SFV, including African green monkeys, baboons, macaques, and chimpanzees. As it is related to the more well-known retrovirus human immunodeficiency virus (HIV), its discovery in primates has led to some speculation that HIV may have been spread to the human species in Africa through contact with blood from apes, monkeys, and other primates, most likely through bushmeat-hunting practices.
Exon shuffling is a molecular mechanism for the formation of new genes. It is a process through which two or more exons from different genes can be brought together ectopically, or the same exon can be duplicated, to create a new exon-intron structure. There are different mechanisms through which exon shuffling occurs: transposon mediated exon shuffling, crossover during sexual recombination of parental genomes and illegitimate recombination.
Group-specific antigen, or gag, is the polyprotein that contains the core structural proteins of an Ortervirus. It was named as such because scientists used to believe it was antigenic. Now it is known that it makes up the inner shell, not the envelope exposed outside. It makes up all the structural units of viral conformation and provides supportive framework for mature virion.
Retrotransposon-derived protein PEG10 is a protein that in humans is encoded by the PEG10 gene.
Long interspersed nuclear elements (LINEs) are a group of non-LTR retrotransposons that are widespread in the genome of many eukaryotes. LINEs contain an internal Pol II promoter to initiate transcription into mRNA, and encode one or two proteins, ORF1 and ORF2. The functional domains present within ORF1 vary greatly among LINEs, but often exhibit RNA/DNA binding activity. ORF2 is essential to successful retrotransposition, and encodes a protein with both reverse transcriptase and endonuclease activity.
Short interspersed nuclear elements (SINEs) are non-autonomous, non-coding transposable elements (TEs) that are about 100 to 700 base pairs in length. They are a class of retrotransposons, DNA elements that amplify themselves throughout eukaryotic genomes, often through RNA intermediates. SINEs compose about 13% of the mammalian genome.
Retrotransposon Gag Like 6 is a protein encoded by the RTL6 gene in humans. RTL6 is a member of the Mart family of genes, which are related to Sushi-like retrotransposons and were derived from fish and amphibians. The RTL6 protein is localized to the nucleus and has a predicted leucine zipper motif that is known to bind nucleic acids in similar proteins, such as LDOC1.
Metavirus is a genus of viruses in the family Metaviridae. They are retrotransposons that invade a eukaryotic host genome and may only replicate once the virus has infected the host. These genetic elements exist to infect and replicate in their host genome and are derived from ancestral elements unrelated to their host. Metavirus may use several different hosts for transmission, and has been found to be transmissible through ovule and pollen of some plants.
Ortervirales is an order that contains all accepted species of single-stranded RNA viruses that replicate through a DNA intermediate and all accepted species of double-stranded DNA viruses that replicate through an RNA intermediate. The name is derived from the reverse of retro.
Retrozymes are a family of retrotransposons first discovered in the genomes of plants but now also known in genomes of animals. Retrozymes contain a hammerhead ribozyme (HHR) in their sequences, although they do not possess any coding regions. Retrozymes are nonautonomous retroelements, and so borrow proteins from other elements to move into new regions of a genome. Retrozymes are actively transcribed into covalently closed circular RNAs and are detected in both polarities, which may indicate the use of rolling circle replication in their lifecycle.
Cer6 is an LTR retrotransposon described from sequencing data on chromosome III of C. elegans.