Long interspersed nuclear elements (LINEs) [1] (also known as long interspersed nucleotide elements [2] or long interspersed elements [3]) are a group of non-LTR (long terminal repeat) retrotransposons that are widespread in the genomes of many eukaryotes. [4] [5] LINEs contain an internal Pol II promoter to initiate transcription into mRNA, and encode one or two proteins, ORF1 and ORF2. [6] The functional domains present within ORF1 vary greatly among LINEs, but often exhibit RNA/DNA binding activity. ORF2 is essential to successful retrotransposition, and encodes a protein with both reverse transcriptase and endonuclease activity. [7]
LINEs are the most abundant transposable elements within the human genome, [8] with approximately 20.7% of its sequence identified as being derived from LINEs. The only active lineage of LINEs found within humans belongs to the LINE-1 class, and is referred to as L1Hs. [9] The human genome contains an estimated 100,000 truncated and 4,000 full-length LINE-1 elements. [10] Due to the accumulation of random mutations, the sequence of many LINEs has degenerated to the extent that they are no longer transcribed or translated. Comparisons of LINE DNA sequences can be used to date transposon insertions in the genome.
The first description of an approximately 6.4 kb long LINE-derived sequence was published by J. Adams et al. in 1980. [11]
Based on structural features and the phylogeny of the essential protein ORF2p, LINEs can be separated into six main groups, referred to as R2, RanI, L1, RTE, I and Jockey. These groups can further be subdivided into at least 28 clades. [12]
In plant genomes, only LINEs of the L1 and RTE clades have been reported so far. [13] [14] [15] Whereas L1 elements diversify into several subclades, RTE-type LINEs are highly conserved, often constituting a single family. [16] [17]
In fungi, Tad, L1, CRE, Deceiver and Inkcap-like elements have been identified, [18] with Tad-like elements appearing exclusively in fungal genomes. [19]
All LINEs encode at least one protein, ORF2, which contains an RT and an endonuclease (EN) domain, either an N-terminal APE or a C-terminal RLE, or rarely both. A ribonuclease H domain is occasionally present. Except for the evolutionarily ancient R2 and RTE superfamilies, LINEs usually encode another protein named ORF1, which may contain a Gag-knuckle, an L1-like RRM (InterPro: IPR035300), and/or an esterase. LINE elements are relatively rare compared to LTR-retrotransposons in plants, fungi or insects, but are dominant in vertebrates and especially in mammals, where they represent around 20% of the genome. [12] : fig. 1
The LINE-1/L1-element is one of the elements that are still active in the human genome today. It is found in all therian mammals [20] [21] except megabats. [22]
Remnants of L2 and L3 elements are found in the human genome. [23] It is estimated that L2 and L3 elements were active ~200-300 million years ago. Due to the age of L2 elements found within therian genomes, they lack flanking target site duplications. [24] The L2 (and L3) elements are in the same group as the CR1 clade, Jockey. [25]
In the first human genome draft, the fraction of the human genome made up of LINE elements was given as 21% and their copy number as 850,000. Of these, L1, L2 and L3 elements made up 516,000, 315,000 and 37,000 copies, respectively. The non-autonomous SINE elements, which depend on L1 elements for their proliferation, make up 13% of the human genome and have a copy number of around 1.5 million. [23] They probably originated from the RTE family of LINEs. [26] Recent estimates show that the typical human genome contains on average 100 L1 elements with potential for mobilization. There is, however, a fair amount of variation, and some individuals may carry a larger number of active L1 elements, making them more prone to L1-induced mutagenesis. [27]
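The per-family copy numbers quoted above can be checked with a little arithmetic. The following is a minimal sketch that uses only the figures given in the text; the function name `family_shares` is hypothetical and introduced here for illustration. The three families sum to slightly more than the quoted total of 850,000, which is consistent with the figures being approximate.

```python
# Copy numbers as quoted in the text (first human genome draft).
copies = {"L1": 516_000, "L2": 315_000, "L3": 37_000}
quoted_total = 850_000  # total LINE copy number quoted in the text


def family_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each family's fraction of the summed copy number."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


summed = sum(copies.values())
shares = family_shares(copies)

for name, share in shares.items():
    print(f"{name}: {share:.1%} of summed LINE copies")
print(f"sum of families: {summed:,} (quoted total: {quoted_total:,})")
```

These shares are fractions of copy number, not of genome length; the elements differ in size, so the two measures need not agree.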
Increased L1 copy numbers have also been found in the brains of people with schizophrenia, indicating that LINE elements may play a role in some neuronal diseases. [28]
LINE elements propagate by a so-called target primed reverse transcription mechanism (TPRT), which was first described for the R2 element from the silkworm Bombyx mori.
ORF2 (and ORF1 when present) proteins primarily associate in cis with their encoding mRNA, forming a ribonucleoprotein (RNP) complex, likely composed of two ORF2s and an unknown number of ORF1 trimers. [29] The complex is transported back into the nucleus, where the ORF2 endonuclease domain opens the DNA (at TTAAAA hexanucleotide motifs in mammals [30]). This frees a 3'-OH group, which the reverse transcriptase uses to prime reverse transcription of the LINE RNA transcript. Following reverse transcription, the target strand is cleaved and the newly created cDNA is integrated. [31]
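As an illustration of the sequence preference mentioned above, candidate endonuclease target sites can be located by scanning a DNA sequence for the TTAAAA motif on both strands. The following is a minimal sketch, not a model of the enzyme: it assumes exact matches only, whereas real cleavage sites may deviate from the consensus. The function names `reverse_complement` and `find_motif_sites` are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_motif_sites(seq: str, motif: str = "TTAAAA") -> list[tuple[int, str]]:
    """List (0-based position, strand) for every exact occurrence of the motif.

    A "-" hit means the motif lies on the reverse strand, i.e. the forward
    strand shows its reverse complement at that position.
    """
    seq = seq.upper()
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == rc_motif:
            hits.append((i, "-"))
    return hits


# Toy sequence (made up for illustration) with one site on each strand.
example = "GGTTAAAACCTTTTAAGG"
sites = find_motif_sites(example)
print(sites)
```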
New insertions create short target site duplications (TSDs), and the majority of new inserts are severely 5'-truncated (average insert size of 900 bp in humans) and often inverted (Szak et al., 2002). Because they lack their 5' UTR, most new inserts are nonfunctional.
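Because a target site duplication appears as the same short sequence on both sides of an insert, it can be looked for by comparing the end of the upstream flank with the start of the downstream flank. The sketch below assumes exact duplication and a hypothetical cap of 20 bp on the search length; neither assumption comes from the text, and the function name `longest_tsd` is hypothetical.

```python
def longest_tsd(upstream: str, downstream: str, max_len: int = 20) -> str:
    """Return the longest sequence that both ends `upstream` and starts `downstream`.

    `upstream` is the genomic flank immediately 5' of the insert and
    `downstream` the flank immediately 3' of it. An empty string means no
    exact duplication was found.
    """
    upstream = upstream.upper()
    downstream = downstream.upper()
    limit = min(max_len, len(upstream), len(downstream))
    # Try the longest candidate first so the first match is the longest one.
    for k in range(limit, 0, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream[-k:]
    return ""


# Toy flanks (made up for illustration) sharing an 8 bp duplication.
tsd = longest_tsd("ACGTGGATTACAGG", "ATTACAGGTTCC")
print(tsd)
```

A match of only one or two bases can easily occur by chance, so in practice a minimum length would be required before calling a duplication.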
It has been shown that host cells regulate L1 retrotransposition activity, for example through epigenetic silencing. In addition, the RNA interference (RNAi) mechanism, acting through small interfering RNAs derived from L1 sequences, can suppress L1 retrotransposition. [32]
In plant genomes, epigenetic modification of LINEs can lead to expression changes of nearby genes and even to phenotypic changes: In the oil palm genome, methylation of a Karma-type LINE underlies the somaclonal, 'mantled' variant of this plant, responsible for drastic yield loss. [33]
Restriction of LINE-1 elements mediated by human APOBEC3C (A3C) has been reported. It is due to the interaction between A3C and ORF1p, which affects reverse transcriptase activity. [34]
A historic example of L1-conferred disease is Haemophilia A, which is caused by insertional mutagenesis. [35] There are nearly 100 examples of known diseases caused by retroelement insertions, including some types of cancer and neurological disorders. [36] Correlation between L1 mobilization and oncogenesis has been reported for epithelial cell cancer (carcinoma). [37] Hypomethylation of LINEs is associated with chromosomal instability and altered gene expression [38] and is found in various cancer cell types in various tissue types. [39] [38] Hypomethylation of a specific L1 located in the MET oncogene is associated with bladder cancer tumorigenesis. [40] Shift work sleep disorder [41] is associated with increased cancer risk because light exposure at night reduces melatonin, a hormone that has been shown to reduce L1-induced genome instability. [42]
In genetics, complementary DNA (cDNA) is DNA synthesized from a single-stranded RNA template in a reaction catalyzed by the enzyme reverse transcriptase. cDNA is often used to express a specific protein in a cell that does not normally express that protein, or to sequence or quantify mRNA molecules using DNA based methods. cDNA that codes for a specific protein can be transferred to a recipient cell for expression, often bacterial or yeast expression systems. cDNA is also generated to analyze transcriptomic profiles in bulk tissue, single cells, or single nuclei in assays such as microarrays, qPCR, and RNA-seq.
In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA. The nuclear genome includes protein-coding genes and non-coding genes, other functional regions of the genome such as regulatory sequences, and often a substantial fraction of junk DNA with no evident function. Almost all eukaryotes have mitochondria and a small mitochondrial genome. Algae and plants also contain chloroplasts with a chloroplast genome.
A retrovirus is a type of virus that inserts a DNA copy of its RNA genome into the DNA of a host cell that it invades, thus changing the genome of that cell. After invading a host cell's cytoplasm, the virus uses its own reverse transcriptase enzyme to produce DNA from its RNA genome, the reverse of the usual pattern, thus retro (backwards). The new DNA is then incorporated into the host cell genome by an integrase enzyme, at which point the retroviral DNA is referred to as a provirus. The host cell then treats the viral DNA as part of its own genome, transcribing and translating the viral genes along with the cell's own genes, producing the proteins required to assemble new copies of the virus. Many retroviruses cause serious diseases in humans, other mammals, and birds.
Retroposons are repetitive DNA fragments that are inserted into chromosomes after being reverse transcribed from any RNA molecule.
A transposable element is a nucleic acid sequence in DNA that can change its position within a genome, sometimes creating or reversing mutations and altering the cell's genetic identity and genome size. Transposition often results in duplication of the same genetic material. In the human genome, L1 and Alu elements are two examples. Barbara McClintock's discovery of them earned her a Nobel Prize in 1983. Their importance in personalized medicine is growing, and they are gaining more attention in data analytics given the difficulty of analysis in very high dimensional spaces.
Pseudogenes are nonfunctional segments of DNA that resemble functional genes. Most arise as superfluous copies of functional genes, either directly by gene duplication or indirectly by reverse transcription of an mRNA transcript. Pseudogenes are usually identified when genome sequence analysis finds gene-like sequences that lack regulatory sequences needed for transcription or translation, or whose coding sequences are obviously defective due to frameshifts or premature stop codons. Pseudogenes are a type of junk DNA.
An Alu element is a short stretch of DNA originally characterized by the action of the Arthrobacter luteus (Alu) restriction endonuclease. Alu elements are the most abundant transposable elements, containing over one million copies dispersed throughout the human genome. Alu elements were thought to be selfish or parasitic DNA, because their sole known function is self reproduction. However, they are likely to play a role in evolution and have been used as genetic markers. They are derived from the small cytoplasmic 7SL RNA, a component of the signal recognition particle. Alu elements are highly conserved within primate genomes and originated in the genome of an ancestor of Supraprimates.
Retrotransposons are a type of genetic component that copy and paste themselves into different genomic locations (transposon) by converting RNA back into DNA through the reverse transcription process using an RNA transposition intermediate.
Metaviridae is a family of viruses which exist as Ty3-gypsy LTR retrotransposons in a eukaryotic host's genome. They are closely related to retroviruses: members of the family Metaviridae share many genomic elements with retroviruses, including length, organization, and genes themselves. This includes genes that encode reverse transcriptase, integrase, and capsid proteins. The reverse transcriptase and integrase proteins are needed for the retrotransposon activity of the virus. In some cases, virus-like particles can be formed from capsid proteins.
Endogenous retroviruses (ERVs) are endogenous viral elements in the genome that closely resemble and can be derived from retroviruses. They are abundant in the genomes of jawed vertebrates, and they comprise up to 5–8% of the human genome.
Exon shuffling is a molecular mechanism for the formation of new genes. It is a process through which two or more exons from different genes can be brought together ectopically, or the same exon can be duplicated, to create a new exon-intron structure. There are different mechanisms through which exon shuffling occurs: transposon mediated exon shuffling, crossover during sexual recombination of parental genomes and illegitimate recombination.
A long terminal repeat (LTR) is a pair of identical sequences of DNA, several hundred base pairs long, which occur in eukaryotic genomes on either end of a series of genes or pseudogenes that form a retrotransposon or an endogenous retrovirus or a retroviral provirus. All retroviral genomes are flanked by LTRs, while there are some retrotransposons without LTRs. Typically, an element flanked by a pair of LTRs will encode a reverse transcriptase and an integrase, allowing the element to be copied and inserted at a different location of the genome. Copies of such an LTR-flanked element can often be found hundreds or thousands of times in a genome. LTR retrotransposons comprise about 8% of the human genome.
A knockout rat is a genetically engineered rat with a single gene turned off through a targeted mutation used for academic and pharmaceutical research. Knockout rats can mimic human diseases and are important tools for studying gene function and for drug discovery and development. The production of knockout rats was not economically or technically feasible until 2008.
LTR retrotransposons are class I transposable elements characterized by the presence of long terminal repeats (LTRs) directly flanking an internal coding region. As retrotransposons, they mobilize through reverse transcription of their mRNA and integration of the newly created cDNA into another location. Their mechanism of retrotransposition is shared with retroviruses, with the difference that most LTR-retrotransposons do not form infectious particles that leave the cells and therefore only replicate inside their genome of origin. Those that do (occasionally) form virus-like particles are classified under Ortervirales.
A conserved non-coding sequence (CNS) is a DNA sequence of noncoding DNA that is evolutionarily conserved. These sequences are of interest for their potential to regulate gene production.
LINE1 is a family of related class I transposable elements in the DNA of some organisms, classified with the long interspersed elements (LINEs). L1 transposons comprise approximately 17% of the human genome. Active L1s can interrupt the genome through insertions, deletions, rearrangements, and copy number variations. L1 activity has contributed to the instability and evolution of genomes and is tightly regulated in the germline by DNA methylation, histone modifications, and piRNA. L1s can further impact genome variation through mispairing and unequal crossing over during meiosis due to their repetitive DNA sequences.
Short interspersed nuclear elements (SINEs) are non-autonomous, non-coding transposable elements (TEs) that are about 100 to 700 base pairs in length. They are a class of retrotransposons, DNA elements that amplify themselves throughout eukaryotic genomes, often through RNA intermediates. SINEs compose about 13% of the mammalian genome.
Human somatic variations are somatic mutations that arise both at early stages of development and in adult cells. These variations may or may not lead to pathogenic phenotypes, and their function under healthy conditions is not yet completely clear.
Retrozymes are a family of retrotransposons first discovered in the genomes of plants but now also known in genomes of animals. Retrozymes contain a hammerhead ribozyme (HHR) in their sequences, although they do not possess any coding regions. Retrozymes are nonautonomous retroelements, and so borrow proteins from other elements to move into new regions of a genome. Retrozymes are actively transcribed into covalently closed circular RNAs and are detected in both polarities, which may indicate the use of rolling circle replication in their lifecycle.
Haig H. Kazazian, Jr. was a professor in the Department of Genetic Medicine at Johns Hopkins University School of Medicine in Baltimore, Maryland. Kazazian was an elected member of the National Academy of Sciences and the American Academy of Arts and Sciences.