LINE1 (an abbreviation of long interspersed nuclear element-1, also known as L1 and LINE-1) is a family of related class I transposable elements in the DNA of many groups of eukaryotes, including animals and plants, classified with the long interspersed nuclear elements (LINEs). [1] L1 transposons are most ubiquitous in mammals, where they make up a significant fraction of the total genome length; [1] [2] for example, they comprise approximately 17% of the human genome. [3] Active L1s can disrupt the genome through insertions, deletions, rearrangements, and copy number variations. [4] L1 activity has contributed to the instability and evolution of genomes and is tightly regulated in the germline by DNA methylation, histone modifications, and piRNA. [5] L1s can further affect genome variation through mispairing and unequal crossing over during meiosis, owing to their repetitive DNA sequences. [4]
L1 gene products are also required by many non-autonomous Alu and SVA SINE retrotransposons. Mutations induced by L1 and its non-autonomous counterparts have been found to cause a variety of heritable and somatic diseases. [6] [7]
In 2011, a fragment of human L1 was reported in the genome of the gonorrhea bacterium (Neisseria gonorrhoeae), evidently having arrived there by horizontal gene transfer. [8] [9]
A typical L1 element is approximately 6,000 base pairs (bp) long and consists of two non-overlapping open reading frames (ORFs) which are flanked by untranslated regions (UTRs) and target site duplications. In humans, ORF2 is thought to be translated by an unconventional termination/reinitiation mechanism, [10] while mouse L1s contain an internal ribosome entry site (IRES) upstream of each ORF. [11]
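The two-ORF layout described above can be illustrated with a minimal sketch that scans the forward strand of a sequence for open reading frames and checks that they do not overlap. The function names (`find_orfs`, `non_overlapping`) and the toy sequence are hypothetical and chosen only for illustration; a real L1 is about 6,000 bp long, and real annotation would also need to consider the reverse strand, truncated copies, and non-canonical translation initiation.

```python
# Toy forward-strand ORF scan (illustrative only; not an L1 annotation tool).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return sorted (start, end, frame) tuples for ATG...stop ORFs.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    Only the three forward reading frames are scanned.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a new ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # close the ORF and keep scanning
    return sorted(orfs)

def non_overlapping(orfs):
    """True if each ORF ends at or before the start of the next one."""
    return all(a[1] <= b[0] for a, b in zip(orfs, orfs[1:]))

# Hypothetical toy element: 5' flank, short "ORF1", spacer, short "ORF2", 3' flank.
toy = "CC" + "ATGAAATAA" + "GG" + "ATGCCCGGGTGA" + "T"
orfs = find_orfs(toy)
print(orfs, non_overlapping(orfs))
```

In this toy sequence the two ORFs fall in different reading frames and are separated by a short spacer, which loosely mirrors the non-overlapping arrangement of ORF1 and ORF2.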
The 5' UTRs of mouse L1s contain a variable number of GC-rich tandemly repeated monomers of around 200 bp, followed by a short non-monomeric region. Human 5' UTRs are ~900 bp in length and do not contain repeated motifs. All families of human L1s harbor a binding motif for the transcription factor YY1 at their extreme 5' end. [12] Younger families also have two binding sites for SOX-family transcription factors, and both YY1 and SOX sites were shown to be required for human L1 transcription initiation and activation. [13] [14] Both mouse and human 5' UTRs also contain a weak antisense promoter of unknown function. [15] [16]
LINE-1 (L1.2) retrotransposable element ORF1 | |
---|---|
Identifiers | |
Symbol | L1RE1 |
Alt. symbols | L1ORF1p |
NCBI gene | 4029 |
HGNC | 6686 |
OMIM | 151626 |
PDB | 2LDY |
UniProt | Q9UN81 |
Other data | |
Locus | Chr. 22 q12.1 |
Wikidata | Q18028646 |

The first ORF of L1 encodes a protein of about 40 kDa (338 amino acids in humans) that lacks homology with any protein of known function. In vertebrates, it contains a conserved C-terminal domain and a highly variable N-terminal coiled-coil domain that mediates the formation of ORF1 trimeric complexes. ORF1 trimers have RNA-binding and nucleic acid chaperone activities that are necessary for retrotransposition. [17]
LINE-1 retrotransposable element ORF2 | |
---|---|
Identifiers | |
Symbol | L1RE2 |
Alt. symbols | L1ORF2p |
NCBI gene | 4030 |
HGNC | 6687 |
PDB | 1VYB |
UniProt | O00370 |
Other data | |
Locus | Chr. 1 q |
Wikidata | Q18028649 |

The second ORF of L1 encodes a protein of about 150 kDa that has endonuclease and reverse transcriptase activity. The structure of the ORF2 protein was solved in 2023. Its protein core contains three domains of previously unknown function: the "tower/EN-linker" and the "wrist/RNA-binding domain", which bind the poly(A) tail of Alu RNA, and a C-terminal domain, which binds the Alu RNA stem loop.
The nicking and reverse transcriptase activities of L1 ORF2p are boosted by single-stranded DNA structures that are likely present at active replication forks. Unlike viral reverse transcriptases (RTs), L1 ORF2p can be primed by RNA, including RNA hairpin primers produced by the Alu element.
As with other transposable elements, the host organism tightly restrains LINE1 to prevent it from becoming overly active. In the unicellular eukaryote Entamoeba histolytica, ORF2 is massively expressed in antisense, resulting in no detectable amounts of its protein product. [18]
L1 activity has been observed in numerous types of cancer, with particularly extensive insertions found in colorectal and lung cancers. [19] It is currently unclear whether these insertions are causes or secondary effects of cancer progression. However, in at least two cases somatic L1 insertions have been found to be causative of cancer by disrupting the coding sequences of the genes APC and PTEN, in colon and endometrial cancer respectively. [4]
Quantification of L1 copy number by qPCR and measurement of L1 methylation levels by bisulfite sequencing are used as diagnostic biomarkers in some types of cancer. L1 hypomethylation in colon tumor samples is correlated with cancer stage progression. [20] [21] Furthermore, less invasive blood assays for L1 copy number or methylation levels are indicative of breast or bladder cancer progression and may serve as methods for early detection. [22] [23]
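As a rough illustration of how a methylation level can be read out of bisulfite sequencing data, the sketch below counts methylated and unmethylated calls at CpG sites. Bisulfite treatment converts unmethylated cytosine so that it is read as T, while methylated cytosine is still read as C. The function names, the toy reference, and the reads are all hypothetical; the sketch assumes ungapped, full-length, forward-strand reads and complete conversion, none of which holds automatically for real data.

```python
# Toy CpG methylation estimate from bisulfite-converted reads (illustrative only).

def cpg_positions(ref):
    """Return 0-based indices of the C in every CG dinucleotide of the reference."""
    ref = ref.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]

def methylation_fraction(ref, reads):
    """Fraction of CpG calls read as C (methylated) among all C/T calls.

    Each read is assumed to align to the reference base-for-base.
    Returns None when there are no informative calls.
    """
    sites = cpg_positions(ref)
    methylated = unmethylated = 0
    for read in reads:
        for i in sites:
            base = read[i].upper()
            if base == "C":        # protected from conversion: methylated
                methylated += 1
            elif base == "T":      # converted: unmethylated
                unmethylated += 1
            # any other base is treated as uninformative and skipped
    total = methylated + unmethylated
    return methylated / total if total else None

# Hypothetical 8-bp amplicon with CpG sites at indices 1 and 5.
ref = "ACGTTCGA"
reads = ["ACGTTCGA", "ATGTTCGA", "ATGTTTGA", "ACGTTTGA"]
level = methylation_fraction(ref, reads)
print(level)
```

A lower fraction in a tumor sample than in matched normal tissue would correspond to the hypomethylation described above; real pipelines additionally handle alignment, strand, coverage, and incomplete conversion.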
Higher L1 copy numbers have been observed in the human brain compared to other organs. [24] [25] Studies of animal models and human cell lines have shown that L1s become active in neural progenitor cells (NPCs), and that experimental deregulation or overexpression of L1 increases somatic mosaicism. This phenomenon is negatively regulated by Sox2, which is downregulated in NPCs, and by MeCP2 and methylation of the L1 5' UTR. [26] Human cell lines modeling the neurological disorder Rett syndrome, which carry MeCP2 mutations, exhibit increased L1 transposition, suggesting a link between L1 activity and neurological disorders. [27] [26] Current studies aim to investigate the potential roles of L1 activity in various neuropsychiatric disorders, including schizophrenia, autism spectrum disorders, epilepsy, bipolar disorder, Tourette syndrome, and drug addiction. [28] L1s are also highly expressed in the octopus brain, suggesting a convergent mechanism in complex cognition. [29]
Increased RNA levels of Alu, which requires L1 proteins, are associated with a form of age-related macular degeneration, a neurological disorder of the eyes. [30]
The naturally occurring mouse retinal degeneration model rd7 is caused by an L1 insertion in the Nr2e3 gene. [31]
In 2021, a study proposed that L1 elements may be responsible for potential endogenisation of the SARS-CoV-2 genome in Huh7 mutant cancer cells, [32] which could explain why some patients test PCR-positive for SARS-CoV-2 even after clearance of the virus. These results, however, have been criticized as "mechanistically plausible but likely very rare", [33] misleading and infrequent, [34] or artefactual. [35]
A transposable element is a nucleic acid sequence in DNA that can change its position within a genome, sometimes creating or reversing mutations and altering the cell's genetic identity and genome size. Transposition often results in duplication of the same genetic material. In the human genome, L1 and Alu elements are two examples. Barbara McClintock's discovery of transposable elements earned her a Nobel Prize in 1983. Their importance in personalized medicine is becoming increasingly relevant, and they are gaining attention in data analytics given the difficulty of analysis in very high dimensional spaces.
The human genome is a complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. These are usually treated separately as the nuclear genome and the mitochondrial genome. Human genomes include both protein-coding DNA sequences and various types of DNA that does not encode proteins. The latter is a diverse category that includes DNA coding for non-translated RNA, such as that for ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and several types of regulatory RNAs. It also includes promoters and their associated gene-regulatory elements, DNA playing structural and replicatory roles, such as scaffolding regions, telomeres, centromeres, and origins of replication, plus large numbers of transposable elements, inserted viral DNA, non-functional pseudogenes and simple, highly repetitive sequences. Introns make up a large percentage of non-coding DNA. Some of this non-coding DNA is non-functional junk DNA, such as pseudogenes, but there is no firm consensus on the total amount of junk DNA.
In biology, epigenetics is the study of heritable traits, or a stable change of cell function, that happen without changes to the DNA sequence. The Greek prefix epi- in epigenetics implies features that are "on top of" or "in addition to" the traditional genetic mechanism of inheritance. Epigenetics usually involves a change that is not erased by cell division, and affects the regulation of gene expression. Such effects on cellular and physiological phenotypic traits may result from environmental factors, or be part of normal development. They can lead to cancer.
Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product, either a protein or a non-coding RNA, which ultimately affects a phenotype. These products are often proteins, but in non-protein-coding genes such as transfer RNA (tRNA) and small nuclear RNA (snRNA), the product is a functional non-coding RNA. The process of gene expression is used by all known life (eukaryotes and prokaryotes), and is utilized by viruses, to generate the macromolecular machinery for life.
The CpG sites or CG sites are regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along its 5' → 3' direction. CpG sites occur with high frequency in genomic regions called CpG islands.
Pseudogenes are nonfunctional segments of DNA that resemble functional genes. Most arise as superfluous copies of functional genes, either directly by gene duplication or indirectly by reverse transcription of an mRNA transcript. Pseudogenes are usually identified when genome sequence analysis finds gene-like sequences that lack regulatory sequences needed for transcription or translation, or whose coding sequences are obviously defective due to frameshifts or premature stop codons. Pseudogenes are a type of junk DNA.
An Alu element is a short stretch of DNA originally characterized by the action of the Arthrobacter luteus (Alu) restriction endonuclease. Alu elements are the most abundant transposable elements in the human genome, present in excess of one million copies. Alu elements were thought to be selfish or parasitic DNA, because their sole known function is self reproduction. However, they are likely to play a role in evolution and have been used as genetic markers. They are derived from the small cytoplasmic 7SL RNA, a component of the signal recognition particle. Alu elements are highly conserved within primate genomes and originated in the genome of an ancestor of Supraprimates.
Retrotransposons are mobile elements that move within the host genome by converting their transcribed RNA into DNA through reverse transcription. They thus differ from Class II transposable elements, or DNA transposons, in using an RNA intermediate for transposition and leaving the donor site unchanged.
In biology, the epigenome of an organism is the collection of chemical changes to its DNA and histone proteins that affects when, where, and how the DNA is expressed; these changes can be passed down to an organism's offspring via transgenerational epigenetic inheritance. Changes to the epigenome can result in changes to the structure of chromatin and changes to the function of the genome. The human epigenome, including DNA methylation and histone modification, is maintained through cell division. The epigenome is essential for normal development and cellular differentiation, enabling cells with the same genetic code to perform different functions. The human epigenome is dynamic and can be influenced by environmental factors such as diet, stress, and toxins.
Endogenous retroviruses (ERVs) are endogenous viral elements in the genome that closely resemble and can be derived from retroviruses. They are abundant in the genomes of jawed vertebrates, and they comprise up to 5–8% of the human genome.
DNA excision repair protein ERCC-1 is a protein that in humans is encoded by the ERCC1 gene. Together with ERCC4, ERCC1 forms the ERCC1-XPF enzyme complex that participates in DNA repair and DNA recombination.
Probable DNA dC->dU-editing enzyme APOBEC-3B is a protein that in humans is encoded by the APOBEC3B gene.
A knockout rat is a genetically engineered rat with a single gene turned off through a targeted mutation used for academic and pharmaceutical research. Knockout rats can mimic human diseases and are important tools for studying gene function and for drug discovery and development. The production of knockout rats was not economically or technically feasible until 2008.
N6-Methyladenosine (m6A) was originally identified and partially characterised in the 1970s, and is an abundant modification in mRNA and DNA. It is found within some viruses, and most eukaryotes including mammals, insects, plants and yeast. It is also found in tRNA, rRNA, and small nuclear RNA (snRNA) as well as several long non-coding RNA, such as Xist.
Long interspersed nuclear elements (LINEs) are a group of non-LTR retrotransposons that are widespread in the genome of many eukaryotes. LINEs contain an internal Pol II promoter to initiate transcription into mRNA, and encode one or two proteins, ORF1 and ORF2. The functional domains present within ORF1 vary greatly among LINEs, but often exhibit RNA/DNA binding activity. ORF2 is essential to successful retrotransposition, and encodes a protein with both reverse transcriptase and endonuclease activity.
Transposable elements are pieces of genetic material that are capable of splicing themselves into a host genome and then self propagating throughout the genome, much like a virus. Retrotransposons are a subset of transposable elements that use an RNA intermediate and reverse transcribe themselves into the genome. Retrotransposon proliferation may lead to insertional mutagenesis, disrupt the process of DNA repair, or cause errors during chromosomal crossover, and so it is advantageous for an organism to possess the means to suppress or "silence" retrotransposon activity.
Short interspersed nuclear elements (SINEs) are non-autonomous, non-coding transposable elements (TEs) that are about 100 to 700 base pairs in length. They are a class of retrotransposons, DNA elements that amplify themselves throughout eukaryotic genomes, often through RNA intermediates. SINEs compose about 13% of the mammalian genome.
FANTOM is an international research consortium first established in 2000 as part of the RIKEN research institute in Japan. The original meeting gathered international scientists from diverse backgrounds to help annotate the function of mouse cDNA clones generated by the Hayashizaki group. Since the initial FANTOM1 effort, the consortium has released multiple projects that look to understand the mechanisms governing the regulation of mammalian genomes. Their work has generated a large collection of shared data and helped advance biochemical and bioinformatic methodologies in genomics research.
RNA-directed DNA methylation (RdDM) is a biological process in which non-coding RNA molecules direct the addition of DNA methylation to specific DNA sequences. The RdDM pathway is unique to plants, although other mechanisms of RNA-directed chromatin modification have also been described in fungi and animals. To date, the RdDM pathway is best characterized within angiosperms, and particularly within the model plant Arabidopsis thaliana. However, conserved RdDM pathway components and associated small RNAs (sRNAs) have also been found in other groups of plants, such as gymnosperms and ferns. The RdDM pathway closely resembles other sRNA pathways, particularly the highly conserved RNAi pathway found in fungi, plants, and animals. Both the RdDM and RNAi pathways produce sRNAs and involve conserved Argonaute, Dicer and RNA-dependent RNA polymerase proteins.
Haig Hagop Kazazian Jr. was an American professor in the Department of Genetic Medicine at Johns Hopkins University School of Medicine in Baltimore, Maryland. Kazazian was an elected member of the National Academy of Sciences and the American Academy of Arts and Sciences.