Long non-coding RNAs (long ncRNAs, lncRNAs) are a type of RNA, generally defined as transcripts longer than 200 nucleotides that are not translated into protein. [2] This arbitrary limit distinguishes long ncRNAs from small non-coding RNAs, such as microRNAs (miRNAs), small interfering RNAs (siRNAs), Piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), and other short RNAs. [3] Given that some lncRNAs have been reported to have the potential to encode small proteins or micro-peptides, the latest definition of lncRNA is a class of transcripts of over 200 nucleotides that have no or limited coding capacity. [4] However, John S. Mattick and colleagues proposed changing the definition of long non-coding RNAs to transcripts longer than 500 nt, which are mostly generated by Pol II. [5] The exact definition of lncRNA is therefore still under discussion in the field. Long intervening/intergenic noncoding RNAs (lincRNAs) are transcripts that do not overlap protein-coding genes. [6]
Long non-coding RNAs include intergenic lincRNAs, intronic ncRNAs, and sense and antisense lncRNAs, each type showing different genomic positions in relation to genes and exons. [1] [3]
The definition of lncRNAs differs from that of other RNAs such as siRNAs, mRNAs, miRNAs, and snoRNAs because it is not connected to the function of the RNA. A lncRNA is any transcript that is not one of the other well-characterized RNAs and is longer than 200-500 nucleotides. Some scientists think that most lncRNAs do not have a biologically relevant function because they are transcripts of junk DNA. [7] [8]
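The working definition above (a length threshold plus no or limited coding capacity) can be sketched as a simple filter. This is a minimal illustration, not a published classifier: the function names are hypothetical, the 300 nt open-reading-frame cutoff is an assumed placeholder for "limited coding capacity", and only the forward strand is scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_nt(seq: str) -> int:
    """Length in nt of the longest ATG-to-stop ORF on the forward strand.

    The stop codon is counted in the length. RNA input (U) is accepted.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def is_candidate_lncrna(seq: str, min_len: int = 200, max_orf_nt: int = 300) -> bool:
    """Apply the length-based definition plus an assumed ORF-length cutoff.

    min_len=200 follows the common definition; pass min_len=500 for the
    alternative threshold. max_orf_nt is an illustrative assumption.
    """
    return len(seq) > min_len and longest_orf_nt(seq) <= max_orf_nt
```

With the default threshold, a 250 nt transcript without an ORF passes the filter, whereas the same transcript fails when `min_len=500` is used, which illustrates how the choice of threshold changes what counts as a lncRNA.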
Long non-coding transcripts are found in many species. Large-scale complementary DNA (cDNA) sequencing projects such as FANTOM reveal the complexity of these transcripts in humans. [9] The FANTOM3 project identified ~35,000 non-coding transcripts that bear many signatures of messenger RNAs, including 5' capping, splicing, and poly-adenylation, but have little or no open reading frame (ORF). [9] This number represents a conservative lower estimate, since it omitted many singleton transcripts and non-polyadenylated transcripts (tiling array data shows more than 40% of transcripts are non-polyadenylated). [10] Identifying ncRNAs within these cDNA libraries is challenging since it can be difficult to distinguish protein-coding transcripts from non-coding transcripts. It has been suggested through multiple studies that testis, [11] and neural tissues express the greatest amount of long non-coding RNAs of any tissue type. [12] Using FANTOM5, 27,919 long ncRNAs have been identified in various human sources. [13]
Quantitatively, lncRNAs demonstrate ~10-fold lower abundance than mRNAs, [14] [15] which is explained by higher cell-to-cell variation in the expression levels of lncRNA genes in individual cells compared with protein-coding genes. [16] In general, the majority (~78%) of lncRNAs are characterized as tissue-specific, as opposed to only ~19% of mRNAs. [14] Only 3.6% of human lncRNA genes are expressed in various biological contexts and 34% of lncRNA genes are expressed at a high level (top 25% of both lncRNAs and mRNAs) in at least one biological context. [17] In addition to higher tissue specificity, lncRNAs are characterized by higher developmental stage specificity [18] and cell subtype specificity in tissues such as the human neocortex [19] and other parts of the brain, where they regulate correct brain development and function. [20] In 2022, a comprehensive integration of lncRNAs from existing databases revealed that there are 95,243 lncRNA genes and 323,950 transcripts in humans. [21]
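Tissue specificity of the kind described above is often summarized with a single score per gene. The sketch below uses the tau index, one commonly used measure; the cited studies may have used a different metric, so this is an illustration of the idea rather than a reproduction of their analysis. The function name and the example expression values are hypothetical.

```python
def tau_index(expression):
    """Tau tissue-specificity index for one gene.

    expression: non-negative expression values, one per tissue.
    Returns 0.0 for uniform expression and 1.0 for expression in one tissue only.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need expression values for at least two tissues")
    x_max = max(expression)
    if x_max <= 0:
        raise ValueError("gene is not expressed in any tissue")
    # Sum of (1 - normalized expression) over tissues, scaled by n - 1.
    return sum(1 - x / x_max for x in expression) / (n - 1)


# Hypothetical profiles across four tissues.
housekeeping_like = [10, 10, 10, 10]
testis_restricted = [0, 0, 0, 8]
```

A gene would then be called tissue-specific when its score exceeds a chosen cutoff; the cutoff itself is an analysis decision and is not specified here.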
In comparison to mammals, relatively few studies have focused on the prevalence of lncRNAs in plants. However, an extensive study considering 37 higher plant species and six algae identified ~200,000 non-coding transcripts using an in-silico approach, [22] and also established the associated Green Non-Coding Database (GreeNC), a repository of plant lncRNAs.
In 2005 the landscape of the mammalian genome was described as numerous 'foci' of transcription that are separated by long stretches of intergenic space. [9] While some long ncRNAs are located within the intergenic stretches, the majority are overlapping sense and antisense transcripts that often include protein-coding genes, [23] giving rise to a complex hierarchy of overlapping isoforms. [24] Genomic sequences within these transcriptional foci are often shared among a number of coding and non-coding transcripts in the sense and antisense directions. [25] For example, 3012 out of 8961 cDNAs previously annotated as truncated coding sequences within FANTOM2 were later designated as genuine ncRNA variants of protein-coding cDNAs. [9] While the abundance and conservation of these arrangements suggest they have biological relevance, the complexity of these foci frustrates easy evaluation.
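The positional categories used for lncRNAs (intergenic, intronic, sense and antisense) can be made concrete with a small coordinate comparison. This is a simplified sketch under stated assumptions: coordinates are 0-based and half-open, a transcript is compared against a single gene, and the class and function names are hypothetical rather than taken from any annotation pipeline.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Gene:
    span: Interval
    exons: Tuple[Interval, ...]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_against_gene(lnc: Interval, gene: Gene) -> str:
    """Label a transcript by its position relative to one protein-coding gene."""
    if not overlaps(lnc, gene.span):
        return "intergenic"
    if lnc.strand != gene.span.strand:
        return "antisense"
    if any(overlaps(lnc, exon) for exon in gene.exons):
        return "sense"
    return "intronic"


# Hypothetical gene on chr1 with two exons.
gene = Gene(
    span=Interval("chr1", 100, 1000, "+"),
    exons=(Interval("chr1", 100, 200, "+"), Interval("chr1", 800, 1000, "+")),
)
```

Real loci are more complicated, since a single transcript can overlap several genes on both strands, which is part of why the transcriptional foci described above are hard to evaluate.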
The GENCODE consortium has collated and analysed a comprehensive set of human lncRNA annotations and their genomic organisation, modifications, cellular locations and tissue expression profiles. [12] Their analysis indicates human lncRNAs show a bias toward two-exon transcripts. [12]
Name | Taxonomic group | Web server | Repository | Input file | Main model / algorithm | Training set | Year published | Reference |
---|---|---|---|---|---|---|---|---|
PLEKv2 | Plant, animal | PLEKv2 Paper | PLEKv2 source code | FASTA | CNN | No | 2024 | [26] |
DeepPlnc | Plant | DeepPlnc Server | DeepPlnc | FASTA | Neural network | Yes | 2022 | [27] |
RNAsamba | All | RNAsamba | RNAsamba | FASTA | Neural network | Yes | 2020 | [28] |
LGC | Plant, animal | LGC | | FASTA, BED, GTF | Relationship between ORF length and GC content | No | 2019 | [29] |
CPAT | Human, fly, mouse, zebrafish | CPAT | CPAT | FASTA/BED | Logistic regression | Yes | 2013 | [30] |
COME | Plant, human, mouse, fly, worm | COME | COME | GTF | Random forest | Yes | 2017 | [31] |
CNCI | Plant, animal | NA | | FASTA, GTF | Support vector machine | No | 2013 | [32] |
PLEK | Vertebrate, Plant | NA | PLEK | FASTA | Support vector machine | No | 2014 | [33] |
FEELnc | All | NA | FEELnc | FASTA, GTF | Random forest | Yes | 2017 | [34] |
PhyloCSF | Vertebrate, fly, mosquito, yeast, worm | NA | | FASTA | Phylogenetic codon model | Yes | 2011 | [35] |
slncky | All | NA | slncky | FASTA, BED | Evolutionary conservation | Yes | 2016 | [36] |
There has been considerable debate about whether lncRNAs have been misannotated and do in fact encode proteins. Several lncRNAs have been found to encode peptides with biologically significant functions. [37] [38] [39] Ribosome profiling studies have suggested that anywhere from 40% to 90% of annotated lncRNAs are translated, [40] [41] although there is disagreement about the correct method for analyzing ribosome profiling data. [42] Additionally, it is thought that many of the peptides produced by lncRNAs may be highly unstable and without biological function. [41]
Compared with protein-coding genes, long non-coding RNAs show a lower level of sequence conservation. Initial studies into lncRNA conservation noted that as a class, they were enriched for conserved sequence elements, [43] depleted in substitution and insertion/deletion rates [44] and depleted in rare frequency variants, [45] indicative of purifying selection maintaining lncRNA function. However, further investigations into vertebrate lncRNAs revealed that while lncRNAs are conserved in sequence, they are not conserved in transcription. [46] [47] [11] In other words, even when the sequence of a human lncRNA is conserved in another vertebrate species, there is often no transcription of a lncRNA in the orthologous genomic region. Some argue that these observations suggest non-functionality of the majority of lncRNAs, [48] [49] [7] while others argue that they may be indicative of rapid species-specific adaptive selection. [50]
While the turnover of lncRNA transcription is much higher than initially expected, hundreds of lncRNAs are nevertheless conserved at the sequence level. There have been several attempts to delineate the different categories of selection signatures seen amongst lncRNAs, including: lncRNAs with strong sequence conservation across the entire length of the gene, lncRNAs in which only a portion of the transcript (e.g. 5′ end, splice sites) is conserved, and lncRNAs that are transcribed from syntenic regions of the genome but have no recognizable sequence similarity. [51] [52] [53] Additionally, there have been attempts to identify conserved secondary structures in lncRNAs, though these studies have so far yielded conflicting results. [54] [55]
Despite claims that the majority of long noncoding RNAs in mammals are likely to be functional, [56] [57] it seems likely that most of them are transcriptional noise and only a relatively small proportion has been demonstrated to be biologically relevant. [7] [8]
Some lncRNAs have been functionally annotated in LncRNAdb (a database of lncRNAs described in the literature), [58] [59] with the majority of these being described in humans. Over 2600 human lncRNAs with experimental evidence have been community-curated in LncRNAWiki (a wiki-based, publicly editable and open-content platform for community curation of human lncRNAs). [60] According to literature-based curation of their functional mechanisms, lncRNAs are extensively reported to be involved in ceRNA regulation, transcriptional regulation, and epigenetic regulation. [60] A further large-scale sequencing study provides evidence that many transcripts thought to be lncRNAs may, in fact, be translated into proteins. [61]
In eukaryotes, RNA transcription is a tightly regulated process. Noncoding RNAs act upon different aspects of this process, targeting transcriptional modulators, RNA polymerase (RNAP) II and even the DNA duplex to regulate gene expression. [62]
NcRNAs modulate transcription by several mechanisms, including functioning themselves as co-regulators, modifying transcription factor activity, or regulating the association and activity of co-regulators. For example, the noncoding RNA Evf-2 functions as a co-activator for the homeobox transcription factor Dlx2, which plays important roles in forebrain development and neurogenesis. [63] [64] Sonic hedgehog induces transcription of Evf-2 from an ultra-conserved element located between the Dlx5 and Dlx6 genes during forebrain development. [63] Evf-2 then recruits the Dlx2 transcription factor to the same ultra-conserved element whereby Dlx2 subsequently induces expression of Dlx5. The existence of other similar ultra- or highly conserved elements within the mammalian genome that are both transcribed and fulfill enhancer functions suggest Evf-2 may be illustrative of a generalised mechanism that regulates developmental genes with complex expression patterns during vertebrate growth. [65] [66] Indeed, the transcription and expression of similar non-coding ultraconserved elements was shown to be abnormal in human leukaemia and to contribute to apoptosis in colon cancer cells, suggesting their involvement in tumorigenesis in like fashion to protein-coding RNA. [67] [68] [69]
Local ncRNAs can also recruit transcriptional programmes to regulate adjacent protein-coding gene expression.
The RNA binding protein TLS binds and inhibits the CREB binding protein and p300 histone acetyltransferase activities on a repressed gene target, cyclin D1. The recruitment of TLS to the promoter of cyclin D1 is directed by long ncRNAs expressed at low levels and tethered to 5' regulatory regions in response to DNA damage signals. [70] Moreover, these local ncRNAs act cooperatively as ligands to modulate the activities of TLS. In the broad sense, this mechanism allows the cell to harness RNA-binding proteins, which make up one of the largest classes within the mammalian proteome, and integrate their function in transcriptional programs. Nascent long ncRNAs have been shown to increase the activity of CREB binding protein, which in turn increases the transcription of that ncRNA. [71] A study found that a lncRNA in the antisense direction of the Apolipoprotein A1 (APOA1) regulates the transcription of APOA1 through epigenetic modifications. [72]
Recent evidence has raised the possibility that transcription of genes that escape from X-inactivation might be mediated by expression of long non-coding RNA within the escaping chromosomal domains. [73]
NcRNAs also target general transcription factors required for the RNAP II transcription of all genes. [62] These general factors include components of the initiation complex that assemble on promoters or are involved in transcription elongation. A ncRNA transcribed from an upstream minor promoter of the dihydrofolate reductase (DHFR) gene forms a stable RNA-DNA triplex within the major promoter of DHFR to prevent the binding of the transcriptional co-factor TFIIB. [74] This novel mechanism of regulating gene expression may represent a widespread method of controlling promoter usage, as thousands of RNA-DNA triplexes exist in eukaryotic chromosomes. [75] The U1 ncRNA can induce transcription by binding to and stimulating TFIIH to phosphorylate the C-terminal domain of RNAP II. [76] In contrast, the ncRNA 7SK is able to repress transcription elongation by forming, in combination with HEXIM1/2, an inactive complex that prevents PTEFb from phosphorylating the C-terminal domain of RNAP II, [76] [77] [78] repressing global elongation under stressful conditions. These examples, which bypass specific modes of regulation at individual promoters, provide a means of quickly effecting global changes in gene expression.
The ability to quickly mediate global changes is also apparent in the rapid expression of non-coding repetitive sequences. The short interspersed nuclear (SINE) Alu elements in humans and analogous B1 and B2 elements in mice have succeeded in becoming the most abundant mobile elements within the genomes, comprising ~10% of the human and ~6% of the mouse genome, respectively. [79] [80] These elements are transcribed as ncRNAs by RNAP III in response to environmental stresses such as heat shock, [81] where they then bind to RNAP II with high affinity and prevent the formation of active pre-initiation complexes. [82] [83] [84] [85] This allows for the broad and rapid repression of gene expression in response to stress. [82] [85]
A dissection of the functional sequences within Alu RNA transcripts has drafted a modular structure analogous to the organization of domains in protein transcription factors. [86] The Alu RNA contains two 'arms', each of which may bind one RNAP II molecule, as well as two regulatory domains that are responsible for RNAP II transcriptional repression in vitro. [85] These two loosely structured domains may even be concatenated to other ncRNAs such as B1 elements to impart their repressive role. [85] The abundance and distribution of Alu elements and similar repetitive elements throughout the mammalian genome may be partly due to these functional domains being co-opted into other long ncRNAs during evolution, with the presence of functional repeat sequence domains being a common characteristic of several known long ncRNAs including Kcnq1ot1, Xlsirt and Xist. [87] [88] [89] [90]
In addition to heat shock, the expression of SINE elements (including Alu, B1, and B2 RNAs) increases during cellular stress such as viral infection [91] and in some cancer cells, [92] where they may similarly regulate global changes to gene expression. The ability of Alu and B2 RNA to bind directly to RNAP II provides a broad mechanism to repress transcription. [83] [85] Nevertheless, there are specific exceptions to this global response where Alu or B2 RNAs are not found at activated promoters of genes undergoing induction, such as the heat shock genes. [85] This additional hierarchy of regulation that exempts individual genes from the generalised repression also involves a long ncRNA, heat shock RNA-1 (HSR-1). It was argued that HSR-1 is present in mammalian cells in an inactive state, but upon stress is activated to induce the expression of heat shock genes. [93] This activation involves a conformational alteration of HSR-1 in response to rising temperatures, permitting its interaction with the transcriptional activator HSF-1, which trimerizes and induces the expression of heat shock genes. [93] In the broad sense, these examples illustrate a regulatory circuit nested within ncRNAs whereby Alu or B2 RNAs repress general gene expression, while other ncRNAs activate the expression of specific genes.
Many of the ncRNAs that interact with general transcription factors or RNAP II itself (including 7SK, Alu and B1 and B2 RNAs) are transcribed by RNAP III, [94] uncoupling their expression from RNAP II, which they regulate. RNAP III also transcribes other ncRNAs, such as BC2, BC200 and some microRNAs and snoRNAs, in addition to housekeeping ncRNA genes such as tRNAs, 5S rRNAs and snRNAs. [94] The existence of an RNAP III-dependent ncRNA transcriptome that regulates its RNAP II-dependent counterpart is supported by the finding of a set of ncRNAs transcribed by RNAP III with sequence homology to protein-coding genes. This prompted the authors to posit a 'cogene/gene' functional regulatory network, [95] showing that one of these ncRNAs, 21A, regulates the expression of its antisense partner gene, CENP-F in trans.
In addition to regulating transcription, ncRNAs also control various aspects of post-transcriptional mRNA processing. Similar to small regulatory RNAs such as microRNAs and snoRNAs, these functions often involve complementary base pairing with the target mRNA. The formation of RNA duplexes between complementary ncRNA and mRNA may mask key elements within the mRNA required to bind trans-acting factors, potentially affecting any step in post-transcriptional gene expression including pre-mRNA processing and splicing, transport, translation, and degradation. [96]
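The masking mechanism described above depends on complementary base pairing, which can be illustrated with a short search for perfectly complementary stretches. This is a minimal sketch assuming perfect Watson-Crick pairing only (no G-U wobble, mismatches or structure); the function names, the window length of 8 nt and the example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (DNA input is converted to RNA)."""
    return rna.upper().replace("T", "U").translate(COMPLEMENT)[::-1]


def paired_spans(mrna: str, ncrna: str, min_len: int = 8):
    """Return (start, end) spans of the mRNA that could pair with the ncRNA.

    A window of min_len nt counts as pairable when it occurs in the reverse
    complement of the ncRNA. Overlapping windows are merged into one span,
    which is an approximation of a continuous duplex.
    """
    mrna = mrna.upper().replace("T", "U")
    target = reverse_complement(ncrna)
    spans = []
    for i in range(len(mrna) - min_len + 1):
        if mrna[i:i + min_len] in target:
            if spans and i <= spans[-1][1]:
                spans[-1] = (spans[-1][0], i + min_len)
            else:
                spans.append((i, i + min_len))
    return spans


# Hypothetical mRNA with a 9 nt site flanked by C runs, and an antisense
# RNA designed to be complementary to that site.
mrna = "CCCCC" + "AGGUAAGUA" + "CCCCC"
antisense = reverse_complement("AGGUAAGUA")
```

Any element of the mRNA that falls inside a returned span, such as a splice site or a protein-binding motif, would be a candidate for masking by the duplex.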
The splicing of mRNA can induce its translation and functionally diversify the repertoire of proteins it encodes. The Zeb2 mRNA requires the retention of a 5'UTR intron that contains an internal ribosome entry site for efficient translation. [97] The retention of the intron depends on the expression of an antisense transcript that complements the intronic 5' splice site. [97] Therefore, the ectopic expression of the antisense transcript represses splicing and induces translation of the Zeb2 mRNA during mesenchymal development. Likewise, the expression of an overlapping antisense Rev-ErbAa2 transcript controls the alternative splicing of the thyroid hormone receptor ErbAa2 mRNA to form two antagonistic isoforms. [98]
NcRNA may also apply additional regulatory pressures during translation, a property particularly exploited in neurons where the dendritic or axonal translation of mRNA in response to synaptic activity contributes to changes in synaptic plasticity and the remodelling of neuronal networks. The RNAP III transcribed BC1 and BC200 ncRNAs, that previously derived from tRNAs, are expressed in the mouse and human central nervous system, respectively. [99] [100] BC1 expression is induced in response to synaptic activity and synaptogenesis and is specifically targeted to dendrites in neurons. [101] Sequence complementarity between BC1 and regions of various neuron-specific mRNAs also suggest a role for BC1 in targeted translational repression. [102] Indeed, it was recently shown that BC1 is associated with translational repression in dendrites to control the efficiency of dopamine D2 receptor-mediated transmission in the striatum [103] and BC1 RNA-deleted mice exhibit behavioural changes with reduced exploration and increased anxiety. [104]
In addition to masking key elements within single-stranded RNA, the formation of double-stranded RNA duplexes can also provide a substrate for the generation of endogenous siRNAs (endo-siRNAs) in Drosophila and mouse oocytes. [105] The annealing of complementary sequences, such as antisense or repetitive regions between transcripts, forms an RNA duplex that may be processed by Dicer-2 into endo-siRNAs. Also, long ncRNAs that form extended intramolecular hairpins may be processed into siRNAs, compellingly illustrated by the esi-1 and esi-2 transcripts. [106] Endo-siRNAs generated from these transcripts seem particularly useful in suppressing the spread of mobile transposon elements within the genome in the germline. However, the generation of endo-siRNAs from antisense transcripts or pseudogenes may also silence the expression of their functional counterparts via RISC effector complexes, acting as an important node that integrates various modes of long and short RNA regulation, as exemplified by Xist and Tsix (see below). [107]
Epigenetic modifications, including histone and DNA methylation, histone acetylation and sumoylation, affect many aspects of chromosomal biology, primarily including regulation of large numbers of genes by remodeling broad chromatin domains. [108] [109] While it has been known for some time that RNA is an integral component of chromatin, [110] [111] it is only recently that we are beginning to appreciate the means by which RNA is involved in pathways of chromatin modification. [112] [113] [114] For example, Oplr16 epigenetically induces the activation of stem cell core factors by coordinating intrachromosomal looping and recruitment of DNA demethylase TET2. [115]
In Drosophila, long ncRNAs induce the expression of the homeotic gene, Ubx, by recruiting and directing the chromatin modifying functions of the trithorax protein Ash1 to Hox regulatory elements. [114] Similar models have been proposed in mammals, where strong epigenetic mechanisms are thought to underlie the embryonic expression profiles of the Hox genes that persist throughout human development. [116] [113] Indeed, the human Hox genes are associated with hundreds of ncRNAs that are sequentially expressed along both the spatial and temporal axes of human development and define chromatin domains of differential histone methylation and RNA polymerase accessibility. [113] One ncRNA, termed HOTAIR, that originates from the HOXC locus represses transcription across 40 kb of the HOXD locus by altering chromatin trimethylation state. HOTAIR is thought to achieve this by directing the action of Polycomb chromatin remodeling complexes in trans to govern the cells' epigenetic state and subsequent gene expression. Components of the Polycomb complex, including Suz12, EZH2 and EED, contain RNA binding domains that may potentially bind HOTAIR and probably other similar ncRNAs. [117] [118] [119] This example nicely illustrates a broader theme whereby ncRNAs recruit the function of a generic suite of chromatin modifying proteins to specific genomic loci, underscoring the complexity of recently published genomic maps. [109] Indeed, the prevalence of long ncRNAs associated with protein coding genes may contribute to localised patterns of chromatin modifications that regulate gene expression during development. For example, the majority of protein-coding genes have antisense partners, including many tumour suppressor genes that are frequently silenced by epigenetic mechanisms in cancer. [120] A recent study observed an inverse expression profile of the p15 gene and an antisense ncRNA in leukaemia. [120] A detailed analysis showed the p15 antisense ncRNA (CDKN2BAS) was able to induce changes to heterochromatin and DNA methylation status of p15 by an unknown mechanism, thereby regulating p15 expression. [120] Therefore, misexpression of the associated antisense ncRNAs may subsequently silence the tumour suppressor gene, contributing towards cancer.
Many emergent themes of ncRNA-directed chromatin modification were first apparent within the phenomenon of imprinting, whereby only one allele of a gene is expressed from either the maternal or the paternal chromosome. In general, imprinted genes are clustered together on chromosomes, suggesting the imprinting mechanism acts upon local chromosome domains rather than individual genes. These clusters are also often associated with long ncRNAs whose expression is correlated with the repression of the linked protein-coding gene on the same allele. [121] Indeed, detailed analysis has revealed a crucial role for the ncRNAs Kcnq1ot1 and Igf2r/Air in directing imprinting. [122]
Almost all the genes at the Kcnq1 locus are maternally expressed, except the paternally expressed antisense ncRNA Kcnq1ot1. [123] Transgenic mice with truncated Kcnq1ot1 fail to silence the adjacent genes, suggesting that Kcnq1ot1 is crucial to the imprinting of genes on the paternal chromosome. [124] It appears that Kcnq1ot1 is able to direct the trimethylation of lysine 9 (H3K9me3) and 27 of histone 3 (H3K27me3) to an imprinting centre that overlaps the Kcnq1ot1 promoter and actually resides within a Kcnq1 sense exon. [125] Similar to HOTAIR (see above), Eed-Ezh2 Polycomb complexes are recruited to the Kcnq1 locus on the paternal chromosome, possibly by Kcnq1ot1, where they may mediate gene silencing through repressive histone methylation. [125] A differentially methylated imprinting centre also overlaps the promoter of a long antisense ncRNA Air that is responsible for the silencing of neighbouring genes at the Igf2r locus on the paternal chromosome. [126] [127] The presence of allele-specific histone methylation at the Igf2r locus suggests Air also mediates silencing via chromatin modification. [128]
The inactivation of an X-chromosome in female placental mammals is directed by one of the earliest and best characterized long ncRNAs, Xist. [129] The expression of Xist from the future inactive X-chromosome, and its subsequent coating of the inactive X-chromosome, occurs during early embryonic stem cell differentiation. Xist expression is followed by irreversible layers of chromatin modifications that include the loss of the histone H3K9 acetylation and H3K4 methylation that are associated with active chromatin, and the induction of repressive chromatin modifications including H4 hypoacetylation, H3K27 trimethylation, [129] H3K9 hypermethylation and H4K20 monomethylation as well as H2AK119 monoubiquitylation. These modifications coincide with the transcriptional silencing of the X-linked genes. [130] Xist RNA also localises the histone variant macroH2A to the inactive X-chromosome. [131] There are additional ncRNAs that are also present at the Xist locus, including an antisense transcript Tsix, which is expressed from the future active chromosome and able to repress Xist expression by the generation of endogenous siRNA. [107] Together these ncRNAs ensure that only one X-chromosome is active in female mammals.
Telomeres form the terminal region of mammalian chromosomes and are essential for stability and aging and play central roles in diseases such as cancer. [132] Telomeres have been long considered transcriptionally inert DNA-protein complexes until it was shown in the late 2000s that telomeric repeats may be transcribed as telomeric RNAs (TelRNAs) [133] or telomeric repeat-containing RNAs. [134] These ncRNAs are heterogeneous in length, transcribed from several sub-telomeric loci and physically localise to telomeres. Their association with chromatin, which suggests an involvement in regulating telomere specific heterochromatin modifications, is repressed by SMG proteins that protect chromosome ends from telomere loss. [134] In addition, TelRNAs block telomerase activity in vitro and may therefore regulate telomerase activity. [133] Although early, these studies suggest an involvement for telomeric ncRNAs in various aspects of telomere biology.
Asynchronously replicating autosomal RNAs (ASARs) are very long (~200kb) non-coding RNAs that are non-spliced, non-polyadenylated, and are required for normal DNA replication timing and chromosome stability. [135] [136] [137] Deletion of any one of the genetic loci containing ASAR6, ASAR15, or ASAR6-141 results in the same phenotype of delayed replication timing and delayed mitotic condensation (DRT/DMC) of the entire chromosome. DRT/DMC results in chromosomal segregation errors that lead to increased frequency of secondary rearrangements and an unstable chromosome. Similar to Xist, ASARs show random monoallelic expression and exist in asynchronous DNA replication domains. Although the mechanism of ASAR function is still under investigation, it is hypothesized that they work via similar mechanisms as the Xist lncRNA, but on smaller autosomal domains resulting in allele specific changes in gene expression.
Incorrect repair of DNA double-strand breaks (DSBs), leading to chromosomal rearrangements, is one of the primary causes of oncogenesis. A number of lncRNAs are crucial at the different stages of the main pathways of DSB repair in eukaryotic cells: nonhomologous end joining (NHEJ) and homology-directed repair (HDR). Gene mutations or variation in expression levels of such RNAs can lead to local DNA repair defects, increasing the chromosome aberration frequency. Moreover, it was demonstrated that some RNAs could stimulate long-range chromosomal rearrangements. [138]
The discovery that long ncRNAs function in various aspects of cell biology has led to research on their role in disease. Tens of thousands of lncRNAs are potentially associated with diseases based on the multi-omics evidence. [139] A handful of studies have implicated long ncRNAs in a variety of disease states and support an involvement and co-operation in neurological disease and cancer.
The first published report of an alteration in lncRNA abundance in aging and human neurological disease was provided by Lukiw et al. [140] in a study using short post-mortem interval tissues from patients with Alzheimer's disease and non-Alzheimer's dementia (NAD); this early work was based on the prior identification of a primate brain-specific cytoplasmic transcript of the Alu repeat family by Watson and Sutcliffe in 1987 known as BC200 (brain, cytoplasmic, 200 nucleotide). [141]
While many association studies have identified unusual expression of long ncRNAs in disease states, there is little understanding of their role in causing disease. Expression analyses that compare tumor cells and normal cells have revealed changes in the expression of ncRNAs in several forms of cancer. For example, in prostate tumours, PCGEM1 (one of two overexpressed ncRNAs) is correlated with increased proliferation and colony formation, suggesting an involvement in regulating cell growth. [142] PRNCR1 was found to promote tumor growth in several malignancies such as prostate cancer, breast cancer, non-small cell lung cancer, oral squamous cell carcinoma and colorectal cancer. [143] MALAT1 (also known as NEAT2) was originally identified as an abundantly expressed ncRNA that is upregulated during metastasis of early-stage non-small cell lung cancer and its overexpression is an early prognostic marker for poor patient survival rates. [142] LncRNAs such as HEAT2 or KCNQ1OT1 have been shown to be regulated in the blood of patients with cardiovascular diseases such as heart failure or coronary artery disease and, moreover, to predict cardiovascular disease events. [144] [145] More recently, the highly conserved mouse homologue of MALAT1 was found to be highly expressed in hepatocellular carcinoma. [146] Intronic antisense ncRNAs with expression correlated to the degree of tumor differentiation in prostate cancer samples have also been reported. [147] Despite a number of long ncRNAs having aberrant expression in cancer, their function and potential role in tumourigenesis are relatively unknown. For example, the ncRNAs HIS-1 and BIC have been implicated in cancer development and growth control, but their function in normal cells is unknown. [148] [149] In addition to cancer, ncRNAs also exhibit aberrant expression in other disease states. Overexpression of PRINS is associated with psoriasis susceptibility, with PRINS expression being elevated in the uninvolved epidermis of psoriatic patients compared with both psoriatic lesions and healthy epidermis. [150]
Genome-wide profiling revealed that many transcribed non-coding ultraconserved regions exhibit distinct profiles in various human cancer states. [68] An analysis of chronic lymphocytic leukaemia, colorectal carcinoma and hepatocellular carcinoma found that all three cancers exhibited aberrant expression profiles for ultraconserved ncRNAs relative to normal cells. Further analysis of one ultraconserved ncRNA suggested it behaved like an oncogene by mitigating apoptosis and subsequently expanding the number of malignant cells in colorectal cancers. [68] Many of these transcribed ultraconserved sites that exhibit distinct signatures in cancer are found at fragile sites and genomic regions associated with cancer. It seems likely that the aberrant expression of these ultraconserved ncRNAs within malignant processes results from important functions they fulfil in normal human development.
Recently, a number of single nucleotide polymorphisms (SNPs) identified in association studies of disease states have been mapped to long ncRNAs. For example, SNPs that identified a susceptibility locus for myocardial infarction mapped to a long ncRNA, MIAT (myocardial infarction associated transcript). [151] Likewise, genome-wide association studies identified a region associated with coronary artery disease [152] that encompassed a long ncRNA, ANRIL. [153] ANRIL is expressed in tissues and cell types affected by atherosclerosis [154] [155] and its altered expression is associated with a high-risk haplotype for coronary artery disease. [155] [156] There is also increasing evidence for a role of non-coding RNAs in the development and categorization of heart failure. [157]
The complexity of the transcriptome and our evolving understanding of its structure may inform a reinterpretation of the functional basis for many natural polymorphisms associated with disease states. Many SNPs associated with certain disease conditions are found within non-coding regions, and the complex networks of non-coding transcription within these regions make it particularly difficult to elucidate the functional effects of polymorphisms. For example, a SNP located both within the truncated form of ZFAT and within the promoter of an antisense transcript increases the expression of ZFAT not by increasing mRNA stability, but rather by repressing the expression of the antisense transcript. [158]
The ability of long ncRNAs to regulate associated protein-coding genes may contribute to disease if misexpression of a long ncRNA deregulates a protein-coding gene with clinical significance. For instance, an antisense long ncRNA that regulates the expression of the sense BACE1 gene, which encodes a crucial enzyme in Alzheimer's disease etiology, exhibits elevated expression in several regions of the brain in individuals with Alzheimer's disease. [159] Alteration of the expression of ncRNAs may also mediate changes at an epigenetic level to affect gene expression and contribute to disease etiology. For example, the induction of an antisense transcript by a genetic mutation led to DNA methylation and silencing of sense genes, causing β-thalassemia in a patient. [160]
Alongside their role in mediating pathological processes, long noncoding RNAs play a role in the immune response to vaccination, as identified for both the influenza vaccine and the yellow fever vaccine. [161]
It took over two decades after the discovery of the first human long non-coding transcripts for the functional significance of lncRNA structures to be fully recognized. Early structural studies led to the proposal of several hypotheses for classifying lncRNA architectures. One hypothesis suggests that lncRNAs may feature a compact tertiary structure, similar to ribozymes like the ribosome or self-splicing introns. Another possibility is that lncRNAs could have structured protein-binding sites arranged in a decentralized scaffold, lacking a compact core. A third hypothesis posits that lncRNAs might exhibit a largely unstructured architecture, with loosely organized protein-binding domains interspersed with long regions of disordered single-stranded RNA. [162]
Studying the tertiary structure of lncRNAs by conventional methods such as X-ray crystallography, cryo-EM and nuclear magnetic resonance (NMR) is still hampered by their size and conformational dynamics, and by the fact that too little is yet known about their mechanisms to reconstruct stable and functionally active lncRNA-ribonucleoprotein complexes. However, some pioneering studies have shown that lncRNAs can already be studied by low-resolution single-particle and in-solution methods, such as atomic force microscopy (AFM) and small-angle X-ray scattering (SAXS), in some cases even in complexes with small-molecule modulators. [163]
For instance, lncRNA MEG3 was shown to regulate the transcription factor p53 thanks to its compact structured core. [164] Moreover, lncRNA Braveheart (Bvht) was shown to have a well-defined, albeit flexible, 3D structure that is remodeled upon binding CNBP (Cellular Nucleic-acid Binding Protein), which recognizes distal domains in the RNA. [165] Finally, Xist, a master regulator of X chromosome inactivation, was shown to specifically bind a small-molecule compound, which alters the conformation of the Xist RepA motif and displaces two known interacting protein factors (PRC2 and SPEN) from the RNA. By this mechanism of action, the compound abrogates the initiation of X-chromosome inactivation. [166]
Non-coding DNA (ncDNA) sequences are components of an organism's DNA that do not encode protein sequences. Some non-coding DNA is transcribed into functional non-coding RNA molecules. Other functional regions of the non-coding DNA fraction include regulatory sequences that control gene expression; scaffold attachment regions; origins of DNA replication; centromeres; and telomeres. Some non-coding regions appear to be mostly nonfunctional, such as introns, pseudogenes, intergenic DNA, and fragments of transposons and viruses. Regions that are completely nonfunctional are called junk DNA.
A non-coding RNA (ncRNA) is a functional RNA molecule that is not translated into a protein. The DNA sequence from which a functional non-coding RNA is transcribed is often called an RNA gene. Abundant and functionally important types of non-coding RNAs include transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs), as well as small RNAs such as microRNAs, siRNAs, piRNAs, snoRNAs, snRNAs, exRNAs, scaRNAs and the long ncRNAs such as Xist and HOTAIR.
Regulation of gene expression, or gene regulation, includes a wide range of mechanisms that are used by cells to increase or decrease the production of specific gene products. Sophisticated programs of gene expression are widely observed in biology, for example to trigger developmental pathways, respond to environmental stimuli, or adapt to new food sources. Virtually any step of gene expression can be modulated, from transcriptional initiation, to RNA processing, and to the post-translational modification of a protein. Often, one gene regulator controls another, and so on, in a gene regulatory network.
Antisense RNA (asRNA), also referred to as antisense transcript, natural antisense transcript (NAT) or antisense oligonucleotide, is a single-stranded RNA that is complementary to a protein-coding messenger RNA (mRNA) with which it hybridizes, thereby blocking its translation into protein. asRNAs have been found in both prokaryotes and eukaryotes, and can be classified into short and long non-coding RNAs (ncRNAs). The primary function of asRNA is regulating gene expression. asRNAs may also be produced synthetically and have found widespread use as research tools for gene knockdown. They may also have therapeutic applications.
In biology, the epigenome of an organism is the collection of chemical changes to its DNA and histone proteins that affects when, where, and how the DNA is expressed; these changes can be passed down to an organism's offspring via transgenerational epigenetic inheritance. Changes to the epigenome can result in changes to the structure of chromatin and changes to the function of the genome. The human epigenome, including DNA methylation and histone modification, is maintained through cell division. The epigenome is essential for normal development and cellular differentiation, enabling cells with the same genetic code to perform different functions. The human epigenome is dynamic and can be influenced by environmental factors such as diet, stress, and toxins.
Xist is a non-coding RNA transcribed from the X chromosome of the placental mammals that acts as a major effector of the X-inactivation process. It is a component of the Xic – X-chromosome inactivation centre – along with two other RNA genes and two protein genes.
HOTAIR is a human gene located between HOXC11 and HOXC12 on chromosome 12. It is the first example of an RNA expressed on one chromosome that has been found to influence the transcription of the HOXD cluster posterior genes located on chromosome 2. The sequence and function of HOTAIR are different in humans and mice. Sequence analysis of HOTAIR revealed that it exists in mammals, has poorly conserved sequences and considerably conserved structures, and has evolved faster than nearby HoxC genes. A subsequent study identified that HOTAIR has a 32-nucleotide-long conserved noncoding element (CNE) with a paralogous copy in the HOXD cluster region, suggesting that the HOTAIR conserved sequences predate the whole genome duplication events at the root of vertebrates. While the conserved sequence paralogous with the HOXD cluster is 32 nucleotides long, the HOTAIR sequence conserved from human to fish is about 200 nucleotides long and is marked by active enhancer features.
RNA polymerase IV is an enzyme that synthesizes small interfering RNA (siRNA) in plants, which silence gene expression. RNAP IV belongs to a family of enzymes that catalyze the process of transcription known as RNA Polymerases, which synthesize RNA from DNA templates. Discovered via phylogenetic studies of land plants, genes of RNAP IV are thought to have resulted from multistep evolution processes that occurred in RNA Polymerase II phylogenies. Such an evolutionary pathway is supported by the fact that RNAP IV is composed of 12 protein subunits that are either similar or identical to RNA polymerase II, and is specific to plant genomes. Via its synthesis of siRNA, RNAP IV is involved in regulation of heterochromatin formation in a process known as RNA directed DNA Methylation (RdDM).
Cryptic unstable transcripts (CUTs) are a subset of non-coding RNAs (ncRNAs) that are produced from intergenic and intragenic regions. CUTs were first observed in S. cerevisiae yeast models and are found in most eukaryotes. Some basic characteristics of CUTs include a length of around 200–800 base pairs, a 5' cap, a poly-adenylated tail, and rapid degradation due to the combined activity of poly-adenylating polymerases and exosome complexes. CUT transcription occurs through RNA Polymerase II and initiates from nucleosome-depleted regions, often in an antisense orientation. To date, the function of CUTs remains relatively uncharacterized, but they have been implicated in a number of putative gene regulation and silencing pathways. Thousands of loci leading to the generation of CUTs have been described in the yeast genome. Additionally, stable uncharacterized transcripts, or SUTs, have also been detected in cells and bear many similarities to CUTs but are not degraded through the same pathways.
In molecular biology, HOTTIP is a long non-coding RNA. The gene encoding HOTTIP is located at the 5′ tip of the HOXA locus, and coordinates the activation of several of the 5′ HOXA genes. The non-coding RNA is brought into close proximity with the HOXA genes by chromosomal looping. HOTTIP binds to the WDR5 protein, which forms a complex with the histone methyltransferase protein MLL. This targets the WDR5-MLL complex to the HOXA region and results in H3K4 methylation and transcriptional activation of the HOXA locus. More recently, HOTTIP has been shown to play a role in hepatocellular carcinoma (HCC) progression. HOTTIP expression levels predict metastasis formation and poor disease outcome in HCC patients. HOTTIP has been shown to be transcriptionally regulated by the transcriptional coactivator PSIP1/p52.
In molecular biology, FAS antisense RNA, also known as FAS-AS1 or SAF, is a long non-coding RNA. In humans it is located on chromosome 10 and is transcribed from the opposite strand of intron 1 of the FAS gene. It may regulate the expression of some isoforms of FAS. It may also play a role in the regulation of FAS-mediated apoptosis. Sehgal et al. showed that the alternative splicing of Fas in lymphomas is tightly regulated by a long non-coding RNA corresponding to an antisense transcript of Fas (FAS-AS1). Levels of FAS-AS1 correlate inversely with production of sFas, and FAS-AS1 binding to RBM5 inhibits RBM5-mediated exon 6 skipping. EZH2, often mutated or overexpressed in lymphomas, hyper-methylates the FAS-AS1 promoter and represses FAS-AS1 expression. EZH2-mediated repression of the FAS-AS1 promoter can be released by DZNeP or overcome by ectopic expression of FAS-AS1, both of which increase levels of FAS-AS1 and correspondingly decrease expression of sFas. Treatment with a Bruton's tyrosine kinase inhibitor or EZH2 knockdown decreases the levels of EZH2, RBM5 and sFas, thereby enhancing Fas-mediated apoptosis. According to the authors, this is the first report showing functional regulation of Fas repression by its antisense RNA. The results point to new therapeutic targets in lymphomas and provide a rationale for the use of EZH2 inhibitors or ibrutinib in combination with chemotherapeutic agents that recruit Fas for effective cell killing.
HOXA11-AS lncRNA is a long non-coding RNA transcribed from the antisense strand in the homeobox A (HOXA) gene cluster. HOX genes are organized into four clusters. The sense strand of the HOXA gene codes for proteins. Alternative names for HOXA11-AS lncRNA are: HOXA-AS5, HOXA11S, HOXA11-AS1, HOXA11AS, or NCRNA00076. This gene is 3,885 nucleotides long, resides on chromosome 7 (7p15.2), and is transcribed from an independent gene promoter. Being a lncRNA, it is longer than 200 nucleotides, in contrast to small non-coding RNAs.
UBE3A-ATS/Ube3a-ATS (human/mouse), otherwise known as ubiquitin ligase E3A-ATS, is the name for the antisense DNA strand that is transcribed as part of a larger transcript called LNCAT at the Ube3a locus. The Ube3a locus is imprinted and in the central nervous system expressed only from the maternal allele. Silencing of Ube3a on the paternal allele is thought to occur through the Ube3a-ATS part of LNCAT, since non-coding antisense transcripts are often found at imprinted loci. The deletion and/or mutation of Ube3a on the maternal chromosome causes Angelman syndrome (AS) and Ube3a-ATS may prove to be an important aspect in finding a therapy for this disease. While in patients with AS the maternal Ube3a allele is inactive, the paternal allele is intact but epigenetically silenced. If unsilenced, the paternal allele could be a source of active Ube3a protein in AS patients. Therefore, understanding the mechanisms of how Ube3a-ATS might be involved in silencing the paternal Ube3a may lead to new therapies for AS. This possibility has been demonstrated by a recent study where the drug topotecan, administered to mice suffering from AS, activated expression of the paternal Ube3a gene by lowering the transcription of Ube3a-ATS.
Enhancer RNAs (eRNAs) represent a class of relatively long non-coding RNA molecules transcribed from the DNA sequence of enhancer regions. They were first detected in 2010 through the use of genome-wide techniques such as RNA-seq and ChIP-seq. eRNAs can be subdivided into two main classes: 1D eRNAs and 2D eRNAs, which differ primarily in terms of their size, polyadenylation state, and transcriptional directionality. The expression of a given eRNA correlates with the activity of its corresponding enhancer in target genes. Increasing evidence suggests that eRNAs actively play a role in transcriptional regulation in cis and in trans, and while their mechanisms of action remain unclear, a few models have been proposed.
Tsix is a non-coding RNA gene that is antisense to the Xist RNA. Tsix binds Xist during X chromosome inactivation. The name Tsix comes from the reverse of Xist, which stands for X-inactive specific transcript.
Epigenetics of human development is the study of how epigenetics affects human development.
Short interspersed nuclear elements (SINEs) are non-autonomous, non-coding transposable elements (TEs) that are about 100 to 700 base pairs in length. They are a class of retrotransposons, DNA elements that amplify themselves throughout eukaryotic genomes, often through RNA intermediates. SINEs compose about 13% of the mammalian genome.
Brain cytoplasmic 200 long-noncoding RNA is a 200 nucleotide RNA transcript found predominantly in the brain with a primary function of regulating translation by inhibiting its initiation. As a long non-coding RNA, it belongs to a family of RNA transcripts that are not translated into protein (ncRNAs). Of these ncRNAs, lncRNAs are transcripts of 200 nucleotides or longer and are almost three times more prevalent than protein-coding genes. Nevertheless, only a few of the almost 60,000 lncRNAs have been characterized, and little is known about their diverse functions. BC200 is one lncRNA that has given insight into their specific role in translation regulation, and implications in various forms of cancer as well as Alzheimer's disease.
A majority of the human genome is made up of non-protein-coding DNA, meaning that such sequences are not commonly employed to encode a protein. However, even though these regions do not code for protein, they have other functions and carry necessary regulatory information. Non-coding RNAs can be classified based on their size. Small noncoding RNA is usually categorized as being under 200 bp in length, whereas long noncoding RNA is greater than 200 bp. In addition, they can be categorized by their function within the cell as infrastructural or regulatory ncRNAs. Infrastructural ncRNAs seem to have a housekeeping role in translation and splicing and include species such as rRNA, tRNA and snRNA. Regulatory ncRNAs are involved in the modification of other RNAs.
Mitchell Guttman is a molecular biologist. He works at the California Institute of Technology, where he is a professor in the Division of Biology and Biological Engineering and a Robertson Investigator of the New York Stem Cell Foundation. He also serves as the associate director of the UCLA-Caltech Medical Scientist Training Program.
"We're calling long noncoding RNAs a class, when actually the only definition is that they are longer than 200 bp," says Ana Marques, a Research Fellow at the University of Oxford who uses evolutionary approaches to understand lncRNA function.