isomiRs (from iso- + miR) are miRNA sequences that vary with respect to the reference sequence. The term was coined by Morin et al. in 2008. [1] IsomiR expression profiles have also been found to depend on race, population, and gender. [2] [3]
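There are four main variation types: 5' trimming (the 5' cleavage site is shifted relative to the reference sequence), 3' trimming (the 3' cleavage site is shifted relative to the reference sequence), 3' nucleotide addition (nucleotides are added to the 3' end), and nucleotide substitution (nucleotides differ from the precursor along the miRNA length).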
miRBase is considered to be the gold-standard miRNA database; it stores miRNA sequences detected by thousands of experiments. In this database each miRNA is associated with a miRNA precursor and with one or two mature miRNAs (-5p and -3p). It was long assumed that a given miRNA precursor always generates the same miRNA sequences. However, deep sequencing has allowed researchers to detect large variability in miRNA biogenesis: many different sequences can be generated from the same miRNA precursor, and these can potentially have different targets, [4] [2] [5] or even lead to opposite changes in mRNA expression. [2]
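One way to see why sequences from the same precursor may have different targets is to look at the seed, conventionally taken as nucleotides 2 to 8 of the mature miRNA. The seed convention is not stated in the text above and is an assumption here. The sketch below uses a toy precursor fragment and a hypothetical helper named `seed`; it only illustrates that shifting the 5' end by one nucleotide changes the seed.

```python
def seed(mirna: str) -> str:
    """Return nucleotides 2-8 (1-based) of a miRNA: the conventional seed."""
    return mirna[1:8]


# Toy precursor fragment (illustrative only, flanks are made up).
precursor = "GGC" + "UGAGGUAGUAGGUUGUAUAGUU" + "UUAGG"
ref_start, ref_end = 3, 25  # 0-based, half-open coordinates of the reference

reference = precursor[ref_start:ref_end]
shifted_downstream = precursor[ref_start + 1:ref_end]  # 5' end moved by +1
shifted_upstream = precursor[ref_start - 1:ref_end]    # 5' end moved by -1

print(seed(reference))           # GAGGUAG
print(seed(shifted_downstream))  # AGGUAGU
print(seed(shifted_upstream))    # UGAGGUA
```

Variants that differ only at the 3' end keep the same seed under this convention, which is one reason 5' and 3' variants are usually reported separately.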
The advent of sequencing has permitted scientists to uncover a large landscape of new miRNAs, to increase our knowledge of their biogenesis, and to discover putative post-transcriptional editing processes in miRNAs that had previously been ignored. These processes mostly generate variants of the miRNAs annotated in miRBase that differ at the 3' and 5' termini and, at lower frequency, carry nucleotide substitutions along the miRNA length. [6] [7] [8] [9] The variations are mainly generated by a shift in the Drosha and Dicer cleavage sites, but also by nucleotide additions at the 3' end, [10] resulting in new sequences that differ from the annotated miRNA. These were named "isomiRs" by Morin et al. in 2008. IsomiRs have been well established across different metazoan species [11] [12] [13] [14] [15] and were first described in depth in human stem cells and human brain samples. [8] [9] Moreover, it has been shown that isomiRs are not caused by RNA degradation during sample preparation for next-generation sequencing. [16] Some studies have tried to explain miRNA diversity by the structural basis of the precursors, but without clear results. [17] Adenylation or uridylation at the 3' end (3' addition isomiRs) has been related to alterations in miRNA-3'-UTR stability. [18] Furthermore, differential expression of isomiRs has been detected during development in D. melanogaster and Hippoglossus hippoglossus L., suggesting a biological function. [15] [19]
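The variation types described above can be illustrated with a minimal sketch that compares a sequencing read with the reference miRNA on its precursor. This is not the method of any cited study or published tool: the names `IsomiRCall` and `classify`, the thresholds `max_shift` and `max_subs`, and the toy precursor are all assumptions made for illustration. The approach is a simple ungapped comparison. It tries start positions near the reference start, treats trailing nucleotides that do not match the precursor as a non-templated 3' addition, and counts the remaining mismatches as substitutions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class IsomiRCall:
    shift_5p: int        # read start minus reference start (negative = upstream)
    shift_3p: int        # templated read end minus reference end
    addition_3p: str     # non-templated nucleotides at the 3' end
    substitutions: Tuple[Tuple[int, str, str], ...]  # (1-based pos, precursor nt, read nt)

    def labels(self) -> List[str]:
        out = []
        if self.shift_5p:
            out.append("5' trimming")
        if self.shift_3p:
            out.append("3' trimming")
        if self.addition_3p:
            out.append("3' addition")
        if self.substitutions:
            out.append("substitution")
        return out or ["reference"]


def classify(read: str, precursor: str, ref_start: int, ref_end: int,
             max_shift: int = 3, max_subs: int = 1) -> Optional[IsomiRCall]:
    """Ungapped comparison of a read against a precursor (illustrative only)."""
    best, best_key = None, None
    for start in range(max(0, ref_start - max_shift), ref_start + max_shift + 1):
        template = precursor[start:start + len(read)]
        # Peel trailing nucleotides that do not match the precursor.
        end = len(read)
        while end > 0 and (end > len(template) or read[end - 1] != template[end - 1]):
            end -= 1
        if end < len(read) // 2:
            continue  # too little of the read is templated at this start
        subs = tuple((i + 1, template[i], read[i])
                     for i in range(end) if read[i] != template[i])
        if len(subs) > max_subs:
            continue
        call = IsomiRCall(start - ref_start, start + end - ref_end, read[end:], subs)
        key = (len(subs) + len(call.addition_3p), abs(call.shift_5p))
        if best_key is None or key < best_key:
            best, best_key = call, key
    return best


# Toy precursor fragment (flanks are made up) with reference at [3, 25).
precursor = "GGC" + "UGAGGUAGUAGGUUGUAUAGUU" + "UUAGG"
mature = precursor[3:25]

print(classify(mature, precursor, 3, 25).labels())        # ['reference']
print(classify(mature + "A", precursor, 3, 25).labels())  # ["3' addition"]
print(classify(mature[1:], precursor, 3, 25).labels())    # ["5' trimming"]
```

One limitation of this sketch: an added nucleotide that happens to match the precursor (here, a U after the reference 3' end) cannot be told apart from a shifted cleavage site and is reported as 3' trimming.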