An RNA motif is a description of a group of RNAs that have a related structure. It consists of a pattern of features within the primary sequence and secondary structure of the related RNAs, and thus extends the concept of a sequence motif to include RNA secondary structure. The term "RNA motif" can refer both to the pattern and to the RNA sequences that match it.
RNA motifs can be described in two main forms: a multiple sequence alignment or an explicit search pattern. An alignment is usually augmented with a consensus secondary structure, i.e. the structure that is common to all or most RNAs. The sequences in the alignment then implicitly define a pattern of conservation that can, for example, be used to find additional examples of the RNA. This search strategy is implemented by, among others, the Infernal software package. [1]
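As a rough illustration of how an alignment with a consensus secondary structure implicitly defines a pattern of conservation, the following Python sketch measures how often each consensus base pair can actually form across the aligned sequences. The sequences, the dot-bracket structure and the function names are invented for this example; real tools such as Infernal build probabilistic covariance models rather than counting pairs in this way.

```python
# Toy alignment (hypothetical sequences) sharing one consensus hairpin.
# The consensus structure uses dot-bracket notation: '(' pairs with ')'.
ALIGNMENT = {
    "seq1": "GGCAGAAUGCC",
    "seq2": "GACUGAAAGUC",
    "seq3": "GGUAGCAUGCC",
    "seq4": "GGCAGAAUACC",
}
CONSENSUS = "((((...))))"

# Watson-Crick pairs plus the G-U wobble pair.
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_pairs(structure):
    """Return sorted (i, j) column pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return sorted(pairs)


def pair_support(alignment, structure):
    """Fraction of aligned sequences that can form each consensus base pair."""
    support = {}
    for i, j in parse_pairs(structure):
        hits = sum((seq[i], seq[j]) in CAN_PAIR for seq in alignment.values())
        support[(i, j)] = hits / len(alignment)
    return support


if __name__ == "__main__":
    for pair, fraction in pair_support(ALIGNMENT, CONSENSUS).items():
        print(pair, fraction)
```

In this toy data the pair between columns 2 and 8 is supported by three of the four sequences, while the other pairs are supported by all of them. Columns where the nucleotides differ between sequences but pairing is preserved (for example, G-C in one sequence and A-U in another) are the kind of signal that comparative methods treat as evidence for a conserved structure.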
The Rfam database is a collection of multiple sequence alignments that define a large subset of reliably known RNA motifs and associated information. Its data can be used with the Infernal software to find examples of such RNAs in sequence databases, e.g. genome sequences.
Alternatively, RNA motifs can be described using explicit search patterns, which combine specific primary sequence patterns with constraints on where helices should form. Such patterns can be used to find matching subsequences in a large sequence database. Several software packages implement such a search, e.g. RNArobo [2] and RNAmotif. [3]
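The idea of an explicit search pattern can be sketched in Python as follows. This minimal example scans a sequence for a hairpin whose loop matches a primary sequence pattern (given as a regular expression) and whose flanking strands can form a helix of fixed length. The function name, its parameters and the test sequence are hypothetical, and the code does not reproduce the descriptor syntax or the search algorithms of RNArobo or RNAmotif.

```python
import re

# Watson-Crick pairs plus the G-U wobble pair.
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(sequence, stem_len, loop_regex, loop_lengths):
    """Return (start, end) half-open spans of hairpins matching the pattern.

    A match has a 5' strand of stem_len nucleotides, a loop that fully
    matches loop_regex, and a 3' strand that pairs with the 5' strand
    in antiparallel orientation.
    """
    loop_pattern = re.compile(loop_regex)
    matches = []
    for start in range(len(sequence)):
        for loop_len in loop_lengths:
            end = start + 2 * stem_len + loop_len
            if end > len(sequence):
                continue
            five_prime = sequence[start:start + stem_len]
            loop = sequence[start + stem_len:start + stem_len + loop_len]
            three_prime = sequence[end - stem_len:end]
            # Primary sequence constraint on the loop.
            if not loop_pattern.fullmatch(loop):
                continue
            # Helix constraint: base k of the 5' strand pairs with
            # base (stem_len - 1 - k) of the 3' strand.
            if all((five_prime[k], three_prime[stem_len - 1 - k]) in CAN_PAIR
                   for k in range(stem_len)):
                matches.append((start, end))
    return matches


if __name__ == "__main__":
    # Hypothetical sequence: a 5 bp stem closed by a GNRA-type tetraloop.
    seq = "AAGGCACGAAAGUGCCUU"
    print(find_hairpins(seq, 5, r"G[ACGU][AG]A", [4]))
```

Dedicated tools support far richer patterns, such as multiple helices, mismatches and variable-length elements, and are optimized for searching large databases. The brute-force scan here only shows how sequence and helix constraints combine in a single pattern.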
Main article: Bioinformatics discovery of non-coding RNAs
Many methods to discover novel RNAs use a comparative approach, in which different sequences are analyzed together in order to detect characteristic signals of a conserved RNA. When such methods are successful, the resulting novel conserved RNA can be viewed as an RNA motif, expressed using an alignment or a pattern. An early example is the RNA motif based around the T-box, which in 1993 was determined to be associated with aminoacyl-tRNA synthetase genes. [4] The mechanism by which this RNA motif regulates genes was later demonstrated, thus establishing the functional importance of the RNA motif. Later, in 1997, a conserved RNA motif called the B12-box was detected upstream of genes related to B12 metabolism. [5] This RNA motif was later found to correspond to a part of a riboswitch that binds the co-factor adenosylcobalamin, which is often called the cobalamin riboswitch. (Later variants were shown to bind other cobalamin derivatives.) Many other examples of RNA motifs whose functions were later determined are known, especially in the context of riboswitches. [6] However, other types of RNA motifs have been functionally characterized, such as bacterial sRNAs like the 6C RNA, which was discovered as a motif in 2007 [7] and functionally characterized in 2016, [8] or ribozymes like the twister ribozyme, which was detected as an RNA motif and functionally characterized in the same publication. [9]
In molecular biology, a riboswitch is a regulatory segment of a messenger RNA molecule that binds a small molecule, resulting in a change in production of the proteins encoded by the mRNA. Thus, an mRNA that contains a riboswitch is directly involved in regulating its own activity, in response to the concentrations of its effector molecule. The discovery that modern organisms use RNA to bind small molecules, and discriminate against closely related analogs, expanded the known natural capabilities of RNA beyond its ability to code for proteins, catalyze reactions, or to bind other RNA or protein macromolecules.
The cobalamin riboswitch is a cis-regulatory element that is widely distributed in the 5' untranslated regions of vitamin B12 (cobalamin)-related genes in bacteria.
The glucosamine-6-phosphate riboswitch ribozyme is an RNA structure that resides in the 5' untranslated region (UTR) of the mRNA transcript of the glmS gene. This RNA regulates the glmS gene by responding to concentrations of a specific metabolite, glucosamine-6-phosphate (GlcN6P), in addition to catalyzing a self-cleaving chemical reaction upon activation. This cleavage leads to the degradation of the mRNA that contains the ribozyme, and lowers production of GlcN6P. The glmS gene encodes the enzyme glutamine-fructose-6-phosphate amidotransferase, which catalyzes the formation of GlcN6P, a compound essential for cell wall biosynthesis, from fructose-6-phosphate and glutamine. Thus, when GlcN6P levels are high, the glmS ribozyme is activated and the mRNA transcript is degraded, but in the absence of GlcN6P the mRNA continues to be translated into glutamine-fructose-6-phosphate amidotransferase and GlcN6P is produced. GlcN6P is a cofactor for this cleavage reaction, as it directly participates as an acid-base catalyst. This RNA is the first riboswitch also found to be a self-cleaving ribozyme and, like many others, was discovered using a bioinformatics approach.
The YdaO/YuaA leader is a conserved RNA structure found upstream of the ydaO and yuaA genes in Bacillus subtilis and related genes in other bacteria. Its secondary structure and gene associations were predicted by bioinformatics.
The ykkC/yxkD leader is a conserved RNA structure found upstream of the ykkC and yxkD genes in Bacillus subtilis and related genes in other bacteria. The function of this family was unclear for many years, although it has been suggested that it may function to switch on efflux pumps and detoxification systems in response to harmful environmental molecules. The Thermoanaerobacter tengcongensis sequence AE013027 overlaps with that of a purine riboswitch, suggesting that the two riboswitches may work in conjunction to regulate the upstream gene which codes for TTE0584 (Q8RC62), a member of the permease family.
PreQ1-II riboswitches form a class of riboswitches that specifically bind pre-queuosine1 (PreQ1), a precursor of the modified nucleoside queuosine. They are found in certain species of Streptococcus and Lactococcus, and were originally identified as a conserved RNA secondary structure called the "COG4708 motif". All known members of this riboswitch class appear to control genes belonging to COG4708. These genes are predicted to encode membrane-bound proteins, which have been proposed to be transporters of preQ1, or of a related metabolite, based on their association with preQ1-binding riboswitches. PreQ1-II riboswitches have no apparent similarities in sequence or structure to preQ1-I riboswitches, a previously discovered class of preQ1-binding riboswitches. PreQ1 thus joins S-adenosylmethionine as the second metabolite found to be the ligand of more than one riboswitch class.
Cyclic di-GMP-I riboswitches are a class of riboswitches that specifically bind cyclic di-GMP, a second messenger used in a variety of microbial processes including virulence, motility and biofilm formation. Cyclic di-GMP-I riboswitches were originally identified by bioinformatics as a conserved RNA-like structure called the "GEMM motif". These riboswitches are present in a wide variety of bacteria, and are most common in Clostridia and certain varieties of Proteobacteria. The riboswitches are present in pathogens such as Clostridium difficile, Vibrio cholerae and Bacillus anthracis. Geobacter uraniumreducens is predicted to have 30 instances of this riboswitch in its genome. A bacteriophage that infects C. difficile is predicted to carry a cyclic di-GMP-I riboswitch, which it might use to detect and exploit the physiological state of bacteria that it infects.
The Downstream-peptide motif refers to a conserved RNA structure identified by bioinformatics in the cyanobacterial genera Synechococcus and Prochlorococcus and one phage that infects such bacteria. It was also detected in marine samples of DNA from uncultivated bacteria, which are presumably other species of cyanobacteria.
The glutamine riboswitch is a conserved RNA structure that was predicted by bioinformatics. It is present in a variety of lineages of cyanobacteria, as well as some phages that infect cyanobacteria. It is also found in DNA extracted from uncultivated bacteria living in the ocean that are presumably species of cyanobacteria.
The pfl RNA motif refers to a conserved RNA structure present in some bacteria and originally discovered using bioinformatics. pfl RNAs are consistently present in genomic locations that likely correspond to the 5' untranslated regions of protein-coding genes. This arrangement in bacteria is commonly associated with cis-regulatory elements. Moreover, they are in presumed 5' UTRs of multiple non-homologous genes, suggesting that they function only in these locations. Additional evidence of cis-regulatory function came from the observation that predicted rho-independent transcription terminators overlap pfl RNAs. This overlap suggests that the alternate secondary structures of pfl RNA and the transcription terminator stem-loops compete with each other, and this is a common mechanism for cis gene control in bacteria.
The yjdF RNA motif is a conserved RNA structure identified using bioinformatics. Most yjdF RNAs are located in bacteria classified within the phylum Firmicutes. A yjdF RNA is found in the presumed 5' untranslated region of the yjdF gene in Bacillus subtilis, and almost all yjdF RNAs are found in the 5' UTRs of homologs of this gene. The function of the yjdF gene is unknown, but the protein that it is predicted to encode is classified by the Pfam Database as DUF2992.
Cyclic di-GMP-II riboswitches form a class of riboswitches that specifically bind cyclic di-GMP, a second messenger used in multiple bacterial processes such as virulence, motility and biofilm formation. Cyclic di-GMP-II riboswitches are structurally unrelated to cyclic di-GMP-I riboswitches, though they have the same function.
The SAM-V riboswitch is the fifth known riboswitch to bind S-adenosyl methionine (SAM). It was first discovered in the marine bacterium Candidatus Pelagibacter ubique and can also be found in marine metagenomes. SAM-V has a consensus sequence and secondary structure similar to those of the binding site of the SAM-II riboswitch, but bioinformatics scans cluster the two aptamers independently. These similar binding pockets suggest that the two riboswitches have undergone convergent evolution.
RNAs Associated with Genes Associated with Twister and Hammerhead ribozymes (RAGATH) refers to a bioinformatics strategy that was devised to find self-cleaving ribozymes in bacteria. It also refers to candidate RNAs, or RAGATH RNA motifs, discovered using this strategy.
Non-coding RNAs have been discovered using both experimental and bioinformatic approaches. Bioinformatic approaches can be divided into three main categories. The first involves homology search, although these techniques are by definition unable to find new classes of ncRNAs. The second category includes algorithms designed to discover specific types of ncRNAs that have similar properties. Finally, some discovery methods are based on very general properties of RNA, and are thus able to discover entirely new kinds of ncRNAs.
The folE RNA motif, now known as the THF-II riboswitch, is a conserved RNA structure that was discovered by bioinformatics. folE motifs are found in Alphaproteobacteria.
The FTHFS RNA motif is a conserved RNA structure that was discovered by bioinformatics. FTHFS motifs are found in metagenomic sequences derived from samples of the human gut.
The queA RNA motif is a conserved RNA structure that was discovered by bioinformatics. queA motif RNAs have not yet been found in any classified organism; they are known from metagenomic sequences.
The terC RNA motif is a conserved RNA structure that was discovered by bioinformatics. terC motif RNAs are found in Proteobacteria, within the sub-lineages Alphaproteobacteria and Pseudomonadales.
The uup RNA motif is a conserved RNA structure that was discovered by bioinformatics. uup motif RNAs are found in Firmicutes and Gammaproteobacteria.