A wide variety of non-coding RNAs have been identified in various species of organisms known to science. However, RNAs have also been identified in "metagenomics" sequences derived from samples of DNA or RNA extracted from the environment, which contain unknown species. Initial work in this area detected homologs of known bacterial RNAs in such metagenome samples. [1] [2] Many of these RNA sequences were distinct from sequences within cultivated bacteria, and provide the potential for additional information on the RNA classes to which they belong.
The distinct environmental sequences were exploited to detect previously unknown RNAs in the marine bacterium Pelagibacter ubique. P. ubique is extremely common in marine sequences, so sequences of DNA extracted from oceans, many of which are inevitably derived from species related to P. ubique, were used to facilitate the analysis of possible secondary structures of RNAs predicted in this species. [3]
Subsequent studies identified novel RNAs exclusively using sequences extracted from environmental samples. The first study determined the sequences of RNAs directly extracted from microbial biomass in the Pacific Ocean. [4] The researchers found that a large fraction of the total extracted RNA molecules did not appear to code for protein, but instead appeared to conserve consistent RNA secondary structures. A number of these were shown to belong to known small RNA sequence families, including riboswitches. A larger fraction of these microbial small RNAs appeared to represent novel, non-coding small RNAs not yet described in any database. A second study used sequences of DNA extracted from various environments, and inferred the presence of conserved RNA secondary structures among some of these sequences. [5] Both studies identified RNAs that were not present in then-available genome sequences of any known organisms, and determined that some of the RNAs were remarkably abundant. [4] [5] In fact, two of the RNA classes (the IMES-1 RNA motif and IMES-2 RNA motif) exceeded ribosomes in copy number, which is extremely unusual among RNAs in bacteria. IMES-1 RNAs were also determined to be highly abundant near the shore in the Atlantic Ocean using different techniques.
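As a general illustration of how a conserved secondary structure can be inferred from sequences alone, one common line of evidence is covariation: two alignment columns that can still base-pair in most sequences even though the nucleotides themselves differ. The sketch below is a minimal example of that idea, not the method used in the cited studies; the function name, the toy alignment, and the decision to count G-U wobble pairs are all assumptions made for illustration.

```python
# Illustrative sketch only: names and the toy alignment are hypothetical.
# Watson-Crick pairs plus G-U wobble pairs (an assumption of this sketch).
PAIRING = {"AU", "UA", "GC", "CG", "GU", "UG"}

def column_pair_support(alignment, i, j):
    """Summarize how well alignment columns i and j support a base pair.

    Returns (fraction of sequences whose two bases can pair,
             number of distinct pair types observed among those).
    A high fraction together with several distinct pair types suggests
    covariation, i.e. the pairing is conserved while the sequence changes.
    """
    pairs = [(s[i] + s[j]).upper().replace("T", "U") for s in alignment]
    valid = [p for p in pairs if p in PAIRING]
    return len(valid) / len(pairs), len(set(valid))

# Toy alignment: columns 0 and 4 pair in three of the four sequences.
toy = ["GAAAC", "CAAAG", "AAAAU", "GAAAA"]
support = column_pair_support(toy, 0, 4)
```

In this toy case three of the four sequences can pair and they use three different pair types, the kind of pattern that is usually taken as a hint of a conserved stem.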
RNAs that were identified in environmental sequence samples include the IMES-1, IMES-3, IMES-4, Whalefall-1, potC, Termite-flg and Gut-1 RNA motifs. These RNA structures have not been detected in the genome of any known species. The IMES-2 RNA motif, GOLLD RNA motif and manA RNA motif were discovered using environmental DNA or RNA sequence samples, and are present in a small number of known species. Additional non-coding RNAs are predicted in marine environments, [4] although no specific conserved secondary structures have been published for these other candidates. Other conserved RNA structures were originally detected using environmental sequence data, e.g., the glnA RNA motif, but were subsequently detected in numerous cultivated species of bacteria.
The discovery of RNAs that are not detected among currently known species mirrors findings of protein classes that are currently unique to environmental samples. [6]
"Candidatus Pelagibacter", with the single species "Ca. P. communis", was isolated in 2002 and given a specific name, although it has not yet been described as required by the bacteriological code. It is an abundant member of the SAR11 clade in the phylum Alphaproteobacteria. SAR11 members are highly dominant organisms found in both salt and fresh water worldwide – possibly the most numerous bacterium in the world, and were originally known only from their rRNA genes, which were first identified in environmental samples from the Sargasso Sea in 1990 by Stephen Giovannoni's laboratory in the Department of Microbiology at Oregon State University and later found in oceans worldwide. "Ca. P. communis" and its relatives may be the most abundant organisms in the ocean, and quite possibly the most abundant bacteria in the entire world. It can make up about 25% of all microbial plankton cells, and in the summer they may account for approximately half the cells present in temperate ocean surface water. The total abundance of "Ca. P. communis" and relatives is estimated to be about 2 × 1028 microbes.
Metagenomics is the study of genetic material recovered directly from environmental samples. The broad field may also be referred to as environmental genomics, ecogenomics or community genomics.
The SAM-II riboswitch is an RNA element found predominantly in Alphaproteobacteria that binds S-adenosyl methionine (SAM). Its structure and sequence appear to be unrelated to the SAM riboswitch found in Gram-positive bacteria. The SAM-II riboswitch is located upstream of the metA and metC genes in Agrobacterium tumefaciens, and of other methionine and SAM biosynthesis genes in other Alphaproteobacteria. Like the other SAM riboswitch, it probably functions to turn off expression of these genes in response to elevated SAM levels. A significant variant of SAM-II riboswitches was found in Pelagibacter ubique and related marine bacteria and called SAM-V. Also, like many structured RNAs, SAM-II riboswitches can tolerate long loops between their stems.
The IMES-1 RNA motif is a conserved RNA structure that was identified in marine environmental sequences by two studies based on metagenomics and bioinformatics, the first analyzing metatranscriptome (RNA) data and the second using metagenome (DNA) data. These RNAs are present in environmental sequences, and as of 2009 are not known to be present in any cultivated species. However, the species that use these RNAs are most closely related to known alphaproteobacteria and gammaproteobacteria. IMES-1 RNAs make up a significant portion of marine RNA transcripts and are exceptionally abundant in that over five times as many IMES-1 RNAs were found as ribosomes in RNAs sampled from the Pacific Ocean. Only two bacterial RNAs are known to be more highly transcribed than ribosomes. IMES-1 RNAs were also detected in abundance in Block Island Sound in the Atlantic Ocean.
The IMES-2 RNA motif is a conserved RNA structure that was identified by a study based on metagenomics and bioinformatics, and the underlying RNA sequences were identified independently by a similar earlier study. These RNAs are present in environmental sequences, and when discovered were not known to be present in any cultivated species. However, an IMES-2 RNA has been detected in alphaproteobacterium HIMB114, which is classified in the SAR11 clade of marine bacteria. This finding fits with earlier predictions that species that use IMES-2 RNAs are most closely related to alphaproteobacteria. IMES-2 RNAs are exceptionally abundant, as twice as many IMES-2 RNAs were found as ribosomes in RNAs sampled from the Pacific Ocean. Only two bacterial RNAs are known to be more highly transcribed than ribosomes.
The IMES-3 RNA motif is a conserved RNA structure that was identified based on metagenomics and bioinformatics, and the underlying RNA sequences were identified independently by an earlier study. These RNAs are present in environmental sequences, and as of 2009 are not known to be present in any cultivated species. IMES-3 RNAs are abundant in comparison to ribosomes in RNAs sampled from the Pacific Ocean.
The IMES-4 RNA motif is a conserved RNA structure that was identified in marine environmental sequences by metagenomics and bioinformatics. These RNAs are present in environmental sequences, and as of 2009 are not known to be present in any cultivated species. IMES-4 RNAs are fairly abundant in comparison to ribosomes in RNAs sampled from the Pacific Ocean.
The Downstream-peptide motif refers to a conserved RNA structure identified by bioinformatics in the cyanobacterial genera Synechococcus and Prochlorococcus and one phage that infects such bacteria. It was also detected in marine samples of DNA from uncultivated bacteria, which are presumably other species of cyanobacteria.
The glutamine riboswitch is a conserved RNA structure that was predicted by bioinformatics. It is present in a variety of lineages of cyanobacteria, as well as some phages that infect cyanobacteria. It is also found in DNA extracted from uncultivated bacteria living in the ocean that are presumably species of cyanobacteria.
The Gut-1 RNA motif is a conserved RNA structure identified by bioinformatics. These RNAs are present in environmental sequences, and as of 2010 are not known to be present in any species that has been grown under laboratory conditions. Gut-1 RNA is exclusively found in DNA from uncultivated bacteria present in samples from the human gut.
The manA RNA motif refers to a conserved RNA structure that was identified by bioinformatics. Instances of the manA RNA motif were detected in bacteria in the genus Photobacterium and phages that infect certain kinds of cyanobacteria. However, most predicted manA RNA sequences are derived from DNA collected from uncultivated marine bacteria. Almost all manA RNAs are positioned such that they might be in the 5' untranslated regions of protein-coding genes, and therefore it was hypothesized that manA RNAs function as cis-regulatory elements. Given the relative complexity of their secondary structure, and their hypothesized cis-regulatory role, they might be riboswitches.
The wcaG RNA motif is an RNA structure conserved in some bacteria that was detected by bioinformatics. wcaG RNAs are found in certain phages that infect cyanobacteria. Most known wcaG RNAs were found in sequences of DNA extracted from uncultivated marine bacteria. wcaG RNAs might function as cis-regulatory elements, in view of their consistent location in the possible 5' untranslated regions of genes. It was further suggested that wcaG RNAs might function as riboswitches.
PhotoRC RNA motifs refer to conserved RNA structures that are associated with genes acting in the photosynthetic reaction centre of photosynthetic bacteria. Two such RNA classes were identified and called the PhotoRC-I and PhotoRC-II motifs. PhotoRC-I RNAs were detected in the genomes of some cyanobacteria. Although no PhotoRC-II RNA has been detected in cyanobacteria, one is found in the genome of a purified phage that infects cyanobacteria. Both PhotoRC-I and PhotoRC-II RNAs are present in sequences derived from DNA that was extracted from uncultivated marine bacteria.
The Polynucleobacter-1 RNA motif is a conserved RNA structure that was identified by bioinformatics. The RNA structure is predominantly located in genome sequences derived from DNA extracted from uncultivated marine samples. However, it was also predicted in the genome of Polynucleobacter species QLW-P1DMWA-1, a betaproteobacterium. The RNAs are often located near a conserved gene that might be homologous to a gene found in a phage that infects cyanobacteria. However, it is unknown whether the RNA is used by phages.
The potC RNA motif is a conserved RNA structure discovered using bioinformatics. The RNA is detected only in genome sequences derived from DNA that was extracted from uncultivated marine bacteria. Thus, this RNA is present in environmental samples, but not yet found in any cultivated organism. potC RNAs are located in the presumed 5' untranslated regions of genes predicted to encode either membrane transport proteins or peroxiredoxins. Therefore, it was hypothesized that potC RNAs are cis-regulatory elements, but their detailed function is unknown.
The Termite-flg RNA motif is a conserved RNA structure identified by bioinformatics. Genomic sequences corresponding to Termite-flg RNAs have been identified only in uncultivated bacteria present in the termite hindgut. As of 2010 it has not been identified in the DNA of any cultivated species, and is thus an example of RNAs present in environmental samples.
The Termite-leu RNA motif is a conserved RNA structure discovered by bioinformatics. It is found only in DNA sequences extracted from uncultivated bacteria living in termite hindguts, and has not yet been detected in any known cultivated organism. In many cases, Termite-leu RNAs are found in the likely 5′ untranslated regions of multiple genes related to the synthesis of the amino acid leucine. However, in several cases they are not found in this type of location. Therefore, it was considered ambiguous whether Termite-leu RNAs constitute cis-regulatory elements.
The Whalefall-1 RNA motif refers to a conserved RNA structure that was discovered using bioinformatics. Structurally, the motif consists of two stem-loops, the second of which is often terminated by a CUUG tetraloop, which is an energetically favorable RNA sequence. Whalefall-1 RNAs are found only in DNA extracted from uncultivated bacteria found on a whale fall, i.e., a whale carcass. As of 2010, Whalefall-1 RNAs have not been detected in any known, cultivated species of bacteria, and are thus one of several RNAs present in environmental samples.
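To make the phrase "stem-loop terminated by a CUUG tetraloop" concrete, the following minimal sketch scans a sequence for CUUG loops that are flanked by consecutive complementary bases. It is a simplified illustration rather than the approach used to discover the motif: the function name, the minimum stem length of four pairs, the acceptance of G-U wobble pairs, and the example sequence are all assumptions.

```python
# Illustrative sketch only: names, thresholds and the example sequence are hypothetical.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def find_cuug_hairpins(seq, min_stem=4):
    """Find hairpins whose loop is CUUG and whose stem has >= min_stem pairs.

    Returns a list of (start, end_exclusive, stem_length) tuples, where
    start/end_exclusive delimit the whole hairpin (stem + loop) in seq.
    DNA input is accepted; T is treated as U.
    """
    seq = seq.upper().replace("T", "U")
    hits = []
    loop = seq.find("CUUG")
    while loop != -1:
        left = loop - 1    # base immediately 5' of the loop
        right = loop + 4   # base immediately 3' of the loop
        stem = 0
        # Extend the stem outward while the flanking bases can pair.
        while left >= 0 and right < len(seq) and (seq[left], seq[right]) in PAIRS:
            stem += 1
            left -= 1
            right += 1
        if stem >= min_stem:
            hits.append((left + 1, right, stem))
        loop = seq.find("CUUG", loop + 1)
    return hits

# Made-up example: stem GGCAC / GUGCC around a CUUG loop.
example_hits = find_cuug_hairpins("AAGGCACCUUGGUGCCAA")
```

A real search for the motif would also need to model the first stem-loop and the sequence conservation; this sketch only checks the tetraloop-capped hairpin.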
The Ocean-V RNA motif is a conserved RNA structure discovered using bioinformatics. Only a few Ocean-V RNA sequences have been detected, all in sequences derived from DNA that was extracted from uncultivated bacteria found in ocean water. As of 2010, no Ocean-V RNA has been detected in any known, cultivated organism.
The SAM-V riboswitch is the fifth known riboswitch to bind S-adenosyl methionine (SAM). It was first discovered in the marine bacterium Candidatus Pelagibacter ubique and can also be found in marine metagenomes. SAM-V features a consensus sequence and secondary structure similar to those of the binding site of the SAM-II riboswitch, but bioinformatics scans cluster the two aptamers independently. These similar binding pockets suggest that the two riboswitches have undergone convergent evolution.