Rhea (pipeline)

Rhea
Developer(s) Ilias Lagkouvardos, Sandra Fischer, Neeraj Kumar, Thomas Clavel
Initial release 16 November 2016
Stable release 1.1.0
Written in R
Operating system Windows, macOS, Ubuntu, Fedora, Red Hat Linux, openSUSE
License MIT License
Website https://lagkouvardos.github.io/Rhea/

Rhea [1] is a bioinformatic pipeline written in the R language for the analysis of microbial profiles. It was released at the end of 2016 and is publicly available through a GitHub repository. [2]

Starting with an Operational taxonomic unit (OTU) table, the pipeline contains scripts that perform the following common analytical steps:

  1. Normalization of the OTU table
  2. Calculation of the alpha diversity for each sample
  3. Calculation of beta diversity and visualization of the results with PCoA
  4. Taxonomic binning
  5. Statistical testing
  6. Correlation analysis
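
The first three steps can be illustrated with a minimal sketch. It is written in Python rather than R and is not taken from Rhea's scripts. The toy OTU table, the function names, and the choice of total-sum scaling, Shannon index, and Bray-Curtis dissimilarity are all assumptions made for illustration; the pipeline's own scripts may use different normalization and distance methods.

```python
import math

# Hypothetical toy OTU table: rows are OTUs, columns are samples (raw read counts).
otu_table = {
    "OTU_1": {"S1": 10, "S2": 0},
    "OTU_2": {"S1": 30, "S2": 20},
    "OTU_3": {"S1": 60, "S2": 20},
}

def sample_counts(table, sample):
    """Return the counts of every OTU for one sample, in table order."""
    return [row[sample] for row in table.values()]

def relative_abundance(counts):
    """Normalize counts so they sum to 1 (simple total-sum scaling)."""
    total = sum(counts)
    return [c / total for c in counts]

def richness(counts):
    """Alpha diversity as the number of OTUs observed in the sample."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Alpha diversity as the Shannon index (natural logarithm)."""
    return -sum(p * math.log(p) for p in relative_abundance(counts) if p > 0)

def bray_curtis(a, b):
    """Beta diversity as the Bray-Curtis dissimilarity between two profiles."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator

s1 = relative_abundance(sample_counts(otu_table, "S1"))
s2 = relative_abundance(sample_counts(otu_table, "S2"))

print("Richness S1:", richness(sample_counts(otu_table, "S1")))
print("Shannon S2:", round(shannon(sample_counts(otu_table, "S2")), 4))
print("Bray-Curtis S1 vs S2:", round(bray_curtis(s1, s2), 4))
```

A matrix of such pairwise dissimilarities across all samples is the usual input to an ordination method such as PCoA.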

The name Rhea was primarily given to the pipeline as a phonetic and visual link to the R language used throughout development. Moreover, as stated in the original publication, [1] the name was chosen to reflect the flowing and evolving nature of the scripts, as "flow" is one of the suggested etymologies of the name of the mythological goddess Rhea.

Related Research Articles

Metagenomics: Study of genes found in the environment

Metagenomics is the study of genetic material recovered directly from environmental samples. The broad field may also be referred to as environmental genomics, ecogenomics or community genomics.

16S ribosomal RNA: RNA component

16S ribosomal RNA is the RNA component of the 30S subunit of a prokaryotic ribosome. It binds to the Shine-Dalgarno sequence and provides most of the SSU structure.

An Operational Taxonomic Unit (OTU) is an operational definition used to classify groups of closely related individuals. The term was originally introduced in 1963 by Robert R. Sokal and Peter H. A. Sneath in the context of numerical taxonomy, where an "Operational Taxonomic Unit" is simply the group of organisms currently being studied. In this sense, an OTU is a pragmatic definition to group individuals by similarity, equivalent to but not necessarily in line with classical Linnaean taxonomy or modern evolutionary taxonomy.

Ribosomal RNA (rRNA) intergenic spacer analysis (RISA) is a method of microbial community analysis that provides a means of comparing differing environments or treatment impacts without the bias imposed by culture-dependent approaches. This type of analysis is often referred to as community fingerprinting. RISA involves PCR amplification of a region of the rRNA gene operon between the small (16S) and large (23S) subunits, called the intergenic spacer region (ISR).

A microbial consortium, or microbial community, is two or more bacterial or microbial groups living symbiotically. Consortia can be endosymbiotic or ectosymbiotic, or occasionally both. The protist Mixotricha paradoxa, itself an endosymbiont of the Mastotermes darwiniensis termite, is always found as a consortium of at least one endosymbiotic coccus, multiple ectosymbiotic species of flagellate or ciliate bacteria, and at least one species of helical Treponema bacteria that forms the basis of Mixotricha protists' locomotion.

UniFrac is a distance metric used for comparing biological communities. It differs from dissimilarity measures such as Bray-Curtis dissimilarity in that it accounts for the relatedness of community members by including the phylogenetic distances between observed organisms in the computation.
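
How phylogenetic distances enter the computation can be shown with a minimal sketch of the unweighted variant: the fraction of total branch length, among branches leading to any observed organism, that leads to organisms from only one of the two communities. The toy tree, the taxon names, and the function name below are hypothetical, and the sketch assumes that at least one listed taxon is present in the communities being compared.

```python
# Hypothetical toy tree, stored as a list of branches.
# Each branch is (length, set of taxa that descend from it).
# Topology assumed here: root -> (A, B) clade and root -> C.
branches = [
    (1.0, {"A"}),
    (1.0, {"B"}),
    (2.0, {"C"}),
    (0.5, {"A", "B"}),  # internal branch leading to the (A, B) clade
]

def unweighted_unifrac(branches, community_1, community_2):
    """Unique branch length divided by total observed branch length."""
    unique = 0.0
    observed = 0.0
    for length, descendants in branches:
        in_1 = bool(descendants & community_1)
        in_2 = bool(descendants & community_2)
        if in_1 or in_2:
            observed += length  # branch leads to at least one community
        if in_1 != in_2:
            unique += length    # branch leads to exactly one community
    return unique / observed

d = unweighted_unifrac(branches, {"A", "B"}, {"B", "C"})
print(round(d, 4))
```

Two identical communities share every branch and get a distance of 0, while two communities with no shared branches get a distance of 1.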

Rare biosphere refers to a large number of rare species of microbial life, i.e. bacteria, archaea and fungi, that can be found in very low concentrations in an environment.

DNA barcoding: Method of species identification using a short section of DNA

DNA barcoding is a method of species identification using a short section of DNA from a specific gene or genes. The premise of DNA barcoding is that, by comparison with a reference library of such DNA sections, an individual sequence can be used to uniquely identify an organism to species, in the same way that a supermarket scanner uses the familiar black stripes of the UPC barcode to identify an item in its stock against its reference database. These "barcodes" are sometimes used in an effort to identify unknown species, parts of an organism, or simply to catalog as many taxa as possible, or to compare with traditional taxonomy in an effort to determine species boundaries.

Zetaproteobacteria: Class of bacteria

The class Zetaproteobacteria is the sixth and most recently described class of the Pseudomonadota. Zetaproteobacteria can also refer to the group of organisms assigned to this class. The Zetaproteobacteria were originally represented by a single described species, Mariprofundus ferrooxydans, which is an iron-oxidizing neutrophilic chemolithoautotroph originally isolated from Loihi Seamount in 1996 (post-eruption). Molecular cloning techniques focusing on the small subunit ribosomal RNA gene have also been used to identify a more diverse majority of the Zetaproteobacteria that have as yet been unculturable.

Earth Microbiome Project

The Earth Microbiome Project (EMP) is an initiative founded by Janet Jansson, Jack Gilbert and Rob Knight in 2010 to collect natural samples and to analyze the microbial community around the globe.

Community fingerprinting is a set of molecular biology techniques that can be used to quickly profile the diversity of a microbial community. Rather than directly identifying or counting individual cells in an environmental sample, these techniques show how many variants of a gene are present. In general, it is assumed that each different gene variant represents a different type of microbe. Community fingerprinting is used by microbiologists studying a variety of microbial systems to measure biodiversity or track changes in community structure over time. The method analyzes environmental samples by assaying genomic DNA. This approach offers an alternative to microbial culturing, which is important because most microbes cannot be cultured in the laboratory. Community fingerprinting does not result in identification of individual microbe species; instead, it presents an overall picture of a microbial community. These methods are now largely being replaced by high throughput sequencing, such as targeted microbiome analysis and metagenomics.

Parasutterella is a genus of Gram-negative, circular/rod-shaped, obligate anaerobic, non-spore forming bacteria from the Pseudomonadota phylum, Betaproteobacteria class and the family Sutterellaceae. Previously, this genus was considered "unculturable," meaning that it could not be characterized through conventional laboratory techniques, such as growth in culture, due to its requirement for an anaerobic environment. The genus was initially discovered through 16S rRNA sequencing and bioinformatics analysis. By analyzing the sequence similarity, Parasutterella was determined to be related most closely to the genus Sutterella and was previously classified in the family Alcaligenaceae.

Syntrophococcus sucromutans is a Gram-negative, strictly anaerobic, chemoorganotrophic member of the Bacillota. This bacterium can be found forming small chains in the habitat where it was first isolated, the rumen of cows. It is the type strain of genus Syntrophococcus and it has an uncommon one-carbon metabolic pathway, forming acetate from formate as a product of sugar oxidation.

Metatranscriptomics is the science that studies gene expression of microbes within natural environments, i.e., the metatranscriptome. It also allows whole gene expression profiles of complex microbial communities to be obtained.

PICRUSt is a bioinformatics software package. The name is an abbreviation for Phylogenetic Investigation of Communities by Reconstruction of Unobserved States.

Acetatifactor muris is a bacterium from the genus of Acetatifactor which was isolated from the cecal content of an obese mouse in Freising-Weihenstephan in Germany. The organism is rod-shaped, Gram-positive, anaerobic, and non-motile. The organism does not form spores, and its GC-content is 48%. It does not metabolize glucose, and it tests positive for phenylalanine arylamidase. This species is the type strain for the genus Acetatifactor, which is commonly found in the guts of rodents. The DSM type strain is 23669T, and the ATCC type strain is BAA-2170T.

QIIME is a bioinformatic pipeline designed for analysing microbial communities that were sampled through marker gene amplicon sequencing. At its core, the pipeline performs quality control on the input sequencing reads, clusters the marker gene nucleotide sequences at a requested phylogenetic level into OTUs or sequence variants, and taxonomically annotates them by looking for similar sequences in a reference taxonomic database. The main output from the QIIME pipeline is a feature table, which describes the abundance of each OTU or sequence variant in each sample. Additional tools that are relevant to ecological aspects of the samples being investigated are also provided within the pipeline, including rarefaction, alpha diversity and beta diversity calculations, and visualizations such as principal coordinates analysis (PCoA). QIIME takes a very modular approach and allows multiple methods to be applied at many stages of the analysis. For example, sequence clustering can be performed using UCLUST, CD-HIT, BLAST, or other tools. QIIME has been under active development since its release in 2010.

Machine learning in bioinformatics is the application of machine learning algorithms to bioinformatics, including genomics, proteomics, microarrays, systems biology, evolution, and text mining.

Amplicon sequence variant

An amplicon sequence variant (ASV) is any one of the inferred single DNA sequences recovered from a high-throughput analysis of marker genes. Because these analyses, also called "amplicon reads," are created following the removal of erroneous sequences generated during PCR and sequencing, using ASVs makes it possible to distinguish sequence variation by a single nucleotide change. The uses of ASVs include classifying groups of species based on DNA sequences, finding biological and environmental variation, and determining ecological patterns.

References

  1. Lagkouvardos, Ilias; Fischer, Sandra; Kumar, Neeraj; Clavel, Thomas (11 January 2017). "Rhea: a transparent and modular R pipeline for microbial profiling based on 16S rRNA gene amplicons". PeerJ. 5: e2836. doi:10.7717/peerj.2836. PMC 5234437.
  2. "Rhea by Lagkouvardos". lagkouvardos.github.io.