Bioconductor

Stable release: 3.22 / 30 October 2025
Operating system: Linux, macOS, Windows
Platform: R programming language
Type: Bioinformatics
License: Artistic License 2.0
Website: www.bioconductor.org

Bioconductor is a free, open source and open development software project for the analysis and comprehension of genomic data generated by wet lab experiments in molecular biology.

Bioconductor is based primarily on the statistical R programming language, but does contain contributions in other programming languages. It has two releases each year that follow the semiannual releases of R. At any one time there is a release version, which corresponds to the released version of R, and a development version, which corresponds to the development version of R. Most users will find the release version appropriate for their needs. In addition there are many genome annotation packages available that are mainly, but not solely, oriented towards different types of microarrays.
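As a minimal sketch of how the pairing between R and Bioconductor versions shows up in practice, the BiocManager package from CRAN is the usual entry point for installing Bioconductor packages that match the running R. The `run_install` switch below is a hypothetical guard added for this example so that nothing is downloaded unless it is turned on, and DESeq2 is used only as an example package name.

```r
# Sketch: install Bioconductor packages matched to the running version of R.
run_install <- FALSE  # hypothetical switch; set to TRUE to actually install

r_version <- paste(R.version$major, R.version$minor, sep = ".")
message("Running R ", r_version)

if (run_install) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # Without a 'version' argument, BiocManager selects the Bioconductor
  # release that corresponds to the installed R.
  BiocManager::install("DESeq2")
  print(BiocManager::version())
}
```

Passing `version = "devel"` to `BiocManager::install()` switches to the development branch, which in turn expects a matching version of R.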

The project was started in the Fall of 2001 and is overseen by the Bioconductor core team, based primarily at the Fred Hutchinson Cancer Research Center, with other members coming from international institutions.

Packages

Most Bioconductor components are distributed as R packages, which are add-on modules for R. Initially most of the Bioconductor software packages focused on the analysis of single channel Affymetrix and two or more channel cDNA/Oligo microarrays. As the project has matured, the functional scope of the software packages broadened to include the analysis of all types of genomic data, such as SAGE, sequence, or SNP data.

Milestones

Each release of Bioconductor is developed to work best with a chosen version of R. [1] In addition to bug fixes and updates, a new release typically adds packages. The table below maps selected Bioconductor releases to an R version and shows the number of Bioconductor software packages available in each release.

Version   Release date   Package count   R dependency
3.22      30 Oct 2025    2361            R 4.5
3.21      16 Apr 2025    2341            R 4.5
3.20      30 Oct 2024    2289            R 4.4
3.18      25 Oct 2023    2266            R 4.3
3.16      2 Nov 2022     2183            R 4.2
3.14      27 Oct 2021    2083            R 4.1
3.11      28 Apr 2020    1903            R 4.0
3.10      30 Oct 2019    1823            R 3.6
3.8       31 Oct 2018    1649            R 3.5
3.6       31 Oct 2017    1473            R 3.4
3.4       18 Oct 2016    1296            R 3.3
3.2       14 Oct 2015    1104            R 3.2
3.0       14 Oct 2014    934             R 3.1
2.13      15 Oct 2013    749             R 3.0
2.11      3 Oct 2012     610             R 2.15
2.9       1 Nov 2011     517             R 2.14
2.8       14 Apr 2011    466             R 2.13
2.7       18 Nov 2010    418             R 2.12
2.6       23 Apr 2010    389             R 2.11
2.5       28 Oct 2009    352             R 2.10
2.4       21 Apr 2009    320             R 2.9
2.3       22 Oct 2008    294             R 2.8
2.2       1 May 2008     260             R 2.7
2.1       8 Oct 2007     233             R 2.6
2.0       26 Apr 2007    214             R 2.5
1.9       4 Oct 2006     188             R 2.4
1.8       27 Apr 2006    172             R 2.3
1.7       14 Oct 2005    141             R 2.2
1.6       18 May 2005    123             R 2.1
1.5       25 Oct 2004    100             R 2.0
1.4       17 May 2004    81              R 1.9
1.3       30 Oct 2003    49              R 1.8
1.2       29 May 2003    30              R 1.7
1.1       19 Oct 2002    20              R 1.6
1.0       1 May 2002     15              R 1.5

Application of Bioconductor in small-RNA seq and microRNA data analysis

Introduction

Small RNA sequencing is a widely used technique for studying microRNAs (miRNAs), small interfering RNAs (siRNAs), and piwi-interacting RNAs (piRNAs), which play a crucial role in RNA-mediated gene silencing, also known as RNA silencing. RNA silencing acts on different types of substrates, which give rise to distinct small RNA populations such as miRNAs and siRNAs. In the laboratory, small RNA sequencing typically starts with the extraction of RNA from cells or tissues, followed by adapter ligation to the 5' and 3' ends of the small RNAs, then reverse transcription and PCR amplification to generate cDNA libraries. Finally, high-throughput sequencing (most commonly on Illumina platforms) produces millions of short reads. The resulting data then undergo computational processing to align the reads to the reference genome of the species of interest or to miRNA databases.

Bioconductor in RNA Biology

Bioconductor (BioC) [2] is a widely used open-source platform for analysing small-RNA sequencing and other genomic data. It is built primarily on the R programming language and offers a wide range of packages for bioinformatics and computational biology. Among its packages [3] for handling small-RNA seq data, a few are especially widely used by researchers: DESeq2, [4] edgeR, [5] limma with voom, [6] [7] GenomicAlignments, [8] GenomicFeatures, [8] Rsubread [9] (with its featureCounts function [11]), and ShortRead. [10] Together these provide robust analysis of RNA-seq data. [12]

DESeq2

DESeq2 models read counts from RNA-seq data with a negative binomial distribution for differential expression analysis. [13] It is popular for its dispersion estimation and normalization, and its results are commonly visualized with PCA plots or MA plots. [4]
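The following is a minimal sketch of a DESeq2 run on simulated counts, not a recommended analysis. The sample names, the two-group design, and the simulation parameters are all assumptions made for illustration, and the DESeq2 calls are guarded so that the block still runs when the package is not installed.

```r
# Sketch: DESeq2 on simulated counts (all names and values are illustrative).
set.seed(1)
n_genes <- 200
coldata <- data.frame(
  condition = factor(rep(c("control", "treated"), each = 3)),
  row.names = paste0("sample", 1:6)
)
# Per-gene means between 2^4 and 2^10; 'size' is the inverse dispersion.
gene_means <- 2^runif(n_genes, min = 4, max = 10)
counts <- matrix(
  rnbinom(n_genes * 6, mu = gene_means, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), rownames(coldata))
)

if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = counts, colData = coldata, design = ~ condition
  )
  dds <- DESeq2::DESeq(dds)    # size factors, dispersions, negative binomial fit
  res <- DESeq2::results(dds)  # log2 fold changes and adjusted p-values
  DESeq2::plotMA(res)          # MA plot
  vsd <- DESeq2::varianceStabilizingTransformation(dds, blind = TRUE)
  print(DESeq2::plotPCA(vsd, intgroup = "condition"))  # PCA plot
}
```

Because the counts are simulated without any true group effect, few or no genes should come out as significant.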

edgeR

edgeR also models read counts from RNA-seq data with a negative binomial distribution for differential expression analysis. In contrast with DESeq2, it is often chosen when the number of samples is relatively small. [5] [13]
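A minimal sketch of the edgeR quasi-likelihood workflow on simulated counts is given here. The group labels, sample names, and simulation parameters are assumptions for illustration only, and the edgeR calls are skipped when the package is not installed.

```r
# Sketch: edgeR quasi-likelihood test on simulated counts (illustrative values).
set.seed(2)
group <- factor(rep(c("control", "treated"), each = 2))
counts <- matrix(
  rnbinom(100 * 4, mu = 50, size = 5),
  nrow = 100,
  dimnames = list(paste0("gene", 1:100), paste0("s", 1:4))
)

if (requireNamespace("edgeR", quietly = TRUE)) {
  y <- edgeR::DGEList(counts = counts, group = group)
  y <- edgeR::calcNormFactors(y)           # TMM normalization factors
  design <- model.matrix(~ group)
  y <- edgeR::estimateDisp(y, design)      # negative binomial dispersions
  fit <- edgeR::glmQLFit(y, design)        # quasi-likelihood GLM fit
  qlf <- edgeR::glmQLFTest(fit, coef = 2)  # test treated vs control
  print(edgeR::topTags(qlf))
}
```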

limma + voom

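The voom function in limma transforms count data to log2 counts per million (log-CPM) and estimates the mean-variance relationship of the transformed values, which limma then uses as precision weights in its linear models. limma was developed for analysing microarray data and, through voom, is also applied to RNA-seq data. [6]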
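The log-CPM transformation can be sketched in base R as follows. The offsets of 0.5 on the counts and 1 on the library size follow the published description of voom; the simulated counts and the two-group design are assumptions for illustration, and the limma calls are guarded so the block runs without the package.

```r
# Sketch: log2-CPM as described for voom, computed by hand on simulated counts.
set.seed(3)
counts <- matrix(
  rnbinom(50 * 4, mu = 200, size = 10),
  nrow = 50,
  dimnames = list(paste0("gene", 1:50), paste0("s", 1:4))
)
lib_size <- colSums(counts)

# log2((count + 0.5) / (library size + 1) * 1e6), applied per sample (column).
log_cpm <- log2(t((t(counts) + 0.5) / (lib_size + 1)) * 1e6)

if (requireNamespace("limma", quietly = TRUE)) {
  design <- model.matrix(~ factor(rep(c("a", "b"), each = 2)))
  v <- limma::voom(counts, design)   # log-CPM values plus precision weights
  fit <- limma::eBayes(limma::lmFit(v, design))
  print(limma::topTable(fit, coef = 2))
}
```

With no extra normalization factors, the values in `v$E` are expected to agree with `log_cpm`.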

GenomicAlignments

GenomicAlignments is widely used to read and manipulate short-read alignments stored in alignment files such as BAM, and to assign the aligned reads to genes or miRNAs for downstream analysis. [8] [14]
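As a rough sketch of read-to-feature assignment, the helper below counts the reads in a BAM file that overlap a set of genomic ranges. The function name `count_bam_overlaps`, the file name `sample1.bam`, and the single feature on `chr1` are hypothetical; the package calls only run when the file and the packages are present.

```r
# Sketch: count aligned reads overlapping features (hypothetical inputs).
count_bam_overlaps <- function(bam_file, features) {
  se <- GenomicAlignments::summarizeOverlaps(
    features,
    Rsamtools::BamFileList(bam_file),
    mode = "Union"
  )
  SummarizedExperiment::assay(se)  # features x samples count matrix
}

bam_file <- "sample1.bam"  # hypothetical path
if (file.exists(bam_file) &&
    requireNamespace("GenomicAlignments", quietly = TRUE)) {
  # Hypothetical miRNA locus; seqnames must match those used in the BAM file.
  features <- GenomicRanges::GRanges(
    seqnames = "chr1",
    ranges = IRanges::IRanges(start = 1000, end = 1100),
    strand = "+"
  )
  names(features) <- "hypothetical_miRNA"
  print(length(GenomicAlignments::readGAlignments(bam_file)))
  print(count_bam_overlaps(bam_file, features))
}
```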

GenomicFeatures

GenomicFeatures is used to build and query transcript-centric annotation databases, such as TxDb objects, which store information about genes, exons, and transcripts derived from GTF/GFF files. [8] [15]
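A sketch of building and querying a TxDb object is shown below. It assumes a recent Bioconductor release in which `makeTxDbFromGFF()` is provided by the txdbmaker package (older releases exposed it from GenomicFeatures itself); the helper name `build_txdb` and the file name `annotation.gtf` are hypothetical.

```r
# Sketch: build a TxDb from a GTF file and pull out genes and exons.
build_txdb <- function(gtf_file) {
  # Assumption: txdbmaker hosts makeTxDbFromGFF() in the installed release.
  txdbmaker::makeTxDbFromGFF(gtf_file, format = "gtf")
}

gtf_file <- "annotation.gtf"  # hypothetical path
if (file.exists(gtf_file) &&
    requireNamespace("txdbmaker", quietly = TRUE) &&
    requireNamespace("GenomicFeatures", quietly = TRUE)) {
  txdb <- build_txdb(gtf_file)
  print(GenomicFeatures::genes(txdb))                 # gene ranges
  print(GenomicFeatures::exonsBy(txdb, by = "gene"))  # exons grouped by gene
}
```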

Rsubread

Rsubread is used mostly for read mapping and read summarization; its functions align() and featureCounts() provide an efficient alternative to external aligners such as STAR or HISAT2. [16]
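The mapping and counting steps can be sketched with Rsubread as follows. All file names and the index name are hypothetical, the helper `map_and_count` is introduced only for this example, and nothing is executed unless the input files and the package are available.

```r
# Sketch: index a reference, align reads, and count them per feature.
map_and_count <- function(ref_fasta, fastq, gtf, index = "genome_index",
                          bam = "sample1.bam") {
  Rsubread::buildindex(basename = index, reference = ref_fasta)
  Rsubread::align(index = index, readfile1 = fastq, output_file = bam)
  fc <- Rsubread::featureCounts(
    files = bam, annot.ext = gtf, isGTFAnnotationFile = TRUE
  )
  fc$counts  # features x samples count matrix
}

inputs <- c("genome.fa", "sample1.trimmed.fastq.gz", "annotation.gtf")  # hypothetical
if (all(file.exists(inputs)) && requireNamespace("Rsubread", quietly = TRUE)) {
  print(head(map_and_count(inputs[1], inputs[2], inputs[3])))
}
```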

ShortRead

ShortRead is often used to pre-process raw FASTQ files and to check the quality of the raw data produced by a sequencing platform such as Illumina. [10]

Computational Workflow

Data Import and Quality Control

FASTQ files [17] are typically imported with Bioconductor packages such as ShortRead, [10] which also produces quality assessment reports.
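To make the import and quality-check step concrete, the sketch below writes a two-read FASTQ file to a temporary location, computes the mean quality score per read in base R, and then reads the same file with ShortRead if it is installed. The reads are made up for illustration, and Phred+33 quality encoding is assumed.

```r
# Sketch: a tiny FASTQ file, per-read mean quality, and ShortRead import.
fastq_path <- tempfile(fileext = ".fastq")
writeLines(c(
  "@read1", "TGAGGTAGTAGGTTGTATAGTT", "+", strrep("I", 22),
  "@read2", "ACGTACGTAC", "+", "IIIII#####"
), fastq_path)

# Each FASTQ record spans four lines: id, sequence, separator, qualities.
fq_lines <- readLines(fastq_path)
ids <- sub("^@", "", fq_lines[seq(1, length(fq_lines), by = 4)])
quals <- fq_lines[seq(4, length(fq_lines), by = 4)]

# Assumption: Phred+33 encoding, so the score is the ASCII code minus 33.
mean_q <- vapply(quals, function(q) mean(utf8ToInt(q) - 33L), numeric(1))
names(mean_q) <- ids
print(mean_q)

if (requireNamespace("ShortRead", quietly = TRUE)) {
  fq <- ShortRead::readFastq(fastq_path)
  print(length(fq))           # number of reads
  print(ShortRead::sread(fq)) # the read sequences
  # On real data, ShortRead::qa() followed by ShortRead::report()
  # builds an HTML quality assessment report.
}
```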

Adapter Trimming and Filtering

External tools such as Cutadapt [18] and Trimmomatic [19] are used to remove adapter sequences from the raw FASTQ files, which improves the quality of the reads.
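What adapter trimming does can be illustrated with a deliberately simple base R sketch. It only removes an exact copy of the adapter and everything after it, whereas dedicated tools such as Cutadapt also tolerate mismatches and partial adapters. The adapter sequence, the example reads, and the 18 to 30 nt length window are assumptions for illustration; the real adapter depends on the library preparation kit.

```r
# Sketch: exact-match 3' adapter removal followed by a length filter.
adapter <- "TGGAATTCTCGGGTGCCAAGG"  # assumed 3' adapter; use the kit's sequence

reads <- c(
  paste0("TGAGGTAGTAGGTTGTATAGTT", adapter),  # insert followed by adapter
  "ACGTACGTACGTACGTACGT"                      # read without adapter
)

# Drop the adapter and anything sequenced after it.
trim_3prime <- function(reads, adapter) {
  sub(paste0(adapter, ".*$"), "", reads)
}

trimmed <- trim_3prime(reads, adapter)

# Keep reads in a length window typical for small RNAs (assumed 18 to 30 nt).
keep <- nchar(trimmed) >= 18 & nchar(trimmed) <= 30
print(trimmed[keep])
```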

Read Alignment

The processed reads are aligned to the reference genome. The alignment can be performed within R using Rsubread or with external tools such as STAR, and the results are stored in standard formats such as SAM (Sequence Alignment Map) or BAM (Binary Alignment Map) files.

Annotation of microRNAs

Bioconductor supports the integration of miRBase data: packages such as miRBaseConverter, [20] AnnotationHub, [21] and org.Mm.eg.db [22] are used to annotate reads against known miRNAs.
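One way to look up miRNA gene annotation is sketched below using the mouse annotation package org.Mm.eg.db through the AnnotationDbi interface. The helper name `lookup_mir_genes` and the pattern used to pick out miRNA gene symbols are assumptions for this example, and the lookup only runs when the annotation package is installed.

```r
# Sketch: fetch Entrez IDs and gene names for a few mouse miRNA gene symbols.
lookup_mir_genes <- function(db, n = 6) {
  symbols <- AnnotationDbi::keys(db, keytype = "SYMBOL")
  # Assumption: mouse miRNA gene symbols start with "Mir" followed by a digit.
  mir_symbols <- head(grep("^Mir[0-9]", symbols, value = TRUE), n)
  AnnotationDbi::select(
    db,
    keys = mir_symbols,
    keytype = "SYMBOL",
    columns = c("ENTREZID", "GENENAME")
  )
}

if (requireNamespace("org.Mm.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  print(lookup_mir_genes(org.Mm.eg.db::org.Mm.eg.db))
}
```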

Quantification

Reads that map to known genes or microRNAs are counted, and the counts are summarized across samples.
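The summarization step can be sketched in base R: given, for each sample, the feature each aligned read was assigned to, the counts are tabulated and combined into a features-by-samples matrix. The sample names and read assignments are made up for illustration.

```r
# Sketch: turn per-read feature assignments into a count matrix.
assignments <- list(
  s1 = c("miR-21", "miR-21", "let-7a"),
  s2 = c("let-7a", "miR-155")
)

features <- sort(unique(unlist(assignments)))

# Tabulate each sample against the shared feature list so that
# features absent from a sample get a count of zero.
count_matrix <- sapply(assignments, function(x) {
  as.integer(table(factor(x, levels = features)))
})
rownames(count_matrix) <- features
print(count_matrix)
```

A matrix of this shape is the usual input to the differential expression packages named in this article.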

Differential Expression Analysis

After microRNA expression has been mapped and quantified, well-established packages such as DESeq2 and edgeR are used for differential expression analysis.

Visualization

To interpret and present miRNA expression results, R visualization packages such as ggplot2, [23] pheatmap, [24] and ComplexHeatmap are used. Common displays of differential expression data include volcano plots, PCA (principal component analysis) plots, MA plots, and heatmaps.
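A volcano plot can be sketched with ggplot2 as shown here. The results table is simulated, and the column names and the cut-offs (adjusted p-value below 0.05 and absolute log2 fold change above 1) are assumptions chosen for illustration; the plot is only drawn when ggplot2 is installed.

```r
# Sketch: volcano plot from a simulated differential expression table.
set.seed(4)
res <- data.frame(
  log2FC = rnorm(300),
  padj = runif(300)
)

# Flag features that pass both (assumed) thresholds.
res$status <- ifelse(
  res$padj < 0.05 & abs(res$log2FC) > 1,
  "significant", "not significant"
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    res,
    ggplot2::aes(x = log2FC, y = -log10(padj), colour = status)
  ) +
    ggplot2::geom_point() +
    ggplot2::labs(x = "log2 fold change", y = "-log10 adjusted p-value")
  print(p)
}
```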

References

  1. "Bioconductor – Release Announcements". bioconductor.org. Bioconductor. Retrieved 28 May 2019.
  2. Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, Huber W (August 2005). "BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis". Bioinformatics. 21 (16): 3439–40. doi:10.1093/bioinformatics/bti525. PMID 16082012.
  3. "Org.Hs.eg.db".
  4. Love MI, Huber W, Anders S (2014). "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2". Genome Biol. 15 (12): 550. doi:10.1186/s13059-014-0550-8. PMC 4302049. PMID 25516281.
  5. Robinson MD, McCarthy DJ, Smyth GK (January 2010). "edgeR: a Bioconductor package for differential expression analysis of digital gene expression data". Bioinformatics. 26 (1): 139–40. doi:10.1093/bioinformatics/btp616. PMC 2796818. PMID 19910308.
  6. Law CW, Alhamdoosh M, Su S, Dong X, Tian L, Smyth GK, Ritchie ME (2016). "RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR". F1000Res. 5: ISCB Comm J-1408. doi:10.12688/f1000research.9005.3. PMC 4937821. PMID 27441086.
  7. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (April 2015). "limma powers differential expression analyses for RNA-sequencing and microarray studies". Nucleic Acids Res. 43 (7): e47. doi:10.1093/nar/gkv007. PMC 4402510. PMID 25605792.
  8. Huber W, Carey VJ, Gentleman R, Anders S, Carlson M, Carvalho BS, Bravo HC, Davis S, Gatto L, Girke T, Gottardo R, Hahne F, Hansen KD, Irizarry RA, Lawrence M, Love MI, MacDonald J, Obenchain V, Oleś AK, Pagès H, Reyes A, Shannon P, Smyth GK, Tenenbaum D, Waldron L, Morgan M (February 2015). "Orchestrating high-throughput genomic analysis with Bioconductor". Nat Methods. 12 (2): 115–21. doi:10.1038/nmeth.3252. PMC 4509590. PMID 25633503.
  9. Liao Y, Smyth GK, Shi W (May 2019). "The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads". Nucleic Acids Res. 47 (8): e47. doi:10.1093/nar/gkz114. PMC 6486549. PMID 30783653.
  10. Morgan M, Anders S, Lawrence M, Aboyoun P, Pagès H, Gentleman R (October 2009). "ShortRead: a bioconductor package for input, quality assessment and exploration of high-throughput sequence data". Bioinformatics. 25 (19): 2607–8. doi:10.1093/bioinformatics/btp450. PMC 2752612. PMID 19654119.
  11. Liao Y, Smyth GK, Shi W (April 2014). "featureCounts: an efficient general purpose program for assigning sequence reads to genomic features". Bioinformatics. 30 (7): 923–30. doi:10.1093/bioinformatics/btt656. PMID 24227677.
  12. Koch CM, Chiu SF, Akbarpour M, Bharat A, Ridge KM, Bartom ET, Winter DR (August 2018). "A Beginner's Guide to Analysis of RNA Sequencing Data". Am J Respir Cell Mol Biol. 59 (2): 145–157. doi:10.1165/rcmb.2017-0430TR. PMC 6096346. PMID 29624415.
  13. Chen Y, Lun AT, Smyth GK (2016). "From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline". F1000Res. 5: 1438. doi:10.12688/f1000research.8987.2. PMC 4934518. PMID 27508061.
  14. "GenomicAlignments (Development version)".
  15. "GenomicFeatures (Development version)".
  16. Liao Y, Smyth GK, Shi W (May 2013). "The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote". Nucleic Acids Res. 41 (10): e108. doi:10.1093/nar/gkt214. PMC 3664803. PMID 23558742.
  17. Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM (April 2010). "The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants". Nucleic Acids Res. 38 (6): 1767–71. doi:10.1093/nar/gkp1137. PMC 2847217. PMID 20015970.
  18. "Cutadapt — Cutadapt 5.1 documentation".
  19. Bolger AM, Lohse M, Usadel B (August 2014). "Trimmomatic: a flexible trimmer for Illumina sequence data". Bioinformatics. 30 (15): 2114–20. doi:10.1093/bioinformatics/btu170. PMC 4103590. PMID 24695404.
  20. Xu T, Su N, Liu L, Zhang J, Wang H, Zhang W, Gui J, Yu K, Li J, Le TD (December 2018). "miRBaseConverter: an R/Bioconductor package for converting and retrieving miRNA name, accession, sequence and family information in different versions of miRBase". BMC Bioinformatics. 19 (Suppl 19): 514. doi:10.1186/s12859-018-2531-5. PMC 6311916. PMID 30598108.
  21. "AnnotationHub (Development version)".
  22. "Org.Mm.eg.db".
  23. "1 Introduction – ggplot2: Elegant Graphics for Data Analysis (3e)".
  24. "Pheatmap: Pretty Heatmaps version 1.0.13 from CRAN".