Bioconductor | |
---|---|
Stable release | 3.21 / 16 April 2025 |
Operating system | Linux, macOS, Windows |
Platform | R programming language |
Type | Bioinformatics |
License | Artistic License 2.0 |
Website | www |
Bioconductor is a free, open source and open development software project for the analysis and comprehension of genomic data generated by wet lab experiments in molecular biology.
Bioconductor is based primarily on the statistical R programming language, but does contain contributions in other programming languages. It has two releases each year that follow the semiannual releases of R. At any one time there is a release version, which corresponds to the released version of R, and a development version, which corresponds to the development version of R. Most users will find the release version appropriate for their needs. In addition there are many genome annotation packages available that are mainly, but not solely, oriented towards different types of microarrays.
The project was started in the Fall of 2001 and is overseen by the Bioconductor core team, based primarily at the Fred Hutchinson Cancer Research Center, with other members coming from international institutions.
Most Bioconductor components are distributed as R packages, which are add-on modules for R. Initially most of the Bioconductor software packages focused on the analysis of single channel Affymetrix and two or more channel cDNA/Oligo microarrays. As the project has matured, the functional scope of the software packages broadened to include the analysis of all types of genomic data, such as SAGE, sequence, or SNP data.
The broad goals of the project are to:
Each release of Bioconductor is developed to work best with a chosen version of R. [1] In addition to bug fixes and updates, a new release typically adds packages. The table below maps each Bioconductor release to an R version and shows the number of Bioconductor software packages available for that release.
Version | Release date | Package count | R dependency |
---|---|---|---|
3.21 | 16 Apr 2025 | 2341 | R 4.5 |
3.20 | 30 Oct 2024 | 2289 | R 4.4 |
3.18 | 25 Oct 2023 | 2266 | R 4.3 |
3.16 | 2 Nov 2022 | 2183 | R 4.2 |
3.14 | 27 Oct 2021 | 2083 | R 4.1 |
3.11 | 28 Apr 2020 | 1903 | R 4.0 |
3.10 | 30 Oct 2019 | 1823 | R 3.6 |
3.8 | 31 Oct 2018 | 1649 | R 3.5 |
3.6 | 31 Oct 2017 | 1473 | R 3.4 |
3.4 | 18 Oct 2016 | 1296 | R 3.3 |
3.2 | 14 Oct 2015 | 1104 | R 3.2 |
3.0 | 14 Oct 2014 | 934 | R 3.1 |
2.13 | 15 Oct 2013 | 749 | R 3.0 |
2.11 | 3 Oct 2012 | 610 | R 2.15 |
2.9 | 1 Nov 2011 | 517 | R 2.14 |
2.8 | 14 Apr 2011 | 466 | R 2.13 |
2.7 | 18 Nov 2010 | 418 | R 2.12 |
2.6 | 23 Apr 2010 | 389 | R 2.11 |
2.5 | 28 Oct 2009 | 352 | R 2.10 |
2.4 | 21 Apr 2009 | 320 | R 2.9 |
2.3 | 22 Oct 2008 | 294 | R 2.8 |
2.2 | 1 May 2008 | 260 | R 2.7 |
2.1 | 8 Oct 2007 | 233 | R 2.6 |
2.0 | 26 Apr 2007 | 214 | R 2.5 |
1.9 | 4 Oct 2006 | 188 | R 2.4 |
1.8 | 27 Apr 2006 | 172 | R 2.3 |
1.7 | 14 Oct 2005 | 141 | R 2.2 |
1.6 | 18 May 2005 | 123 | R 2.1 |
1.5 | 25 Oct 2004 | 100 | R 2.0 |
1.4 | 17 May 2004 | 81 | R 1.9 |
1.3 | 30 Oct 2003 | 49 | R 1.8 |
1.2 | 29 May 2003 | 30 | R 1.7 |
1.1 | 19 Oct 2002 | 20 | R 1.6 |
1.0 | 1 May 2002 | 15 | R 1.5 |
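The pairing in the table can be treated as a simple lookup. The sketch below is plain Python used only for illustration (it is not part of Bioconductor); it copies a few rows from the table, and the names `BIOC_TO_R`, `r_version_for`, and `bioc_release_for` are hypothetical.

```python
# A few rows copied from the release table above (Bioconductor version -> R version).
BIOC_TO_R = {
    "3.21": "4.5",
    "3.20": "4.4",
    "3.18": "4.3",
    "3.16": "4.2",
    "3.14": "4.1",
}


def r_version_for(bioc_version):
    """Return the R version paired with a Bioconductor release, or None if not listed."""
    return BIOC_TO_R.get(bioc_version)


def bioc_release_for(r_version):
    """Return the newest listed Bioconductor release paired with an R version, or None."""
    matches = [b for b, r in BIOC_TO_R.items() if r == r_version]
    if not matches:
        return None
    # Compare versions numerically so that "3.20" sorts after "3.8".
    return max(matches, key=lambda v: tuple(int(p) for p in v.split(".")))


print(r_version_for("3.21"))
print(bioc_release_for("4.4"))
```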
Small RNA sequencing is a widely used technique for studying microRNAs (miRNAs), small interfering RNAs (siRNAs), and piwi-interacting RNAs (piRNAs), which play a crucial role in RNA-mediated gene silencing, also known as RNA silencing. RNA silencing employs different types of substrates, which give rise to different populations of small RNAs, such as miRNAs and siRNAs. In the laboratory, small RNA sequencing typically starts with the extraction of RNA from cells or tissues, followed by adapter ligation to the 5' and 3' ends of the small RNAs, and then reverse transcription and PCR amplification to generate cDNA libraries. Finally, high-throughput sequencing (most commonly on Illumina platforms) produces millions of short reads. The resulting data then undergo computational processing to align the reads to the reference genome of the species of interest or to miRNA databases.
Bioconductor (BioC) [2] is a widely used open-source platform for analysing small RNA sequencing and other genomic data. It is built primarily on the R programming language and offers a wide range of packages for bioinformatics and computational biology. Among the packages it provides for handling small RNA-seq data, [3] a few are widely used by researchers: DESeq2, [4] edgeR, [5] limma with voom, [6] [7] GenomicAlignments, [8] GenomicFeatures, [8] Rsubread [9] (which includes the featureCounts function [11]), and ShortRead [10] provide robust analysis of RNA-seq data. [12]
DESeq2 models read counts from RNA-seq data with a negative binomial distribution for differential expression analysis. [13] It is popular for dispersion estimation, normalization, and visualization with PCA plots or MA plots. [4]
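Why a negative binomial model rather than a Poisson model: in a commonly used parameterization, the variance of a count with mean μ and dispersion α is μ + αμ², so the variance grows faster than the mean whenever α > 0. The following plain-Python sketch only illustrates that relationship; it is not the package's R code, and `nb_variance` is a hypothetical helper name.

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial count with the given mean and dispersion.

    With dispersion = 0 this reduces to the Poisson case (variance == mean).
    """
    return mean + dispersion * mean ** 2


# Compare Poisson-like (dispersion 0) and overdispersed (dispersion 0.1) variance.
for mu in (10, 100, 1000):
    print(mu, nb_variance(mu, 0.0), nb_variance(mu, 0.1))
```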
edgeR also models read counts from RNA-seq data with a negative binomial distribution for differential expression analysis. In contrast with DESeq2, it is used when the number of samples is relatively small. [5] [14]
limma with voom estimates the mean-variance relationship of count data and transforms the counts to log2 counts per million (log-CPM). limma is used for analysing microarray data, and voom makes it applicable to RNA-seq data by working on the CPM values. [15]
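The log-CPM transformation itself is simple arithmetic. The sketch below, in plain Python for illustration only, assumes the form log2((count + 0.5) / (library size + 1) × 10^6); the function name `log2_cpm` and its `prior` parameter are hypothetical, and the mean-variance modelling that voom performs afterwards is not shown.

```python
import math


def log2_cpm(counts, prior=0.5):
    """log2 counts per million for one sample.

    counts: raw read counts, one per gene or miRNA, for a single sample.
    prior:  small offset that avoids log2(0); 0.5 is an assumed default.
    """
    library_size = sum(counts)
    return [
        math.log2((c + prior) / (library_size + 2 * prior) * 1e6)
        for c in counts
    ]


print(log2_cpm([10, 90, 0]))
```

The offset keeps features with zero reads finite, which matters for small RNA data where many miRNAs are unobserved in a given sample.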
GenomicAlignments is widely used to read alignments from BAM and SAM files and to assign the aligned reads to genes or miRNAs for downstream analysis. [8] [16]
GenomicFeatures is used to build transcript-centric annotation databases, such as TxDb objects, which store information about genes, exons, and transcripts derived from GTF/GFF files. [8] [17]
Rsubread is used mostly for read mapping and read summarization; its functions align() and featureCounts() provide an efficient alternative to external aligners such as STAR or HISAT2. [18]
ShortRead is often used to pre-process raw FASTQ files and to check the quality of the raw data produced by a sequencing platform such as Illumina. [10]
FASTQ files [19] are typically imported with Bioconductor packages such as ShortRead, [10] which provides quality assessment reports.
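To show what such a quality check operates on, the following plain-Python sketch reads four-line FASTQ records and computes a mean Phred score per read. It assumes Phred+33 quality encoding and unwrapped records; it is an illustration only, not the ShortRead API, and `read_fastq` and `mean_phred` are hypothetical names.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


def mean_phred(quality, offset=33):
    """Mean Phred score of a non-empty quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


# Two made-up reads: 'I' encodes Phred 40 and '!' encodes Phred 0.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\n!!II\n")
for name, seq, qual in read_fastq(example):
    print(name, seq, mean_phred(qual))
```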
External tools such as Cutadapt [20] and Trimmomatic [21] are used to remove adapter sequences from the raw FASTQ files, which improves read quality.
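The idea behind 3' adapter removal can be sketched in a few lines of plain Python. This is a deliberately simplified illustration that handles exact matches only; real trimmers also tolerate mismatches and filter on quality and length. The function `trim_adapter`, its `min_overlap` parameter, and the adapter string in the example are hypothetical.

```python
def trim_adapter(sequence, quality, adapter, min_overlap=3):
    """Remove a 3' adapter (exact match only) from a read and its quality string."""
    # Case 1: the full adapter occurs somewhere in the read.
    pos = sequence.find(adapter)
    if pos == -1:
        # Case 2: the read ends with a prefix of the adapter (longest prefix first).
        for k in range(min(len(adapter) - 1, len(sequence)), min_overlap - 1, -1):
            if sequence.endswith(adapter[:k]):
                pos = len(sequence) - k
                break
    if pos == -1:
        return sequence, quality  # no adapter found; read is unchanged
    return sequence[:pos], quality[:pos]


adapter = "TGGAATTCTCGG"  # example adapter sequence, for illustration only
print(trim_adapter("ACGTACGTTGGAATTCTCGG", "I" * 20, adapter))
print(trim_adapter("ACGTACGTTGGA", "I" * 12, adapter))
```

Because small RNAs are shorter than the read length, most reads run into the adapter, which is why this step matters more here than in standard RNA-seq.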
The processed reads are aligned to the reference genome. Alignment can be performed with Rsubread or with external tools such as STAR, and the results are stored in standard formats such as SAM (Sequence Alignment Map) or BAM (Binary Alignment Map) files.
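A SAM file is tab-separated text in which each alignment line starts with eleven mandatory fields. The plain-Python sketch below extracts a few of them from a single made-up record; it is only an illustration of the format, not a replacement for a proper SAM/BAM reader, and `parse_sam_line` is a hypothetical name.

```python
def parse_sam_line(line):
    """Parse selected mandatory fields of one SAM alignment line into a dict."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    return {
        "read_name": fields[0],
        "flag": flag,
        "reference": fields[2],
        "position": int(fields[3]),       # 1-based leftmost mapping position
        "mapq": int(fields[4]),
        "cigar": fields[5],
        "sequence": fields[9],
        "is_mapped": not (flag & 0x4),    # bit 0x4 set means the read is unmapped
        "is_reverse": bool(flag & 0x10),  # bit 0x10 set means reverse strand
    }


# Made-up alignment record for a 22-nt read.
record = "read1\t16\tchr1\t1050\t255\t22M\t*\t0\t0\tTGAGGTAGTAGGTTGTATAGTT\t*"
print(parse_sam_line(record))
```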
Bioconductor supports the integration of miRBase data: packages such as miRBaseConverter, [22] AnnotationHub, [23] and org.Mm.eg.db [24] are used to annotate reads with known miRNAs.
Reads mapped to known genes or microRNAs are counted, and the counts are summarized across samples.
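The counting step can be pictured as follows. This plain-Python sketch assigns each read to a feature by its start position only and assumes features are given as 1-based inclusive intervals; production tools consider the full extent and strand of each alignment. The names `count_reads` and `count_matrix` and the example coordinates are hypothetical.

```python
from collections import Counter


def count_reads(alignments, features):
    """Count aligned reads per feature.

    alignments: iterable of (reference, position) pairs, one per aligned read.
    features:   dict mapping feature name -> (reference, start, end), 1-based inclusive.
    """
    counts = Counter({name: 0 for name in features})
    for reference, position in alignments:
        for name, (f_ref, start, end) in features.items():
            if reference == f_ref and start <= position <= end:
                counts[name] += 1
    return counts


def count_matrix(samples, features):
    """Build {feature: [count in sample 1, count in sample 2, ...]}."""
    per_sample = [count_reads(reads, features) for reads in samples.values()]
    return {name: [c[name] for c in per_sample] for name in features}


features = {"mir-a": ("chr1", 100, 121), "mir-b": ("chr2", 500, 522)}
samples = {
    "control": [("chr1", 100), ("chr1", 105), ("chr2", 510)],
    "treated": [("chr1", 101), ("chr2", 500), ("chr2", 505), ("chr3", 1)],
}
print(count_matrix(samples, features))
```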
After mapping and quantifying microRNA expression, well-established packages such as DESeq2 and edgeR are used for differential expression analysis.
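Before counts can be compared between conditions, they have to be corrected for differences in sequencing depth. The plain-Python sketch below shows one such correction, median-of-ratios size factors, followed by a simple log2 fold change. It is a conceptual illustration only: it performs no dispersion estimation or statistical testing, it is not the R API of either package, and `size_factors`, `log2_fold_change`, and the `pseudocount` parameter are hypothetical.

```python
import math
from statistics import median


def size_factors(matrix):
    """Median-of-ratios size factors, one per sample.

    matrix: dict mapping feature -> list of raw counts, one per sample.
    Features with a zero count in any sample are skipped; at least one
    feature must be non-zero in every sample.
    """
    n_samples = len(next(iter(matrix.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for counts in matrix.values():
        if min(counts) == 0:
            continue
        # Log of the geometric mean of this feature across samples.
        log_geo_mean = sum(math.log(c) for c in counts) / n_samples
        for j, c in enumerate(counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def log2_fold_change(counts, factors, group_a, group_b, pseudocount=1.0):
    """log2(mean normalized count in group_b / mean in group_a) for one feature."""
    norm = [c / f for c, f in zip(counts, factors)]
    mean_a = sum(norm[j] for j in group_a) / len(group_a)
    mean_b = sum(norm[j] for j in group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


# Sample 2 was sequenced twice as deeply; after normalization nothing changes.
matrix = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10]}
factors = size_factors(matrix)
print(factors)
print(log2_fold_change(matrix["g1"], factors, group_a=[0], group_b=[1]))
```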
To interpret and present miRNA expression results, visualization packages such as ggplot2, [25] pheatmap, [26] and ComplexHeatmap are used to produce volcano plots, PCA (principal component analysis) plots, MA plots, and heatmaps of the differential expression data.
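The coordinates behind two of these plots are straightforward to compute. In an MA plot, each feature is placed at A (average log2 expression) on the x-axis and M (log2 ratio between conditions) on the y-axis; in a volcano plot, at the log2 fold change and the negative log10 of the p-value. The plain-Python sketch below computes those coordinates without drawing anything; `ma_values`, `volcano_point`, and the `pseudocount` parameter are hypothetical names.

```python
import math


def ma_values(mean_a, mean_b, pseudocount=1.0):
    """Return (A, M) for one feature given its mean expression in two conditions.

    A is the average log2 expression (x-axis of an MA plot);
    M is the log2 ratio of condition b over condition a (y-axis).
    """
    log_a = math.log2(mean_a + pseudocount)
    log_b = math.log2(mean_b + pseudocount)
    return (log_a + log_b) / 2, log_b - log_a


def volcano_point(log2_fc, p_value):
    """Return (x, y) for a volcano plot: log2 fold change and -log10(p-value)."""
    return log2_fc, -math.log10(p_value)


print(ma_values(3, 15))
print(volcano_point(2.0, 0.01))
```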