List of RNA-Seq bioinformatics tools


RNA-Seq [1] [2] [3] is a technique [4] that allows transcriptome studies (see also Transcriptomics technologies) based on next-generation sequencing technologies. This technique is largely dependent on bioinformatics tools developed to support the different steps of the process. Listed here are some of the principal tools in common use, with links to some important web resources.


Design

Design is a fundamental step of any RNA-Seq experiment. Important questions, such as the sequencing depth/coverage and the number of biological or technical replicates, must be carefully considered. For a review of experimental design, see [5].
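As a rough illustration of the depth/coverage question, the average fold coverage can be estimated as the total number of sequenced bases divided by the size of the target. The sketch below assumes a uniform target (which real transcriptomes are not, since expression levels vary widely); the function names and the 50 Mb target size are hypothetical and chosen only for the example.

```python
import math


def mean_coverage(num_reads: int, read_length: int, target_size: int) -> float:
    """Average fold coverage: total sequenced bases divided by target size (bp)."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return num_reads * read_length / target_size


def reads_needed(coverage: float, read_length: int, target_size: int) -> int:
    """Smallest number of reads that reaches the requested average coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(coverage * target_size / read_length)


if __name__ == "__main__":
    # Hypothetical 50 Mb target sequenced with 100 bp reads.
    print(mean_coverage(1_000_000, 100, 50_000_000))
    print(reads_needed(30, 100, 50_000_000))
```

In practice the number of replicates usually matters as much as depth, so a calculation like this is only a starting point for design.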

Quality control, trimming, error correction and pre-processing of data

Quality assessment of raw data [6] is the first step of the RNA-Seq bioinformatics pipeline. It is often necessary to filter the data by removing low-quality sequences or bases (trimming), adapters, contamination and overrepresented sequences, or by correcting errors, to ensure a coherent final result.
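Base qualities in FASTQ files are usually stored as Phred scores, where a score Q corresponds to an error probability of 10^(-Q/10). The following minimal sketch, assuming the common Phred+33 ASCII encoding, converts a quality string into per-base error probabilities and the expected number of errors in a read. The function names are illustrative and do not belong to any tool listed here.

```python
def error_probabilities(quality: str, offset: int = 33) -> list[float]:
    """Convert an ASCII-encoded quality string to per-base error probabilities."""
    return [10 ** (-(ord(ch) - offset) / 10) for ch in quality]


def expected_errors(quality: str, offset: int = 33) -> float:
    """Expected number of miscalled bases in a read (sum of error probabilities)."""
    return sum(error_probabilities(quality, offset))


if __name__ == "__main__":
    # 'I' encodes Q40 and '+' encodes Q10 under Phred+33.
    print(error_probabilities("I+"))
    print(expected_errors("IIII++"))
```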

Quality control

Improving the quality

Improving RNA-Seq quality by correcting bias is a complex subject. [16] [17] Each RNA-Seq protocol introduces specific types of bias, and each step of the process (such as the sequencing technology used) can generate some sort of noise or error. Furthermore, even the species under investigation and the biological context of the samples can influence the results and introduce bias. Many sources of bias have been reported: GC content and PCR enrichment, [18] [19] rRNA depletion, [20] errors produced during sequencing, [21] and priming of reverse transcription with random hexamers. [22]

Different tools have been developed to address each of these sources of error.

Trimming and adapters removal

  • AlienTrimmer [23] implements a very fast approach (based on k-mers) to trim low-quality base pairs and clip technical (alien) oligonucleotides from single- or paired-end sequencing reads in plain or gzip-compressed FASTQ files (for more details, see AlienTrimmer).
  • BBDuk is a multithreaded tool to trim adapters and filter or mask contaminants based on k-mer matching, allowing a Hamming or edit distance, as well as degenerate bases. It also performs optimal quality-trimming and filtering, format conversion, contaminant concentration reporting, GC-filtering, length-filtering, entropy-filtering and chastity-filtering, and generates text histograms for most operations. It interconverts between fastq, fasta, sam, scarf, interleaved and 2-file paired, gzipped, bzipped, ASCII-33 and ASCII-64. It keeps pairs together. Open-source, written in pure Java; supports all platforms with no recompilation and no other dependencies.
  • clean_reads cleans NGS (Sanger, 454, Illumina and SOLiD) reads. It can trim bad quality regions, adaptors, vectors, and regular expressions. It also filters out the reads that do not meet minimum quality criteria based on the sequence length and the mean quality.
  • condetri [24] is a method for content dependent read trimming for Illumina data using quality scores of each base individually. It is independent from sequencing coverage and user interaction. The main focus of the implementation is on usability and to incorporate read trimming in next-generation sequencing data processing and analysis pipelines. It can process single-end and paired-end sequencing data of arbitrary length.
  • cutadapt [25] removes adapter sequences from next-generation sequencing data (Illumina, SOLiD and 454). It is used especially when the read length of the sequencing machine is longer than the sequenced molecule, like the microRNA case.
  • Deconseq Detect and remove contaminations from sequence data.
  • Erne-Filter [26] is part of ERNE, a short string alignment package whose goal is to provide an all-inclusive set of tools to handle short (NGS-like) reads. ERNE comprises ERNE-FILTER (read trimming and contamination filtering), ERNE-MAP (core alignment tool/algorithm), ERNE-BS5 (bisulfite treated reads aligner), and ERNE-PMAP/ERNE-PBS5 (distributed versions of the aligners).
  • FastqMcf Fastq-mcf attempts to: Detect & remove sequencing adapters and primers; Detect limited skewing at the ends of reads and clip; Detect poor quality at the ends of reads and clip; Detect Ns, and remove from ends; Remove reads with CASAVA 'Y' flag (purity filtering); Discard sequences that are too short after all of the above; Keep multiple mate-reads in sync while doing all of the above.
  • FASTX Toolkit is a set of command line tools to manipulate reads in FASTA or FASTQ format. These commands make it possible to preprocess the files before mapping with tools like Bowtie. Some of the tasks supported are: conversion from FASTQ to FASTA format, quality statistics, removal of sequencing adapters, filtering and cutting of sequences based on quality, and DNA/RNA conversion.
  • Flexbar performs removal of adapter sequences, trimming and filtering features.
  • FreClu improves overall alignment accuracy performing sequencing-error correction by trimming short reads, based on a clustering methodology.
  • htSeqTools is a Bioconductor package able to perform quality control, processing of data and visualization. htSeqTools makes it possible to visualize sample correlations, remove over-amplification artifacts, assess enrichment efficiency, correct strand bias and visualize hits.
  • NxTrim Adapter trimming and virtual library creation routine for Illumina Nextera Mate Pair libraries.
  • PRINSEQ [27] generates statistics of your sequence data for sequence length, GC content, quality scores, n-plicates, complexity, tag sequences, poly-A/T tails, odds ratios. Filter the data, reformat and trim sequences.
  • Sabre A barcode demultiplexing and trimming tool for FastQ files.
  • Scythe A 3'-end adapter contaminant trimmer.
  • SEECER is a sequencing error correction algorithm for RNA-seq data sets. It takes the raw read sequences produced by a next generation sequencing platform like machines from Illumina or Roche. SEECER removes mismatch and indel errors from the raw reads and significantly improves downstream analysis of the data. Especially if the RNA-Seq data is used to produce a de novo transcriptome assembly, running SEECER can have tremendous impact on the quality of the assembly.
  • Sickle A windowed adaptive trimming tool for FASTQ files using quality.
  • SnoWhite [28] is a pipeline designed to flexibly and aggressively clean sequence reads (gDNA or cDNA) prior to assembly. It takes in and returns fastq or fasta formatted sequence files.
  • ShortRead is a package provided in the R (programming language) / BioConductor environments and allows input, manipulation, quality assessment and output of next-generation sequencing data. This tool makes possible manipulation of data, such as filter solutions to remove reads based on predefined criteria. ShortRead could be complemented with several Bioconductor packages to further analysis and visualization solutions (BioStrings, BSgenome, IRanges, and so on).
  • SortMeRNA is a program tool for filtering, mapping and OTU-picking NGS reads in metatranscriptomic and metagenomic data. The core algorithm is based on approximate seeds and allows for analyses of nucleotide sequences. The main application of SortMeRNA is filtering ribosomal RNA from metatranscriptomic data.
  • TagCleaner The TagCleaner tool can be used to automatically detect and efficiently remove tag sequences (e.g. WTA tags) from genomic and metagenomic datasets. It is easily configurable and provides a user-friendly interface.
  • Trimmomatic [29] performs trimming for Illumina platforms and works with FASTQ reads (single- or paired-end). Some of the tasks executed are: cutting adapters, cutting bases in optional positions based on quality thresholds, cutting reads to a specific length, and converting quality scores to Phred-33/64.
  • fastp A tool designed to provide all-in-one preprocessing for FastQ files. This tool is developed in C++ with multithreading supported.
  • FASTX-Toolkit The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
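Several of the trimmers above cut reads where base quality drops. The sketch below illustrates the general idea of windowed quality trimming: it scans a read with a fixed-size window and cuts at the start of the first window whose mean quality falls below a threshold. It assumes Phred+33 encoding and is not a reimplementation of any listed tool; the window size and threshold are illustrative defaults.

```python
def phred33_scores(quality: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def sliding_window_trim(seq: str, quality: str, window: int = 4,
                        min_mean: float = 20.0) -> tuple[str, str]:
    """Cut the read at the start of the first window whose mean quality < min_mean.

    Reads shorter than the window are returned unchanged.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    scores = phred33_scores(quality)
    cut = len(seq)
    for start in range(max(len(scores) - window + 1, 0)):
        if sum(scores[start:start + window]) / window < min_mean:
            cut = start
            break
    return seq[:cut], quality[:cut]


if __name__ == "__main__":
    # Six high-quality bases (Q40) followed by four poor ones (Q2).
    print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))
```

Note that cutting at the window start can discard a few good bases just before the quality drop; real trimmers differ in how they handle that boundary.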

Detection of chimeric reads

Recent sequencing technologies normally require DNA samples to be amplified via polymerase chain reaction (PCR). Amplification often generates chimeric elements (especially of ribosomal origin): sequences formed from two or more original sequences joined together.

  • UCHIME is an algorithm for detecting chimeric sequences.
  • ChimeraSlayer is a chimeric sequence detection utility, compatible with near-full length Sanger sequences and shorter 454-FLX sequences (~500 bp).

Error correction

Characterization of high-throughput sequencing errors and their eventual correction. [30]

  • Acacia Error-corrector for pyrosequenced amplicon reads.
  • AllPathsLG error correction.
  • AmpliconNoise [31] is a collection of programs for the removal of noise from 454 sequenced PCR amplicons. It involves two steps: the removal of noise from the sequencing itself and the removal of PCR point errors. This project also includes the Perseus algorithm for chimera removal.
  • BayesHammer. Bayesian clustering for error correction. This algorithm is based on Hamming graphs and Bayesian subclustering. While BayesHammer was designed for single-cell sequencing, it also improves on existing error correction tools for bulk sequencing data.
  • Bless [32] A bloom filter-based error correction solution for high-throughput sequencing reads.
  • Blue [33] Blue is a short-read error-correction tool based on k-mer consensus and context.
  • BFC A sequencing error corrector designed for Illumina short reads. It uses a non-greedy algorithm with a speed comparable to implementations based on greedy methods.
  • Denoiser Denoiser is designed to address issues of noise in pyrosequencing data. Denoiser is a heuristic variant of PyroNoise. Developers of denoiser report a good agreement with PyroNoise on several test datasets.
  • Echo A reference-free short-read error correction algorithm.
  • Lighter. A sequencing error correction without counting.
  • LSC uses short Illumina reads to correct errors in long reads.
  • Karect Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data.
  • NoDe NoDe: an error-correction algorithm for pyrosequencing amplicon reads.
  • PyroTagger PyroTagger: A fast, accurate pipeline for analysis of rRNA amplicon pyrosequence data.
  • Quake is a tool to correct substitution sequencing errors in experiments with deep coverage for Illumina sequencing reads.
  • QuorUM: An Error Corrector for Illumina Reads.
  • Rcorrector. Error correction for Illumina RNA-seq reads.
  • Reptile is software developed in C++ for correcting sequencing errors in short reads from next-gen sequencing platforms.
  • Seecer SEquencing Error CorrEction for Rna reads.
  • SGA
  • SOAPdenovo
  • UNOISE
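Several of the correctors above are described as k-mer based. A common idea behind such approaches is that k-mers seen only rarely across a deep dataset are more likely to contain sequencing errors than k-mers seen many times. The sketch below only illustrates that idea, assuming exact k-mer counting and an arbitrary count threshold; it flags suspicious positions and does not attempt the correction step, which differs from tool to tool.

```python
from collections import Counter


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurring in a collection of reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_positions(read: str, counts: Counter, k: int,
                        min_count: int = 2) -> list[int]:
    """Start positions of k-mers in `read` seen fewer than `min_count` times."""
    return [i for i in range(len(read) - k + 1)
            if counts[read[i:i + k]] < min_count]


if __name__ == "__main__":
    # Three identical reads and one read with a single substitution (T -> A).
    reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGAAC"]
    counts = count_kmers(reads, k=3)
    print(weak_kmer_positions("ACGAAC", counts, k=3))
```

In RNA-Seq data, low-abundance transcripts also produce rare k-mers, which is one reason RNA-specific correctors exist alongside the genome-oriented ones.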

Bias correction

  • Alpine [34] Modeling and correcting fragment sequence bias for RNA-seq.
  • cqn [35] is a normalization tool for RNA-Seq data, implementing the conditional quantile normalization method.
  • EDASeq [36] is a Bioconductor package to perform GC-Content Normalization for RNA-Seq Data.
  • GeneScissors A comprehensive approach to detecting and correcting spurious transcriptome inference due to RNAseq reads misalignment.
  • Peer [37] is a collection of Bayesian approaches to infer hidden determinants and their effects from gene expression profiles using factor analysis methods. Applications of PEER have: a) detected batch effects and experimental confounders, b) increased the number of expression QTL findings by threefold, c) allowed inference of intermediate cellular traits, such as transcription factor or pathway activations.
  • RUV [38] is an R package that implements the remove unwanted variation (RUV) methods of Risso et al. (2014) for the normalization of RNA-Seq read counts between samples.
  • sva Surrogate Variable Analysis.
  • svaseq removing batch effects and other unwanted noise from sequencing data.
  • SysCall [39] is a classifier tool for the identification and correction of systematic error in high-throughput sequence data.
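GC content is one of the covariates that the normalization tools above account for. As a minimal sketch of the quantity itself, the function below computes the GC fraction of a sequence, ignoring ambiguous bases such as N. How a given package then models the relationship between GC content and read counts is not shown here.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence.

    Returns 0.0 when the sequence contains no unambiguous bases.
    """
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return sum(base in "GC" for base in called) / len(called)


if __name__ == "__main__":
    print(gc_fraction("ATGCGC"))
    print(gc_fraction("atgcNN"))
```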

Other tasks/pre-processing data

Further tasks performed before alignment, namely paired-read mergers.

  • AuPairWise A Method to Estimate RNA-Seq Replicability through Co-expression.
  • BamHash is a checksum based method to ensure that the read pairs in FASTQ files match exactly the read pairs stored in BAM files, regardless of the ordering of reads. BamHash can be used to verify the integrity of the files stored and discover any discrepancies. Thus, BamHash can be used to determine if it is safe to delete the FASTQ files storing raw sequencing reads after alignment, without the loss of data.
  • BBMerge Merges paired reads based on overlap to create longer reads, and an insert-size histogram. Fast, multithreaded, and yields extremely few false positives. Open-source, written in pure Java; supports all platforms with no recompilation and no other dependencies. Distributed with BBMap.
  • Biopieces are a collection of bioinformatics tools that can be pieced together in a very easy and flexible manner to perform both simple and complex tasks. The Biopieces work on a data stream in such a way that the data stream can be passed through several different Biopieces, each performing one specific task: modifying or adding records to the data stream, creating plots, or uploading data to databases and web services.
  • COPE [40] COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly.
  • DeconRNASeq is an R package for deconvolution of heterogeneous tissues based on mRNA-Seq data.
  • FastQ Screen screens FASTQ format sequences to a set of databases to confirm that the sequences contain what is expected (such as species content, adapters, vectors, etc.).
  • FLASH is a read pre-processing tool. FLASH combines paired-end reads which overlap and converts them to single long reads.
  • IDCheck
  • ORNA and ORNA Q/K A tool for reducing redundancy in RNA-seq data which reduces the computational resource requirements of an assembler
  • PANDASeq is a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence.
  • PEAR [41] PEAR: Illumina Paired-End reAd mergeR.
  • qRNASeq script The qRNAseq tool can be used to accurately eliminate PCR duplicates from RNA-Seq data if Molecular Indexes™ or other stochastic labels have been used during library prep.
  • SHERA [42] a SHortread Error-Reducing Aligner.
  • XORRO Rapid Paired-End Read Overlapper.
  • DecontaMiner [43] detects contamination in RNA-Seq data.

Alignment tools

After quality control, the first step of RNA-Seq analysis involves alignment of the sequenced reads to a reference genome (if available) or to a transcriptome database. See also List of sequence alignment software.

Short (unspliced) aligners

Short aligners are able to align continuous reads (not containing gaps resulting from splicing) to a reference genome. Basically, there are two types: 1) those based on the Burrows–Wheeler transform, such as Bowtie and BWA, and 2) those based on seed-and-extend methods using the Needleman–Wunsch or Smith–Waterman algorithms. The first group (Bowtie and BWA) is many times faster; however, some tools of the second group tend to be more sensitive, generating more correctly aligned reads.
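To give a feel for the first group, the sketch below builds a Burrows–Wheeler transform from sorted rotations and counts exact occurrences of a pattern with the standard backward-search procedure. It is a teaching-sized illustration only: it assumes the text does not contain the `$` sentinel, recomputes occurrence counts by scanning (real indexes precompute them), and says nothing about how Bowtie or BWA handle mismatches.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Burrows-Wheeler transform of text + '$', built from sorted rotations."""
    text = text + "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string: str, pattern: str) -> int:
    """Count exact occurrences of pattern in the original text via backward search."""
    # first_col[c] = number of characters in the text that sort before c.
    totals = Counter(bwt_string)
    first_col: dict[str, int] = {}
    running = 0
    for ch in sorted(totals):
        first_col[ch] = running
        running += totals[ch]

    def occ(ch: str, i: int) -> int:
        # Occurrences of ch in bwt_string[:i]; linear scan kept for clarity.
        return bwt_string[:i].count(ch)

    lo, hi = 0, len(bwt_string)
    for ch in reversed(pattern):
        if ch not in first_col:
            return 0
        lo = first_col[ch] + occ(ch, lo)
        hi = first_col[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("GATTACAGATTACA")
    print(index, count_occurrences(index, "TTA"))
```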

Spliced aligners

Many reads span exon-exon junctions and cannot be aligned directly by short aligners, so specific aligners, known as spliced aligners, are necessary. Some spliced aligners first use short aligners to align the unspliced/continuous reads (exon-first approach), and then follow a different strategy to align the remaining reads that contain spliced regions: normally these reads are split into smaller segments that are mapped independently. See also [45] [46].

Aligners based on known splice junctions (annotation-guided aligners)

In this case the detection of splice junctions is based on data available in databases about known junctions. Tools of this type cannot identify new splice junctions. Some of this data comes from other expression methods like expressed sequence tags (EST).

  • Erange is a tool for alignment and data quantification against mammalian transcriptomes.
  • IsoformEx
  • MapAL
  • OSA
  • RNA-MATE is a computational pipeline for alignment of data from the Applied Biosystems SOLiD system. It provides quality control and trimming of reads. The genome alignments are performed using mapreads and the splice junctions are identified based on a library of known exon-junction sequences. This tool allows visualization of alignments and tag counting.
  • RUM performs alignment based on a pipeline, being able to handle reads with splice junctions, using Bowtie and BLAT. The workflow starts with alignment against a genome and a transcriptome database, executed by Bowtie. The next step is to align the unmapped sequences to the reference genome using BLAT. In the final step all alignments are merged to get the final alignment. The input files can be in FASTA or FASTQ format. The output is presented in RUM and SAM format.
  • RNASEQR.
  • SAMMate
  • SpliceSeq
  • X-Mate

De novo splice aligners

De novo splice aligners allow the detection of new splice junctions without the need for previously annotated information (some of these tools offer annotation as a supplementary option).

  • ABMapper
  • BBMap Uses short kmers to align reads directly to the genome (spanning introns to find novel isoforms) or transcriptome. Highly tolerant of substitution errors and indels, and very fast. Supports output of all SAM tags needed by Cufflinks. No limit to genome size or number of splices per read. Supports Illumina, 454, Sanger, Ion Torrent, PacBio, and Oxford Nanopore reads, paired or single-ended. Does not use any splice-site-finding heuristics optimized for a single taxonomic branch, but rather finds optimally-scoring multi-affine-transform global alignments, and thus is ideal for studying new organisms with no annotation and unknown splice motifs. Open-source, written in pure Java; supports all platforms with no recompilation and no other dependencies.
  • ContextMap was developed to overcome some limitations of other mapping approaches, such as the resolution of ambiguities. The central idea of this tool is to consider reads in their gene expression context, thereby improving alignment accuracy. ContextMap can be used as a stand-alone program or be supported by mappers that produce a SAM file as output (e.g. TopHat or MapSplice). In stand-alone mode it aligns reads to a genome, to a transcriptome database or both.
  • CRAC proposes a novel way of analyzing reads that integrates genomic locations and local coverage, and detects candidate mutations, indels, splice or fusion junctions in each single read. Importantly, CRAC improves its predictive performance when supplied with longer reads (e.g. 200 nt) and should fit future needs of read analyses.
  • GSNAP
  • GMAP A Genomic Mapping and Alignment Program for mRNA and EST Sequences.
  • HISAT is a spliced alignment program for mapping RNA-seq reads. In addition to one global FM-index that represents a whole genome, HISAT uses a large set of small FM-indexes that collectively cover the whole genome (each index represents a genomic region of ~64,000 bp and ~48,000 indexes are needed to cover the human genome). These small indexes (called local indexes) combined with several alignment strategies enable effective alignment of RNA-seq reads, in particular, reads spanning multiple exons. The memory footprint of HISAT is relatively low (~4.3GB for the human genome). HISAT was developed on top of the Bowtie2 implementation, which handles most of the operations on the FM-index.
  • HISAT2 is an alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as to a single reference genome). Based on an extension of BWT for graphs [Sirén et al. 2014], its developers designed and implemented a graph FM-index (GFM), which they describe as an original approach and, to the best of their knowledge, its first implementation. In addition to using one global GFM index that represents a population of human genomes, HISAT2 uses a large set of small GFM indexes that collectively cover the whole genome (each index representing a genomic region of 56 Kbp, with 55,000 indexes needed to cover the human population). These small indexes (called local indexes), combined with several alignment strategies, enable rapid and accurate alignment of sequencing reads. This new indexing scheme is called a Hierarchical Graph FM index (HGFM).
  • HMMSplicer can identify canonical and non-canonical splice junctions in short-reads. Firstly, unspliced reads are removed with Bowtie. After that, the remaining reads are one at a time divided in half, then each part is seeded against a genome and the exon borders are determined based on the Hidden Markov Model. A quality score is assigned to each junction, useful to detect false positive rates.
  • MapSplice
  • PALMapper
  • Pass [47] aligns gapped and ungapped reads as well as bisulfite sequencing data. It includes the possibility to filter data before alignment (removal of adapters). Pass uses the Needleman–Wunsch and Smith–Waterman algorithms, and performs alignment in 3 stages: scanning positions of seed sequences in the genome, testing the contiguous regions and finally refining the alignment.
  • PASSion
  • PASTA
  • QPALMA predicts splice junctions supported on machine learning algorithms. In this case the training set is a set of spliced reads with quality information and already known alignments.
  • RASER: [48] reads aligner for SNPs and editing sites of RNA.
  • SeqSaw
  • SoapSplice A tool for genome-wide ab initio detection of splice junction sites from RNA-Seq, a method using new generation sequencing technologies to sequence the messenger RNA.
  • SpliceMap
  • SplitSeek
  • SuperSplat was developed to find all types of splice junctions. The algorithm iteratively splits each read into all possible two-chunk combinations and tries to align each chunk. Output is in "Supersplat" format.
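The two-chunk strategy described for SuperSplat can be illustrated with a toy example. The sketch below enumerates every way to split a read into two chunks and looks for placements where both chunks match a reference exactly with a gap (a candidate intron) between them. It is only a sketch of the idea: exact string search with `str.find` stands in for a real index, and the `min_chunk` and `max_intron` parameters are hypothetical, not the tool's own options.

```python
def two_chunk_splits(read: str, min_chunk: int = 1) -> list[tuple[str, str]]:
    """All ways to cut a read into a left and a right chunk of at least min_chunk bases."""
    return [(read[:i], read[i:])
            for i in range(min_chunk, len(read) - min_chunk + 1)]


def candidate_junctions(read: str, genome: str, min_chunk: int = 4,
                        max_intron: int = 1000) -> list[tuple[int, int, int]]:
    """Return (left_start, intron_start, intron_end) for every exact two-chunk hit.

    intron_start is the first base after the left chunk; intron_end is the
    position where the right chunk begins.
    """
    hits = []
    for left, right in two_chunk_splits(read, min_chunk):
        left_pos = genome.find(left)
        while left_pos != -1:
            left_end = left_pos + len(left)
            # Require a gap of at least one base between the two chunks.
            right_pos = genome.find(right, left_end + 1)
            if right_pos != -1 and right_pos - left_end <= max_intron:
                hits.append((left_pos, left_end, right_pos))
            left_pos = genome.find(left, left_pos + 1)
    return hits


if __name__ == "__main__":
    exon1, intron, exon2 = "ACGTACGGTC", "GTAAGTTTTTTTTTAG", "CCATGGCATG"
    genome = exon1 + intron + exon2
    read = exon1[-6:] + exon2[:6]   # read spanning the exon-exon junction
    print(candidate_junctions(read, genome))
```

When bases at the intron boundaries happen to match the flanking exon, several split points can be equally valid, which is why real aligners score candidate junctions (for example by splice-site motifs).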
De novo splice aligners that also use annotation optionally
  • MapNext
  • OLego
  • STAR is a tool that employs "sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure", detects canonical, non-canonical splices junctions and chimeric-fusion sequences. It is already adapted to align long reads (third-generation sequencing technologies) and can reach speeds of 45 million paired reads per hour per processor. [49]
  • Subjunc [44] is a specialized version of Subread. It uses all mappable regions in an RNA-seq read to discover exons and exon-exon junctions. It uses the donor/receptor signals to find the exact splicing locations. Subjunc yields full alignments for every RNA-seq read including exon-spanning reads, in addition to the discovered exon-exon junctions. Subjunc should be used for the purpose of junction detection and genomic variation detection in RNA-seq data.
  • TopHat [50] is designed to find de novo junctions. TopHat aligns reads in two steps. First, unspliced reads are aligned with Bowtie, and the aligned reads are then assembled with Maq, resulting in islands of sequences. Second, the splice junctions are determined based on the initially unmapped reads and the possible canonical donor and acceptor sites within the island sequences.
Other spliced aligners
  • G.Mo.R-Se is a method that uses RNA-Seq reads to build de novo gene models.

Evaluation of alignment tools

Normalization, quantitative analysis and differential expression

General tools

These tools perform normalization and calculate the abundance of each gene expressed in a sample. [51] RPKM, FPKM and TPM [52] are some of the units used to quantify expression. Some software is also designed to study the variability of gene expression between samples (differential expression). Quantitative and differential studies depend largely on the quality of read alignment and the accuracy of isoform reconstruction. Several studies comparing differential expression methods are available. [53] [54] [55]
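The expression units mentioned above follow simple formulas. RPKM divides a gene's read count by its length in kilobases and by the total number of mapped reads in millions (FPKM is the same calculation on fragments rather than reads). TPM first divides each count by the gene length and then rescales so that the values sum to one million. The sketch below computes both from plain dictionaries; the gene names, counts and lengths are made-up illustrative values.

```python
def rpkm(counts: dict[str, int], lengths: dict[str, int]) -> dict[str, float]:
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return {gene: counts[gene] * 1e9 / (lengths[gene] * total) for gene in counts}


def tpm(counts: dict[str, int], lengths: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    denominator = sum(rates.values())
    if denominator == 0:
        raise ValueError("no mapped reads")
    return {gene: rate / denominator * 1e6 for gene, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical genes: geneB is three times longer and has three times the reads.
    counts = {"geneA": 100, "geneB": 300}
    lengths = {"geneA": 1000, "geneB": 3000}
    print(rpkm(counts, lengths))
    print(tpm(counts, lengths))
```

Because TPM values always sum to one million within a sample, they are often easier to compare across samples than RPKM/FPKM values, whose sum varies from sample to sample.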

Evaluation of quantification and differential expression

Multi-tool solutions

Transposable Element expression

Workbench (analysis pipeline / integrated solutions)

Commercial solutions

Open (free) source solutions

Alternative splicing analysis

General tools

Intron retention analysis

Differential isoform/transcript usage

Fusion genes/chimeras/translocation finders/structural variations

Genome rearrangements resulting from diseases like cancer can produce aberrant genetic modifications such as fusions or translocations. Identification of these modifications plays an important role in carcinogenesis studies. [85]

Copy number variation identification

Single cell RNA-Seq

See also Single cell sequencing. The traditional RNA-Seq methodology is commonly known as "bulk RNA-Seq": RNA is extracted from a group of cells or tissues, not from individual cells as in single cell methods. Some tools available for bulk RNA-Seq are also applied to single cell analysis; however, new algorithms have been developed to address the specific characteristics of this technique.

Integrated Packages

Quality Control and Gene Filtering

Data cleaning and denoising

Normalization

Dimension Reduction

Differential Expression

Visualization

RNA-Seq simulators

These simulators generate in silico reads and are useful tools to compare and test the efficiency of algorithms developed to handle RNA-Seq data. Moreover, some of them make it possible to analyse and model RNA-Seq protocols.
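A very small sketch of what generating in silico reads means: reads are drawn from random positions of a known transcript and substitution errors are introduced at a fixed rate, so the true origin of every read is known and can be compared with what an aligner or quantifier reports. Uniform sampling and a single error rate are simplifying assumptions; real simulators model expression levels, fragment sizes, positional bias and quality profiles.

```python
import random


def simulate_reads(transcript: str, num_reads: int, read_length: int,
                   error_rate: float = 0.01, seed: int = 0) -> list[tuple[int, str]]:
    """Sample reads uniformly from a transcript and add substitution errors.

    Returns (true_start, read_sequence) pairs so results can be checked
    against the known origin of each read.
    """
    if read_length > len(transcript):
        raise ValueError("read_length exceeds transcript length")
    rng = random.Random(seed)  # seeded for reproducible simulations
    reads = []
    for _ in range(num_reads):
        start = rng.randrange(len(transcript) - read_length + 1)
        bases = list(transcript[start:start + read_length])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((start, "".join(bases)))
    return reads


if __name__ == "__main__":
    for start, read in simulate_reads("ACGTTGCAAGGCTTAACCGG" * 5, 3, 25, seed=42):
        print(start, read)
```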

Transcriptome assemblers

The transcriptome is the total population of RNAs expressed in one cell or group of cells, including non-coding and protein-coding RNAs. There are two types of approaches to assembling transcriptomes. Genome-guided methods use a reference genome (if possible a finished, high-quality genome) as a template to align reads and assemble them into transcripts. Genome-independent methods do not require a reference genome and are normally used when a genome is not available. In this case reads are assembled directly into transcripts.

Genome-guided assemblers

Genome-independent (de novo) assemblers

Assembly evaluation tools

Co-expression networks

miRNA prediction and analysis

Visualization tools

Functional, network and pathway analysis tools

Further annotation tools for RNA-Seq data

Compression tools

RNA-Seq databases

Single species' RNA-Seq databases

Related Research Articles

<span class="mw-page-title-main">Alternative splicing</span> Process by which a gene can code for multiple proteins

Alternative splicing, or alternative RNA splicing, or differential splicing, is an alternative splicing process during gene expression that allows a single gene to code for multiple proteins. In this process, particular exons of a gene may be included within or excluded from the final, processed messenger RNA (mRNA) produced from that gene. This means the exons are joined in different combinations, leading to different (alternative) mRNA strands. Consequently, the proteins translated from alternatively spliced mRNAs usually contain differences in their amino acid sequence and, often, in their biological functions.

In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution. It can be performed on the entire genome, transcriptome or proteome of an organism, and can also involve only selected segments or regions, like tandem repeats and transposable elements. Methodologies used include sequence alignment, searches against biological databases, and others.

In bioinformatics, sequence assembly refers to aligning and merging fragments from a longer DNA sequence in order to reconstruct the original sequence. This is needed as DNA sequencing technology might not be able to 'read' whole genomes in one go, but rather reads small pieces of between 20 and 30,000 bases, depending on the technology used. Typically, the short fragments (reads) result from shotgun sequencing genomic DNA, or gene transcript (ESTs).

The transcriptome is the set of all RNA transcripts, including coding and non-coding, in an individual or a population of cells. The term can also sometimes be used to refer to all RNAs, or just mRNA, depending on the particular experiment. The term transcriptome is a portmanteau of the words transcript and genome; it is associated with the process of transcript production during the biological process of transcription.

<span class="mw-page-title-main">Fusion gene</span>

A fusion gene is a hybrid gene formed from two previously independent genes. It can occur as a result of translocation, interstitial deletion, or chromosomal inversion. Fusion genes have been found to be prevalent in all main types of human neoplasia. The identification of these fusion genes play a prominent role in being a diagnostic and prognostic marker.

<span class="mw-page-title-main">Steven Salzberg</span> American biologist and computer scientist

Steven Lloyd Salzberg is an American computational biologist and computer scientist who is a Bloomberg Distinguished Professor of Biomedical Engineering, Computer Science, and Biostatistics at Johns Hopkins University, where he is also Director of the Center for Computational Biology.

<span class="mw-page-title-main">RNA-Seq</span> Lab technique in cellular biology

RNA-Seq is a technique that uses next-generation sequencing to reveal the presence and quantity of RNA molecules in a biological sample, providing a snapshot of gene expression in the sample, also known as transcriptome.

SOAP is a suite of bioinformatics software tools from the BGI Bioinformatics department enabling the assembly, alignment, and analysis of next generation DNA sequencing data. It is particularly suited to short read sequencing data.

Paired-end tags (PET) are the short sequences at the 5’ and 3' ends of a DNA fragment which are unique enough that they (theoretically) exist together only once in a genome, therefore making the sequence of the DNA in between them available upon search or upon further sequencing. Paired-end tags (PET) exist in PET libraries with the intervening DNA absent, that is, a PET "represents" a larger fragment of genomic or cDNA by consisting of a short 5' linker sequence, a short 5' sequence tag, a short 3' sequence tag, and a short 3' linker sequence. It was shown conceptually that 13 base pairs are sufficient to map tags uniquely. However, longer sequences are more practical for mapping reads uniquely. The endonucleases used to produce PETs give longer tags but sequences of 50–100 base pairs would be optimal for both mapping and cost efficiency. After extracting the PETs from many DNA fragments, they are linked (concatenated) together for efficient sequencing. On average, 20–30 tags could be sequenced with the Sanger method, which has a longer read length. Since the tag sequences are short, individual PETs are well suited for next-generation sequencing that has short read lengths and higher throughput. The main advantages of PET sequencing are its reduced cost by sequencing only short fragments, detection of structural variants in the genome, and increased specificity when aligning back to the genome compared to single tags, which involves only one end of the DNA fragment.

De novo transcriptome assembly is the de novo sequence assembly method of creating a transcriptome without the aid of a reference genome.

MicroRNA sequencing (miRNA-seq), a type of RNA-Seq, is the use of next-generation sequencing or massively parallel high-throughput DNA sequencing to sequence microRNAs, also called miRNAs. miRNA-seq differs from other forms of RNA-seq in that input material is often enriched for small RNAs. miRNA-seq allows researchers to examine tissue-specific expression patterns, disease associations, and isoforms of miRNAs, and to discover previously uncharacterized miRNAs. Evidence that dysregulated miRNAs play a role in diseases such as cancer has positioned miRNA-seq to potentially become an important tool in the future for diagnostics and prognostics as costs continue to decrease. Like other miRNA profiling technologies, miRNA-Seq has both advantages and disadvantages.

Chimeric RNA, sometimes referred to as a fusion transcript, is composed of exons from two or more different genes and has the potential to encode novel proteins. These mRNAs differ from those produced by conventional splicing in that they derive from two or more gene loci.

Bowtie is a software package commonly used for sequence alignment and sequence analysis in bioinformatics. The source code for the package is distributed freely and compiled binaries are available for Linux, macOS and Windows platforms. As of 2017, the Genome Biology paper describing the original Bowtie method has been cited more than 11,000 times. Bowtie is open-source software and is currently maintained by Johns Hopkins University.

Single-cell sequencing examines the nucleic acid sequence information from individual cells with optimized next-generation sequencing technologies, providing a higher resolution of cellular differences and a better understanding of the function of an individual cell in the context of its microenvironment. For example, in cancer, sequencing the DNA of individual cells can give information about mutations carried by small populations of cells. In development, sequencing the RNAs expressed by individual cells can give insight into the existence and behavior of different cell types. In microbial systems, a population of the same species can appear genetically clonal. Still, single-cell sequencing of RNA or epigenetic modifications can reveal cell-to-cell variability that may help populations rapidly adapt to survive in changing environments.

Alicia Yinema Kate Nungarai Oshlack is an Australian bioinformatician and is Co-Head of Computational Biology at the Peter MacCallum Cancer Centre in Melbourne, Victoria, Australia. She is best known for her work developing methods for the analysis of transcriptome data as a measure of gene expression. She has characterized the role of gene expression in human evolution by comparisons of humans, chimpanzees, orangutans, and rhesus macaques, and works collaboratively in data analysis to improve the use of clinical sequencing of RNA samples by RNAseq for human disease diagnosis.

Metatranscriptomics is the set of techniques used to study gene expression of microbes within natural environments, i.e., the metatranscriptome.

TopHat is an open-source bioinformatics tool for the high-throughput alignment of shotgun cDNA sequencing reads generated by transcriptomics technologies. It uses Bowtie first and then maps reads to a reference genome to discover RNA splice sites de novo. TopHat aligns RNA-Seq reads to mammalian-sized genomes.
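The general idea of finding a splice site from a read that does not align contiguously can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not TopHat's actual algorithm: the function name `find_junction`, the `min_anchor` parameter, and the example sequences are hypothetical, and exact string search replaces the Bowtie alignment step.

```python
from __future__ import annotations


def find_junction(reference: str, read: str, min_anchor: int = 4) -> tuple[int, int] | None:
    """Toy split-read search.

    Returns (donor_end, acceptor_start) if the read can be split into two
    parts that match the reference exactly with a gap between them, or
    None if the read matches contiguously or no split is found.
    """
    if read in reference:
        return None  # contiguous match: no junction is needed to explain the read
    # Try every split point that leaves at least min_anchor bases on each side.
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = reference.find(left)
        if left_pos == -1:
            continue
        left_end = left_pos + len(left)
        # The right part must match downstream of the left part.
        right_pos = reference.find(right, left_end)
        if right_pos != -1:
            return left_end, right_pos
    return None


# Hypothetical reference: two exons separated by an intron that starts with GT and ends with AG.
exon1 = "ATGGCCAAGT"
intron = "GTAAGTCCCCCTTTTCAG"
exon2 = "GATTCGACTGA"
reference = exon1 + intron + exon2

# A read from the spliced transcript that spans the exon-exon boundary.
read = exon1[-6:] + exon2[:6]

junction = find_junction(reference, read)
print(junction)
if junction is not None:
    donor_end, acceptor_start = junction
    print(reference[donor_end:acceptor_start])  # the inferred intron sequence
```

Real aligners additionally tolerate mismatches, score canonical splice signals, and resolve reads whose split point is ambiguous; none of that is modelled here.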

In molecular phylogenetics, relationships among individuals are determined using character traits, such as DNA, RNA or protein, which may be obtained using a variety of sequencing technologies. High-throughput next-generation sequencing has become a popular technique in transcriptomics, which captures a snapshot of gene expression. In eukaryotes, making phylogenetic inferences using RNA is complicated by alternative splicing, which produces multiple transcripts from a single gene. As such, a variety of approaches may be used to improve phylogenetic inference using transcriptomic data obtained from RNA-Seq and processed using computational phylogenetics.

Third-generation sequencing is a class of DNA sequencing methods which produce longer sequence reads, under active development since 2008.

Transcriptomics technologies are the techniques used to study an organism's transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst non-coding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. Transcriptomics technologies provide a broad account of which cellular processes are active and which are dormant. A major challenge in molecular biology is to understand how a single genome gives rise to a variety of cells. Another is how gene expression is regulated.

References

  1. Wang Z, Gerstein M, Snyder M (January 2009). "RNA-Seq: a revolutionary tool for transcriptomics". Nature Reviews. Genetics. 10 (1): 57–63. doi:10.1038/nrg2484. PMC   2949280 . PMID   19015660.
  2. Kukurba KR, Montgomery SB (April 2015). "RNA Sequencing and Analysis". Cold Spring Harbor Protocols. 2015 (11): 951–969. doi:10.1101/pdb.top084970. PMC   4863231 . PMID   25870306.
  3. Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, et al. (January 2016). "A survey of best practices for RNA-seq data analysis". Genome Biology. 17 (13): 13. doi: 10.1186/s13059-016-0881-8 . PMC   4728800 . PMID   26813401.
  4. "RNA Sequencing and analysis" (PDF). Canadian Bioinformatics Workshops. 2012.
  5. Poplawski A, Binder H (July 2018). "Feasibility of sample size calculation for RNA-seq studies". Briefings in Bioinformatics. 19 (4): 713–720. doi:10.1093/bib/bbw144. PMID   28100468. S2CID   28848959.
  6. Sheng Q, Vickers K, Zhao S, Wang J, Samuels DC, Koues O, et al. (July 2017). "Multi-perspective quality control of Illumina RNA sequencing data analysis". Briefings in Functional Genomics. 16 (4): 194–204. doi:10.1093/bfgp/elw035. PMC   5860075 . PMID   27687708.
  7. Hoogstrate Y, Komor MA, Böttcher R, van Riet J, van de Werken HJ, van Lieshout S, et al. (December 2021). "Fusion transcripts and their genomic breakpoints in polyadenylated and ribosomal RNA-minus RNA sequencing data". GigaScience. 10 (12): giab080. doi:10.1093/gigascience/giab080. PMC 8673554. PMID 34891161.
  8. Sayols S, Klein H (2015). "dupRadar: Assessment of duplication rates in RNA-Seq datasets. R package version 1.1.0". doi:10.18129/B9.bioc.dupRadar.
  9. Davis MP, van Dongen S, Abreu-Goodger C, Bartonicek N, Enright AJ (September 2013). "Kraken: a set of tools for quality control and analysis of high-throughput sequence data". Methods. 63 (1): 41–49. doi:10.1016/j.ymeth.2013.06.027. PMC   3991327 . PMID   23816787.
  10. Anders S, Pyl PT, Huber W (January 2015). "HTSeq--a Python framework to work with high-throughput sequencing data". Bioinformatics. 31 (2): 166–169. doi:10.1093/bioinformatics/btu638. PMC   4287950 . PMID   25260700.
  11. Feng H, Zhang X, Zhang C (August 2015). "mRIN for direct assessment of genome-wide and gene-specific mRNA integrity from large-scale RNA-sequencing data". Nature Communications. 6 (7816): 7816. Bibcode:2015NatCo...6.7816F. doi:10.1038/ncomms8816. PMC   4523900 . PMID   26234653.
  12. Ewels P, Magnusson M, Lundin S, Käller M (October 2016). "MultiQC: summarize analysis results for multiple tools and samples in a single report". Bioinformatics. 32 (19): 3047–3048. doi:10.1093/bioinformatics/btw354. PMC   5039924 . PMID   27312411.
  13. DeLuca DS, Levin JZ, Sivachenko A, Fennell T, Nazaire MD, Williams C, et al. (June 2012). "RNA-SeQC: RNA-seq metrics for quality control and process optimization". Bioinformatics. 28 (11): 1530–1532. doi:10.1093/bioinformatics/bts196. PMC   3356847 . PMID   22539670.
  14. Wang L, Wang S, Li W (August 2012). "RSeQC: quality control of RNA-seq experiments". Bioinformatics. 28 (16): 2184–2185. doi: 10.1093/bioinformatics/bts356 . PMID   22743226.
  15. Lassmann T, Hayashizaki Y, Daub CO (January 2011). "SAMStat: monitoring biases in next generation sequencing data". Bioinformatics. 27 (1): 130–131. doi:10.1093/bioinformatics/btq614. PMC   3008642 . PMID   21088025.
  16. Lahens NF, Kavakli IH, Zhang R, Hayer K, Black MB, Dueck H, et al. (June 2014). "IVT-seq reveals extreme bias in RNA sequencing". Genome Biology. 15 (6): R86. doi: 10.1186/gb-2014-15-6-r86 . PMC   4197826 . PMID   24981968.
  17. Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, et al. (September 2014). "Detecting and correcting systematic variation in large-scale RNA sequencing data". Nature Biotechnology. 32 (9): 888–895. doi:10.1038/nbt.3000. PMC   4160374 . PMID   25150837.
  18. Benjamini Y, Speed TP (May 2012). "Summarizing and correcting the GC content bias in high-throughput sequencing". Nucleic Acids Research. 40 (10): e72. doi:10.1093/nar/gks001. PMC   3378858 . PMID   22323520.
  19. Aird D, Ross MG, Chen WS, Danielsson M, Fennell T, Russ C, et al. (2011). "Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries". Genome Biology. 12 (2): R18. doi: 10.1186/gb-2011-12-2-r18 . PMC   3188800 . PMID   21338519.
  20. Adiconis X, Borges-Rivera D, Satija R, DeLuca DS, Busby MA, Berlin AM, et al. (July 2013). "Comparative analysis of RNA sequencing methods for degraded or low-input samples". Nature Methods. 10 (7): 623–629. doi:10.1038/nmeth.2483. PMC   3821180 . PMID   23685885.
  21. Nakamura K, Oshima T, Morimoto T, Ikeda S, Yoshikawa H, Shiwa Y, et al. (July 2011). "Sequence-specific error profile of Illumina sequencers". Nucleic Acids Research. 39 (13): e90. doi:10.1093/nar/gkr344. PMC   3141275 . PMID   21576222.
  22. Hansen KD, Brenner SE, Dudoit S (July 2010). "Biases in Illumina transcriptome sequencing caused by random hexamer priming". Nucleic Acids Research. 38 (12): e131. doi:10.1093/nar/gkq224. PMC   2896536 . PMID   20395217.
  23. Criscuolo A, Brisse S (November 2013). "AlienTrimmer: a tool to quickly and accurately trim off multiple short contaminant sequences from high-throughput sequencing reads". Genomics. 102 (5–6): 500–506. doi: 10.1016/j.ygeno.2013.07.011 . PMID   23912058.
  24. Smeds L, Künstner A (19 October 2011). "ConDeTri--a content dependent read trimmer for Illumina data". PLOS ONE. 6 (10): e26314. Bibcode:2011PLoSO...626314S. doi: 10.1371/journal.pone.0026314 . PMC   3198461 . PMID   22039460.
  25. Magoč T, Salzberg SL (November 2011). "FLASH: fast length adjustment of short reads to improve genome assemblies". Bioinformatics. 27 (21): 2957–2963. doi:10.1093/bioinformatics/btr507. PMC 3198573. PMID 21903629.
  26. Prezza N, Del Fabbro C, Vezzi F, De Paoli E, Policriti A (2012). "Erne-Bs5". Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine. Vol. 12. pp. 12–19. doi:10.1145/2382936.2382938. ISBN   9781450316705. S2CID   5673753.
  27. Schmieder R, Edwards R (March 2011). "Quality control and preprocessing of metagenomic datasets". Bioinformatics. 27 (6): 863–864. doi:10.1093/bioinformatics/btr026. PMC   3051327 . PMID   21278185.
  28. Dlugosch KM, Lai Z, Bonin A, Hierro J, Rieseberg LH (February 2013). "Allele identification for transcriptome-based population genomics in the invasive plant Centaurea solstitialis". G3. 3 (2): 359–367. doi:10.1534/g3.112.003871. PMC   3564996 . PMID   23390612.
  29. Bolger AM, Lohse M, Usadel B (August 2014). "Trimmomatic: a flexible trimmer for Illumina sequence data". Bioinformatics. 30 (15): 2114–2120. doi:10.1093/bioinformatics/btu170. PMC   4103590 . PMID   24695404.
  30. Laehnemann D, Borkhardt A, McHardy AC (January 2016). "Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction". Briefings in Bioinformatics. 17 (1): 154–179. doi:10.1093/bib/bbv029. PMC   4719071 . PMID   26026159.
  31. Quince C, Lanzen A, Davenport RJ, Turnbaugh PJ (January 2011). "Removing noise from pyrosequenced amplicons". BMC Bioinformatics. 12 (38): 38. doi: 10.1186/1471-2105-12-38 . PMC   3045300 . PMID   21276213.
  32. Heo Y, Wu XL, Chen D, Ma J, Hwu WM (May 2014). "BLESS: bloom filter-based error correction solution for high-throughput sequencing reads". Bioinformatics. 30 (10): 1354–1362. doi:10.1093/bioinformatics/btu030. PMC   6365934 . PMID   24451628.
  33. Greenfield P, Duesing K, Papanicolaou A, Bauer DC (October 2014). "Blue: correcting sequencing errors using consensus and context". Bioinformatics. 30 (19): 2723–2732. doi: 10.1093/bioinformatics/btu368 . PMID   24919879.
  34. Michael I Love; John B Hogenesch; Rafael A Irizarry (2015). "Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation". bioRxiv   10.1101/025767 .
  35. Hansen KD, Irizarry RA, Wu Z (April 2012). "Removing technical variability in RNA-seq data using conditional quantile normalization". Biostatistics. 13 (2): 204–216. doi:10.1093/biostatistics/kxr054. PMC   3297825 . PMID   22285995.
  36. Risso D, Schwartz K, Sherlock G, Dudoit S (December 2011). "GC-content normalization for RNA-Seq data". BMC Bioinformatics. 12 (1): 480. doi: 10.1186/1471-2105-12-480 . PMC   3315510 . PMID   22177264.
  37. Stegle O, Parts L, Piipari M, Winn J, Durbin R (February 2012). "Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses". Nature Protocols. 7 (3): 500–507. doi:10.1038/nprot.2011.457. PMC   3398141 . PMID   22343431.
  38. Risso D, Ngai J, Speed TP, Dudoit S (September 2014). "Normalization of RNA-seq data using factor analysis of control genes or samples". Nature Biotechnology. 32 (9): 896–902. doi:10.1038/nbt.2931. PMC   4404308 . PMID   25150836.
  39. Meacham F, Boffelli D, Dhahbi J, Martin DI, Singer M, Pachter L (November 2011). "Identification and correction of systematic error in high-throughput sequence data". BMC Bioinformatics. 12 (1): 451. doi: 10.1186/1471-2105-12-451 . PMC   3295828 . PMID   22099972.
  40. Liu B, Yuan J, Yiu SM, Li Z, Xie Y, Chen Y, et al. (November 2012). "COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly". Bioinformatics. 28 (22): 2870–2874. doi: 10.1093/bioinformatics/bts563 . PMID   23044551.
  41. Zhang J, Kobert K, Flouri T, Stamatakis A (March 2014). "PEAR: a fast and accurate Illumina Paired-End reAd mergeR". Bioinformatics. 30 (5): 614–620. doi:10.1093/bioinformatics/btt593. PMC   3933873 . PMID   24142950.
  42. Rodrigue S, Materna AC, Timberlake SC, Blackburn MC, Malmstrom RR, Alm EJ, Chisholm SW (July 2010). "Unlocking short read sequencing for metagenomics". PLOS ONE. 5 (7): e11840. Bibcode:2010PLoSO...511840R. doi: 10.1371/journal.pone.0011840 . PMC   2911387 . PMID   20676378.
  43. Sangiovanni M, Granata I, Thind AS, Guarracino MR (April 2019). "From trash to treasure: detecting unexpected contamination in unmapped NGS data". BMC Bioinformatics. 20 (Suppl 4): 168. doi: 10.1186/s12859-019-2684-x . PMC   6472186 . PMID   30999839.
  44. Liao Y, Smyth GK, Shi W (May 2013). "The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote". Nucleic Acids Research. 41 (10): e108. doi:10.1093/nar/gkt214. PMC 3664803. PMID 23558742.
  45. Alamancos GP, Agirre E, Eyras E (2014). "Methods to Study Splicing from High-Throughput RNA Sequencing Data". Spliceosomal Pre-mRNA Splicing. Methods in Molecular Biology. Vol. 1126. pp. 357–97. arXiv: 1304.5952 . doi:10.1007/978-1-62703-980-2_26. ISBN   978-1-62703-979-6. PMID   24549677. S2CID   18574607.
  46. Baruzzo G, Hayer KE, Kim EJ, Di Camillo B, FitzGerald GA, Grant GR (February 2017). "Simulation-based comprehensive benchmarking of RNA-seq aligners". Nature Methods. 14 (2): 135–139. doi:10.1038/nmeth.4106. PMC   5792058 . PMID   27941783.
  47. Campagna D, Telatin A, Forcato C, Vitulo N, Valle G (January 2013). "PASS-bis: a bisulfite aligner suitable for whole methylome analysis of Illumina and SOLiD reads". Bioinformatics. 29 (2): 268–270. doi: 10.1093/bioinformatics/bts675 . PMID   23162053.
  48. Ahn J, Xiao X (December 2015). "RASER: reads aligner for SNPs and editing sites of RNA". Bioinformatics. 31 (24): 3906–3913. doi:10.1093/bioinformatics/btv505. PMC   4692970 . PMID   26323713.
  49. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. (January 2013). "STAR: ultrafast universal RNA-seq aligner". Bioinformatics. 29 (1): 15–21. doi:10.1093/bioinformatics/bts635. PMC 3530905. PMID 23104886.
  50. Trapnell C, Pachter L, Salzberg SL (May 2009). "TopHat: discovering splice junctions with RNA-Seq". Bioinformatics. 25 (9): 1105–1111. doi:10.1093/bioinformatics/btp120. PMC   2672628 . PMID   19289445.
  51. Pachter L (2011). "Models for transcript quantification from RNA-Seq". arXiv: 1104.3889 [q-bio.GN].
  52. Jin H, Wan YW, Liu Z (March 2017). "Comprehensive evaluation of RNA-seq quantification methods for linearity". BMC Bioinformatics. 18 (Suppl 4): 117. doi: 10.1186/s12859-017-1526-y . PMC   5374695 . PMID   28361706.
  53. Kvam VM, Liu P, Si Y (February 2012). "A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data". American Journal of Botany. 99 (2): 248–256. doi: 10.3732/ajb.1100340 . PMID   22268221.
  54. Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, et al. (November 2013). "A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis". Briefings in Bioinformatics. 14 (6): 671–683. doi: 10.1093/bib/bbs046 . PMID   22988256.
  55. Evans C, Hardin J, Stoebel DM (September 2018). "Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions". Briefings in Bioinformatics. 19 (5): 776–792. doi:10.1093/bib/bbx008. PMC   6171491 . PMID   28334202.
  56. Wu Z, Jenkins BD, Rynearson TA, Dyhrman ST, Saito MA, Mercier M, Whitney LP (November 2010). "Empirical bayes analysis of sequencing-based transcriptional profiling without replicates". BMC Bioinformatics. 11: 564. doi: 10.1186/1471-2105-11-564 . PMC   3098101 . PMID   21080965.
  57. Hajiramezanali E, Dadaneh SZ, Figueiredo Pd, Sze S, Zhou Z, Qian X. "Differential Expression Analysis of Dynamical Sequencing Count Data with a Gamma Markov Chain". arXiv:1803.02527.
  58. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, et al. (May 2010). "Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation". Nature Biotechnology. 28 (5): 511–515. doi:10.1038/nbt.1621. PMC 3146043. PMID 20436464.
  59. Klambauer G, Unterthiner T, Hochreiter S (November 2013). "DEXUS: identifying differential expression in RNA-Seq studies with unknown conditions". Nucleic Acids Research. 41 (21): e198. doi:10.1093/nar/gkt834. PMC   3834838 . PMID   24049071.
  60. Vavoulis DV, Francescatto M, Heutink P, Gough J (February 2015). "DGEclust: differential expression analysis of clustered count data". Genome Biology. 16 (1): 39. doi: 10.1186/s13059-015-0604-6 . PMC   4365804 . PMID   25853652.
  61. Yépez, Vicente A.; Mertes, Christian; Müller, Michaela F.; Klaproth-Andrade, Daniela; Wachutka, Leonhard; Frésard, Laure; Gusic, Mirjana; Scheller, Ines F.; Goldberg, Patricia F.; Prokisch, Holger; Gagneur, Julien (February 2021). "Detection of aberrant gene expression events in RNA sequencing data". Nature Protocols. 16 (2): 1276–1296. doi:10.1038/s41596-020-00462-5. PMID   33462443.
  62. Feng J, Meyer CA, Wang Q, Liu JS, Shirley Liu X, Zhang Y (November 2012). "GFOLD: a generalized fold change for ranking differentially expressed genes from RNA-seq data". Bioinformatics. 28 (21): 2782–2788. doi: 10.1093/bioinformatics/bts515 . PMID   22923299.
  63. Rauschenberger A, Jonker MA, van de Wiel MA, Menezes RX (March 2016). "Testing for association between RNA-Seq and high-dimensional data". BMC Bioinformatics. 17 (118): 118. doi: 10.1186/s12859-016-0961-5 . PMC   4782413 . PMID   26951498.
  64. Cao M, Zhou W, Breidt FJ, Peers G (March 2020). "Large scale maximum average power multiple inference on time-course count data with application to RNA-seq analysis". Biometrics. 76 (1): 9–22. doi: 10.1111/biom.13144 . PMID   31483480.
  65. Moulos P, Hatzis P (February 2015). "Systematic integration of RNA-Seq statistical algorithms for accurate detection of differential gene expression patterns". Nucleic Acids Research. 43 (4): e25. doi:10.1093/nar/gku1273. PMC   4344485 . PMID   25452340.
  66. Hoogstrate, Youri; Draaisma, Kaspar; Ghisai, Santoesha A.; van Hijfte, Levi; Barin, Nastaran; de Heer, Iris; Coppieters, Wouter; van den Bosch, Thierry P. P.; Bolleboom, Anne; Gao, Zhenyu; Vincent, Arnaud J. P. E.; Karim, Latifa; Deckers, Manon; Taphoorn, Martin J. B.; Kerkhof, Melissa; Weyerbrock, Astrid; Sanson, Marc; Hoeben, Ann; Lukacova, Slávka; Lombardi, Giuseppe; Leenstra, Sieger; Hanse, Monique; Fleischeuer, Ruth E. M.; Watts, Colin; Angelopoulos, Nicos; Gorlia, Thierry; Golfinopoulos, Vassilis; Bours, Vincent; van den Bent, Martin J.; Robe, Pierre A.; French, Pim J. (9 March 2023). "Transcriptome analysis reveals tumor microenvironment changes in glioblastoma". Cancer Cell. 41 (4): 678–692.e7. doi: 10.1016/j.ccell.2023.02.019 . PMID   36898379. S2CID   257437946.
  67. Rauschenberger A, Menezes RX, van de Wiel MA, van Schoor NM, Jonker MA (2018). "Detecting SNPs with interactive effects on a quantitative trait". arXiv: 1805.09175 [stat.ME].
  68. Vera Alvarez R, Pongor LS, Mariño-Ramírez L, Landsman D (June 2019). "TPMCalculator: one-step software to quantify mRNA abundance of genomic features". Bioinformatics. 35 (11): 1960–1962. doi: 10.1093/bioinformatics/bty896 . PMC   6546121 . PMID   30379987.
  69. Navarro FC, Hoops J, Bellfy L, Cerveira E, Zhu Q, Zhang C, et al. (August 2019). "TeXP: Deconvolving the effects of pervasive and autonomous transcription of transposable elements". PLOS Computational Biology. 15 (8): e1007293. Bibcode:2019PLSCB..15E7293N. doi: 10.1371/journal.pcbi.1007293 . PMC   6715295 . PMID   31425522.
  70. Akhmedov M, Martinelli A, Geiger R, Kwee I (March 2020). "Omics Playground: a comprehensive self-service platform for visualization, analytics and exploration of Big Omics Data". NAR Genomics and Bioinformatics. 2 (1): lqz019. doi:10.1093/nargab/lqz019. PMC   7671354 . PMID   33575569.
  71. Yao L, Wang H, Song Y, Sui G (October 2017). "BioQueue: a novel pipeline framework to accelerate bioinformatics analysis". Bioinformatics. 33 (20): 3286–3288. doi: 10.1093/bioinformatics/btx403 . PMID   28633441.
  72. Kartashov AV, Barski A (August 2015). "BioWardrobe: an integrated platform for analysis of epigenomics and transcriptomics data". Genome Biology. 16 (1): 158. doi: 10.1186/s13059-015-0720-3 . PMC   4531538 . PMID   26248465.
  73. Levin L, Bar-Yaacov D, Bouskila A, Chorev M, Carmel L, Mishmar D (2015). "LEMONS - A Tool for the Identification of Splice Junctions in Transcriptomes of Organisms Lacking Reference Genomes". PLOS ONE. 10 (11): e0143329. Bibcode:2015PLoSO..1043329L. doi: 10.1371/journal.pone.0143329 . PMC   4659627 . PMID   26606265.
  74. Pundhir S, Gorodkin J (July 2015). "Differential and coherent processing patterns from small RNAs". Scientific Reports. 5: 12062. Bibcode:2015NatSR...512062P. doi:10.1038/srep12062. PMC   4499813 . PMID   26166713.
  75. Rogers MF, Thomas J, Reddy AS, Ben-Hur A (January 2012). "SpliceGrapher: detecting patterns of alternative splicing from RNA-Seq data in the context of gene models and EST data". Genome Biology. 13 (1): R4. doi: 10.1186/gb-2012-13-1-r4 . PMC   3334585 . PMID   22293517.
  76. Rogers MF, Boucher C, Ben-Hur A (2013). "SpliceGrapherXT". Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics. BCB'13. New York, NY, USA: ACM. pp. 247:247–247:255. doi:10.1145/2506583.2506625. ISBN   9781450324342. S2CID   15009112.
  77. Wu J, Akerman M, Sun S, McCombie WR, Krainer AR, Zhang MQ (November 2011). "SpliceTrap: a method to quantify alternative splicing under single cellular conditions". Bioinformatics. 27 (21): 3010–3016. doi:10.1093/bioinformatics/btr508. PMC   3198574 . PMID   21896509.
  78. Mertes, Christian; Scheller, Ines F.; Yépez, Vicente A.; Çelik, Muhammed H.; Liang, Yingjiqiong; Kremer, Laura S.; Gusic, Mirjana; Prokisch, Holger; Gagneur, Julien (22 January 2021). "Detection of aberrant splicing events in RNA-seq data using FRASER". Nature Communications. 12 (1): 529. Bibcode:2021NatCo..12..529M. doi:10.1038/s41467-020-20573-7. PMC   7822922 . PMID   33483494.
  79. Scheller, Ines F.; Lutz, Karoline; Mertes, Christian; Yépez, Vicente A.; Gagneur, Julien (December 2023). "Improved detection of aberrant splicing with FRASER 2.0 and the intron Jaccard index". The American Journal of Human Genetics. 110 (12): 2056–2067. doi:10.1016/j.ajhg.2023.10.014. PMID   38006880.
  80. Vitting-Seerup K, Sandelin A (September 2017). "The Landscape of Isoform Switches in Human Cancers". Molecular Cancer Research. 15 (9): 1206–1220. doi: 10.1158/1541-7786.mcr-16-0459 . PMID   28584021.
  81. Nowicka M, Robinson MD (6 December 2016). "DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics". F1000Research. 5: 1356. doi: 10.12688/f1000research.8900.2 . PMC   5200948 . PMID   28105305.
  82. Papastamoulis P, Rattray M (November 2017). "Bayesian estimation of differential transcript usage from RNA-seq data". Statistical Applications in Genetics and Molecular Biology. 16 (5–6): 367–386. arXiv: 1701.03095 . Bibcode:2017arXiv170103095P. doi:10.1515/sagmb-2017-0005. PMID   29091583. S2CID   915799.
  83. Shi Y, Chinnaiyan AM, Jiang H (July 2015). "rSeqNP: a non-parametric approach for detecting differential expression and splicing from RNA-Seq data". Bioinformatics. 31 (13): 2222–2224. doi:10.1093/bioinformatics/btv119. PMC   4481847 . PMID   25717189.
  84. Jones DC, Kuppusamy KT, Palpant NJ, Peng X, Murry CE, Ruohola-Baker H, Ruzzo WL (20 November 2016). "Isolator: accurate and stable analysis of isoform-level expression in RNA-Seq experiments". bioRxiv   10.1101/088765 .
  85. Kumar S, Vo AD, Qin F, Li H (February 2016). "Comparative assessment of methods for the fusion transcripts detection from RNA-Seq data". Scientific Reports. 6: 21597. Bibcode:2016NatSR...621597K. doi:10.1038/srep21597. PMC 4748267. PMID 26862001.
  86. Uhrig S, Ellermann J, Walther T, Burkhardt P, Fröhlich M, Hutter B, et al. (March 2021). "Accurate and efficient detection of gene fusions from RNA sequencing data". Genome Research. 31 (3): 448–460. doi:10.1101/gr.257246.119. PMC   7919457 . PMID   33441414.
  87. Creason A, Haan D, Dang K, Chiotti KE, Inkman M, Lamb A, et al. (August 2021). "A community challenge to evaluate RNA-seq, fusion detection, and isoform quantification methods for cancer discovery". Cell Systems. 12 (8): 827–838.e5. doi:10.1016/j.cels.2021.05.021. PMC   8376800 . PMID   34146471.
  88. Abate, Francesco; Acquaviva, Andrea; Paciello, Giulia; Foti, Carmelo; Ficarra, Elisa; Ferrarini, Alberto; Delledonne, Massimo; Iacobucci, Ilaria; Soverini, Simona; Martinelli, Giovanni; Macii, Enrico (15 August 2012). "Bellerophontes: an RNA-Seq data analysis framework for chimeric transcripts discovery based on accurate fusion model". Bioinformatics. 28 (16): 2114–2121. doi: 10.1093/bioinformatics/bts334 . ISSN   1367-4811. PMID   22711792.
  89. Fan, Xian; Abbott, Travis E.; Larson, David; Chen, Ken (2014). "BreakDancer: Identification of Genomic Structural Variation from Paired-End Read Mapping". Current Protocols in Bioinformatics. 45: 15.6.1–11. doi:10.1002/0471250953.bi1506s45. ISSN   1934-340X. PMC   4138716 . PMID   25152801.
  90. Chen, Ken; Wallis, John W.; Kandoth, Cyriac; Kalicki-Veizer, Joelle M.; Mungall, Karen L.; Mungall, Andrew J.; Jones, Steven J.; Marra, Marco A.; Ley, Timothy J.; Mardis, Elaine R.; Wilson, Richard K.; Weinstein, John N.; Ding, Li (15 July 2012). "BreakFusion: targeted assembly-based identification of gene fusions in whole transcriptome paired-end sequencing data". Bioinformatics. 28 (14): 1923–1924. doi:10.1093/bioinformatics/bts272. ISSN   1367-4811. PMC   3389765 . PMID   22563071.
  91. Iyer, Matthew K.; Chinnaiyan, Arul M.; Maher, Christopher A. (11 August 2011). "ChimeraScan: a tool for identifying chimeric transcription in sequencing data". Bioinformatics. 27 (20): 2903–2904. doi:10.1093/bioinformatics/btr467. ISSN   1367-4811. PMC   3187648 . PMID   21840877.
  92. Chu, Hsueh-Ting; Hsiao, William W. L.; Chen, Jen-Chih; Yeh, Tze-Jung; Tsai, Mong-Hsun; Lin, Han; Liu, Yen-Wenn; Lee, Sheng-An; Chen, Chaur-Chin; Tsao, Theresa T. H.; Kao, Cheng-Yan (1 March 2013). "EBARDenovo: highly accurate de novo assembly of RNA-Seq with efficient chimera-detection". Bioinformatics. 29 (8): 1004–1010. doi:10.1093/bioinformatics/btt092. ISSN   1367-4811. PMID   23457040.
  93. Haas, Brian J.; Dobin, Alex; Stransky, Nicolas; Li, Bo; Yang, Xiao; Tickle, Timothy; Bankapur, Asma; Ganote, Carrie; Doak, Thomas G. (24 March 2017). "STAR-Fusion: Fast and Accurate Fusion Transcript Detection from RNA-Seq". doi:10.1101/120295. S2CID 43186395. Retrieved 30 August 2023.
  94. Nicorici, Daniel; Satalan, Mihaela; Edgren, Henrik; Kangaspeska, Sara; Murumagi, Astrid; Kallioniemi, Olli; Virtanen, Sami; Kilkku, Olavi (19 November 2014). "FusionCatcher - a tool for finding somatic fusion genes in paired-end RNA-sequencing data". doi:10.1101/011650. S2CID 85702767. Retrieved 30 August 2023.
  95. Okonechnikov, Konstantin; Imai-Matsushima, Aki; Paul, Lukas; Seitz, Alexander; Meyer, Thomas F.; Garcia-Alcalde, Fernando (1 December 2016). "InFusion: Advancing Discovery of Fusion Genes and Chimeric Transcripts from Deep RNA-Sequencing Data". PLOS ONE. 11 (12): e0167417. Bibcode:2016PLoSO..1167417O. doi:10.1371/journal.pone.0167417. ISSN 1932-6203. PMC 5132003. PMID 27907167.
  96. Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, et al. (October 2010). "MapSplice: accurate mapping of RNA-seq reads for splice junction discovery". Nucleic Acids Research. 38 (18): e178. doi:10.1093/nar/gkq622. PMC 2952873. PMID 20802226.
  97. Jia W, Qiu K, He M, Song P, Zhou Q, Zhou F, et al. (February 2013). "SOAPfuse: an algorithm for identifying fusion transcripts from paired-end RNA-Seq data". Genome Biology. 14 (2): R12. doi:10.1186/gb-2013-14-2-r12. PMC 4054009. PMID 23409703.
  98. Weber, David; Ibn-Salem, Jonas; Sorn, Patrick; Suchan, Martin; Holtsträter, Christoph; Lahrmann, Urs; Vogler, Isabel; Schmoldt, Kathrin; Lang, Franziska; Schrörs, Barbara; Löwer, Martin; Sahin, Ugur (4 April 2022). "Accurate detection of tumor-specific gene fusions reveals strongly immunogenic personal neo-antigens". Nature Biotechnology. 40 (8): 1276–1284. doi:10.1038/s41587-022-01247-9. ISSN   1087-0156. PMC   7613288 . PMID   35379963.
  99. Benelli, Matteo; Pescucci, Chiara; Marseglia, Giuseppina; Severgnini, Marco; Torricelli, Francesca; Magi, Alberto (23 October 2012). "Discovering chimeric transcripts in paired-end RNA-seq data by using EricScript". Bioinformatics. 28 (24): 3232–3239. doi: 10.1093/bioinformatics/bts617 . ISSN   1367-4811. PMID   23093608.
  100. Dehghannasiri R, Freeman DE, Jordanski M, Hsieh GL, Damljanovic A, Lehnert E, Salzman J (July 2019). "Improved detection of gene fusions by applying statistical methods reveals oncogenic RNA cancer drivers". Proceedings of the National Academy of Sciences of the United States of America. 116 (31): 15524–15533. Bibcode:2019PNAS..11615524D. doi: 10.1073/pnas.1900391116 . PMC   6681709 . PMID   31308241.
  101. McPherson, Andrew; Hormozdiari, Fereydoun; Zayed, Abdalnasser; Giuliany, Ryan; Ha, Gavin; Sun, Mark G. F.; Griffith, Malachi; Heravi Moussavi, Alireza; Senz, Janine; Melnyk, Nataliya; Pacheco, Marina; Marra, Marco A.; Hirst, Martin; Nielsen, Torsten O.; Sahinalp, S. Cenk (May 2011). "deFuse: an algorithm for gene fusion discovery in tumor RNA-Seq data". PLOS Computational Biology. 7 (5): e1001138. Bibcode:2011PLSCB...7E1138M. doi: 10.1371/journal.pcbi.1001138 . ISSN   1553-7358. PMC   3098195 . PMID   21625565.
  102. Hoogstrate Y, Ghisai SA, de Wit M, de Heer I, Draaisma K, van Riet J, et al. (March 2022). "The EGFRvIII transcriptome in glioblastoma: A meta-omics analysis". Neuro-Oncology. 24 (3): 429–441. doi:10.1093/neuonc/noab231. PMC   8917407 . PMID   34608482.
  103. Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo (September 2012). "FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery". Nucleic Acids Research. 40 (16): e123. doi:10.1093/nar/gks394. ISSN   1362-4962. PMC   3439881 . PMID   22570408.
  104. Ge, Huanying; Liu, Kejun; Juan, Todd; Fang, Fang; Newman, Matthew; Hoeck, Wolfgang (18 May 2011). "FusionMap: detecting fusion genes from next-generation sequencing data at base-pair resolution". Bioinformatics. 27 (14): 1922–1928. doi:10.1093/bioinformatics/btr310. ISSN   1367-4803. PMID   21593131.
  105. Sboner, Andrea; Habegger, Lukas; Pflueger, Dorothee; Terry, Stephane; Chen, David Z; Rozowsky, Joel S; Tewari, Ashutosh K; Kitabayashi, Naoki; Moss, Benjamin J; Chee, Mark S; Demichelis, Francesca; Rubin, Mark A; Gerstein, Mark B (October 2010). "FusionSeq: a modular framework for finding gene fusions by analyzing paired-end RNA-sequencing data". Genome Biology. 11 (10): R104. doi: 10.1186/gb-2010-11-10-r104 . ISSN   1474-760X. PMC   3218660 . PMID   20964841.
  106. Davidson, Nadia M; Majewski, Ian J; Oshlack, Alicia (12 January 2015). "JAFFA: High sensitivity transcriptome-focused fusion gene detection". Genome Medicine. 7 (1): 43. bioRxiv   10.1101/013698 . doi: 10.1186/s13073-015-0167-x . hdl:11343/261352. PMC   4445815 . PMID   26019724.
  107. McPherson, Andrew; Wu, Chunxiao; Wyatt, Alexander W.; Shah, Sohrab; Collins, Colin; Sahinalp, S. Cenk (28 June 2012). "nFuse: Discovery of complex genomic rearrangements in cancer using high-throughput sequencing". Genome Research. 22 (11): 2250–2261. doi:10.1101/gr.136572.111. ISSN   1088-9051. PMC   3483554 . PMID   22745232.
  108. Torres-García, Wandaliz; Zheng, Siyuan; Sivachenko, Andrey; Vegesna, Rahulsimham; Wang, Qianghu; Yao, Rong; Berger, Michael F.; Weinstein, John N.; Getz, Gad; Verhaak, Roel G.W. (1 April 2014). "PRADA: pipeline for RNA sequencing data analysis". Bioinformatics. 30 (15): 2224–2226. doi:10.1093/bioinformatics/btu169. ISSN   1367-4811. PMC   4103589 . PMID   24695405.
  109. Wu, Jikun; Zhang, Wenqian; Huang, Songbo; He, Zengquan; Cheng, Yanbing; Wang, Jun; Lam, Tak-Wah; Peng, Zhiyu; Yiu, Siu-Ming (11 October 2013). "SOAPfusion: a robust and effective computational fusion discovery tool for RNA-seq reads". Bioinformatics. 29 (23): 2971–2978. doi:10.1093/bioinformatics/btt522. ISSN   1367-4811. PMID   24123671.
  110. Kim, Daehwan; Salzberg, Steven L (2011). "TopHat-Fusion: an algorithm for discovery of novel fusion transcripts". Genome Biology. 12 (8): R72. doi: 10.1186/gb-2011-12-8-r72 . ISSN   1465-6906. PMC   3245612 . PMID   21835007.
  111. Li, Jing-Woei; Wan, Raymond; Yu, Chi-Shing; Co, Ngai Na; Wong, Nathalie; Chan, Ting-Fung (12 January 2013). "ViralFusionSeq: accurately discover viral integration events and reconstruct fusion transcripts at single-base resolution". Bioinformatics. 29 (5): 649–651. doi:10.1093/bioinformatics/btt011. ISSN   1367-4811. PMC   3582262 . PMID   23314323.
  112. Routh A, Johnson JE (January 2014). "Discovery of functional genomic motifs in viruses with ViReMa-a Virus Recombination Mapper-for analysis of next-generation sequencing data". Nucleic Acids Research. 42 (2): e11. doi:10.1093/nar/gkt916. PMC   3902915 . PMID   24137010.
  113. Thind AS, Monga I, Thakur PK, Kumari P, Dindhoria K, Krzak M, et al. (November 2021). "Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology". Briefings in Bioinformatics. 22 (6). doi:10.1093/bib/bbab259. PMID   34329375.
  114. Hashimshony T, Wagner F, Sher N, Yanai I (September 2012). "CEL-Seq: single-cell RNA-Seq by multiplexed linear amplification". Cell Reports. 2 (3): 666–673. doi: 10.1016/j.celrep.2012.08.003 . PMID   22939981.
  115. Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, et al. (May 2015). "Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets". Cell. 161 (5): 1202–1214. doi:10.1016/j.cell.2015.05.002. PMC   4481139 . PMID   26000488.
  116. Marco E, Karp RL, Guo G, Robson P, Hart AH, Trippa L, Yuan GC (December 2014). "Bifurcation analysis of single-cell gene expression data reveals epigenetic landscape". Proceedings of the National Academy of Sciences of the United States of America. 111 (52): E5643–E5650. Bibcode:2014PNAS..111E5643M. doi: 10.1073/pnas.1408993111 . PMC   4284553 . PMID   25512504.
  117. Buettner F, Natarajan KN, Casale FP, Proserpio V, Scialdone A, Theis FJ, et al. (February 2015). "Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells". Nature Biotechnology. 33 (2): 155–160. doi: 10.1038/nbt.3102 . PMID   25599176.
  118. Mohammed MH, Ghosh TS, Singh NK, Mande SS (January 2011). "SPHINX--an algorithm for taxonomic binning of metagenomic sequences". Bioinformatics. 27 (1): 22–30. doi:10.1093/bioinformatics/btq608. PMID   21030462.
  119. Stubbington MJ, Lönnberg T, Proserpio V, Clare S, Speak AO, Dougan G, Teichmann SA (April 2016). "T cell fate and clonality inference from single-cell transcriptomes". Nature Methods. 13 (4): 329–332. doi:10.1038/nmeth.3800. PMC   4835021 . PMID   26950746.
  120. Eltahla AA, Rizzetto S, Pirozyan MR, Betz-Stablein BD, Venturi V, Kedzierska K, et al. (July 2016). "Linking the T cell receptor to the single cell transcriptome in antigen-specific human T cells". Immunology and Cell Biology. 94 (6): 604–611. doi:10.1038/icb.2016.16. PMID   26860370. S2CID   25714515.
  121. Trapnell C. "Monocle 3". cole-trapnell-lab.github.io. Retrieved 23 September 2021.
  122. Wolf FA, Angerer P, Theis FJ (February 2018). "SCANPY: large-scale single-cell gene expression data analysis". Genome Biology. 19 (1): 15. doi: 10.1186/s13059-017-1382-0 . PMC   5802054 . PMID   29409532.
  123. "Scanpy – Single-Cell Analysis in Python — Scanpy 1.8.1 documentation". scanpy.readthedocs.io. readthedocs.io. Retrieved 23 September 2021.
  124. Diaz A, Liu SJ, Sandoval C, Pollen A, Nowakowski TJ, Lim DA, Kriegstein A (July 2016). "SCell: integrated analysis of single-cell RNA-seq data". Bioinformatics. 32 (14): 2219–2220. doi:10.1093/bioinformatics/btw201. PMC   4937196 . PMID   27153637.
  125. Butler A, Hoffman P, Smibert P, Papalexi E, Satija R (June 2018). "Integrating single-cell transcriptomic data across different conditions, technologies, and species". Nature Biotechnology. 36 (5): 411–420. doi:10.1038/nbt.4096. PMC   6700744 . PMID   29608179.
  126. Hao Y, Hao S, Andersen-Nissen E, Mauck WM, Zheng S, Butler A, et al. (June 2021). "Integrated analysis of multimodal single-cell data". Cell. 184 (13): 3573–3587.e29. doi:10.1016/j.cell.2021.04.048. PMC   8238499 . PMID   34062119.
  127. Juliá M, Telenti A, Rausell A (October 2015). "Sincell: an R/Bioconductor package for statistical assessment of cell-state hierarchies from single-cell RNA-seq". Bioinformatics. 31 (20): 3380–3382. doi:10.1093/bioinformatics/btv368. PMC   4595899 . PMID   26099264.
  128. Guo M, Wang H, Potter SS, Whitsett JA, Xu Y (November 2015). "SINCERA: A Pipeline for Single-Cell RNA-Seq Profiling Analysis". PLOS Computational Biology. 11 (11): e1004575. Bibcode:2015PLSCB..11E4575G. doi: 10.1371/journal.pcbi.1004575 . PMC   4658017 . PMID   26600239.
  129. Ilicic T, Kim JK, Kolodziejczyk AA, Bagger FO, McCarthy DJ, Marioni JC, Teichmann SA (February 2016). "Classification of low quality cells from single-cell RNA-seq data". Genome Biology. 17 (1): 29. doi: 10.1186/s13059-016-0888-1 . PMC   4758103 . PMID   26887813.
  130. Leng N, Choi J, Chu LF, Thomson JA, Kendziorski C, Stewart R (May 2016). "OEFinder: a user interface to identify and visualize ordering effects in single-cell RNA-seq data". Bioinformatics. 32 (9): 1408–1410. doi:10.1093/bioinformatics/btw004. PMC   4848403 . PMID   26743507.
  131. Jiang P, Thomson JA, Stewart R (August 2016). "Quality control of single-cell RNA-seq by SinQC". Bioinformatics. 32 (16): 2514–2516. doi:10.1093/bioinformatics/btw176. PMC   4978927 . PMID   27153613.
  132. Li H, Brouwer CR, Luo W (April 2022). "A universal deep neural network for in-depth cleaning of single-cell RNA-Seq data". Nature Communications. 13 (1): 1901. Bibcode:2022NatCo..13.1901L. doi:10.1038/s41467-022-29576-y. PMC   8990021 . PMID   35393428.
  133. Vallejos CA, Marioni JC, Richardson S (June 2015). "BASiCS: Bayesian Analysis of Single-Cell Sequencing Data". PLOS Computational Biology. 11 (6): e1004333. Bibcode:2015PLSCB..11E4333V. doi: 10.1371/journal.pcbi.1004333 . PMC   4480965 . PMID   26107944.
  134. Ding B, Zheng L, Zhu Y, Li N, Jia H, Ai R, et al. (July 2015). "Normalization and noise reduction for single cell RNA-seq experiments". Bioinformatics. 31 (13): 2225–2227. doi:10.1093/bioinformatics/btv122. PMC   4481848 . PMID   25717193.
  135. Pierson E, Yau C (November 2015). "ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis". Genome Biology. 16 (241): 241. doi: 10.1186/s13059-015-0805-z . PMC   4630968 . PMID   26527291.
  136. Vu TN, Wills QF, Kalari KR, Niu N, Wang L, Rantalainen M, Pawitan Y (July 2016). "Beta-Poisson model for single-cell RNA-seq data analyses". Bioinformatics. 32 (14): 2128–2135. doi: 10.1093/bioinformatics/btw202 . PMID   27153638.
  137. Finak G, McDavid A, Yajima M, Deng J, Gersuk V, Shalek AK, et al. (December 2015). "MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data". Genome Biology. 16 (1): 278. doi: 10.1186/s13059-015-0844-5 . PMC   4676162 . PMID   26653891.
  138. Kharchenko PV, Silberstein L, Scadden DT (July 2014). "Bayesian approach to single-cell differential expression analysis". Nature Methods. 11 (7): 740–742. doi:10.1038/nmeth.2967. PMC   4112276 . PMID   24836921.
  139. Chang Z, Li G, Liu J, Zhang Y, Ashby C, Liu D, et al. (February 2015). "Bridger: a new framework for de novo transcriptome assembly using RNA-seq data". Genome Biology. 16 (1): 30. doi: 10.1186/s13059-015-0596-2 . PMC   4342890 . PMID   25723335.
  140. Foroushani A, Agrahari R, Docking R, Chang L, Duns G, Hudoba M, et al. (March 2017). "Large-scale gene network analysis reveals the significance of extracellular matrix pathway and homeobox genes in acute myeloid leukemia: an introduction to the Pigengene package and its applications". BMC Medical Genomics. 10 (1): 16. doi: 10.1186/s12920-017-0253-6 . PMC   5353782 . PMID   28298217.
  141. Quek C, Jung CH, Bellingham SA, Lonie A, Hill AF (2015). "iSRAP - a one-touch research tool for rapid profiling of small RNA-seq data". Journal of Extracellular Vesicles. 4: 29454. doi:10.3402/jev.v4.29454. PMC   4641893 . PMID   26561006.
  142. Kuksa PP, Amlie-Wolf A, Katanic Ž, Valladares O, Wang LS, Leung YY (July 2018). "SPAR: small RNA-seq portal for analysis of sequencing experiments". Nucleic Acids Research. 46 (W1): W36–W42. doi:10.1093/nar/gky330. PMC   6030839 . PMID   29733404.
  143. Johnson NR, Yeoh JM, Coruh C, Axtell MJ (July 2016). "Improved Placement of Multi-mapping Small RNAs". G3. 6 (7): 2103–2111. doi:10.1534/g3.116.030452. PMC   4938663 . PMID   27175019.
  144. Schmid-Burgk JL, Hornung V (November 2015). "BrowserGenome.org: web-based RNA-seq data analysis and visualization". Nature Methods. 12 (11): 1001. doi: 10.1038/nmeth.3615 . PMID   26513548. S2CID   205424303.
  145. Milne I, Stephen G, Bayer M, Cock PJ, Pritchard L, Cardle L, et al. (March 2013). "Using Tablet for visual exploration of second-generation sequencing data". Briefings in Bioinformatics. 14 (2): 193–202. doi: 10.1093/bib/bbs012 . PMID   22445902.
  146. Pirayre A, Couprie C, Duval L, Pesquet JC (2017). "BRANE Clust: Cluster-Assisted Gene Regulatory Network Inference Refinement". IEEE/ACM Transactions on Computational Biology and Bioinformatics (Submitted manuscript). 15 (3): 850–860. doi:10.1109/TCBB.2017.2688355. PMID   28368827. S2CID   12866368.
  147. Pirayre A, Couprie C, Bidard F, Duval L, Pesquet JC (November 2015). "BRANE Cut: biologically-related a priori network enhancement with graph cuts for gene regulatory network inference". BMC Bioinformatics. 16: 368. doi: 10.1186/s12859-015-0754-2 . PMC   4634801 . PMID   26537179.
  148. Luo W, Friedman MS, Shedden K, Hankenson KD, Woolf PJ (May 2009). "GAGE: generally applicable gene set enrichment for pathway analysis". BMC Bioinformatics. 10 (161): 161. doi: 10.1186/1471-2105-10-161 . PMC   2696452 . PMID   19473525.
  149. Subhash S, Kanduri C (September 2016). "GeneSCF: a real-time based functional enrichment tool with support for multiple organisms". BMC Bioinformatics. 17 (1): 365. doi: 10.1186/s12859-016-1250-z . PMC   5020511 . PMID   27618934.
  150. Rue-Albrecht K (2014). "Visualise microarray and RNAseq data using gene ontology annotations. R package version 1.4.1". GitHub .
  151. Young MD; Wakefield MJ; Smyth GK; Oshlack A (2010). "Gene ontology analysis for RNA-seq: accounting for selection bias". Genome Biology. 11 (2): R14. doi: 10.1186/gb-2010-11-2-r14 . PMC   2872874 . PMID   20132535.
  152. Xiong Q, Mukherjee S, Furey TS (September 2014). "GSAASeqSP: a toolset for gene set association analysis of RNA-Seq data". Scientific Reports. 4 (6347): 6347. Bibcode:2014NatSR...4E6347X. doi:10.1038/srep06347. PMC   4161965 . PMID   25213199.
  153. Hänzelmann S, Castelo R, Guinney J (January 2013). "GSVA: gene set variation analysis for microarray and RNA-seq data". BMC Bioinformatics. 14 (17): 7. doi: 10.1186/1471-2105-14-7 . PMC   3618321 . PMID   23323831.
  154. Zhou YH (March 2016). "Pathway analysis for RNA-Seq data using a score-based approach". Biometrics. 72 (1): 165–174. doi:10.1111/biom.12372. PMC   4992401 . PMID   26259845.
  155. Ihnatova I, Budinska E (October 2015). "ToPASeq: an R package for topology-based pathway analysis of microarray and RNA-Seq data". BMC Bioinformatics. 16 (350): 350. doi: 10.1186/s12859-015-0763-1 . PMC   4625615 . PMID   26514335.
  156. Van Bel M, Proost S, Van Neste C, Deforce D, Van de Peer Y, Vandepoele K (December 2013). "TRAPID: an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes". Genome Biology. 14 (12): R134. doi: 10.1186/gb-2013-14-12-r134 . PMC   4053847 . PMID   24330842.
  157. Bucchini F, Del Cortona A, Kreft Ł, Botzki A, Van Bel M, Vandepoele K (September 2021). "TRAPID 2.0: a web application for taxonomic and functional analysis of de novo transcriptomes". Nucleic Acids Research. 49 (17): e101. doi:10.1093/nar/gkab565. PMC   8464036 . PMID   34197621.
  158. de Jong A, van der Meulen S, Kuipers OP, Kok J (September 2015). "T-REx: Transcriptome analysis webserver for RNA-seq Expression data". BMC Genomics. 16 (663): 663. doi: 10.1186/s12864-015-1834-4 . PMC   4558784 . PMID   26335208.
  159. Lan D, Llamas B (14 September 2022). "Genozip 14 - advances in compression of BAM and CRAM files". bioRxiv. doi:10.1101/2022.09.12.507582. S2CID   252357508.
  160. Zhang Y, Chen K, Sloan SA, Bennett ML, Scholze AR, O'Keeffe S, et al. (September 2014). "An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex". The Journal of Neuroscience. 34 (36): 11929–11947. doi:10.1523/JNEUROSCI.1860-14.2014. PMC   4152602 . PMID   25186741.
  161. Wang Y, Wu N, Liu J, Wu Z, Dong D (July 2015). "FusionCancer: a database of cancer fusion genes derived from RNA-seq data". Diagnostic Pathology. 10 (131): 131. doi: 10.1186/s13000-015-0310-4 . PMC   4517624 . PMID   26215638.
  162. Franzén O, Gan LM, Björkegren JL (January 2019). "PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data". Database. 2019. doi:10.1093/database/baz046. PMC   6450036 . PMID   30951143.