List of sequence alignment software


This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequence alignment. See structural alignment software for structural alignment of proteins.


Database search only

Name | Description | Sequence type* | Authors | Year
BLAST Local search with fast k-tuple heuristic (Basic Local Alignment Search Tool)Both Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ [1] 1990
HPC-BLAST NCBI compliant multinode and multicore BLAST wrapper. Distributed with the latest version of BLAST, this wrapper facilitates parallelization of the algorithm on modern hybrid architectures with many nodes and many cores within each node. [2] Protein Burdyshaw CE, Sawyer S, Horton MD, Brook RG, Rekapalli B 2017
CS-BLAST Sequence-context specific BLAST, more sensitive than BLAST, FASTA, and SSEARCH. Its position-specific iterative version, CSI-BLAST, is more sensitive than PSI-BLASTProteinAngermueller C, Biegert A, Soeding J [3] 2013
CUDASW++GPU accelerated Smith Waterman algorithm for multiple shared-host GPUsProteinLiu Y, Maskell DL and Schmidt B2009/2010
DIAMONDBLASTX and BLASTP aligner based on double indexingProteinBuchfink B, Xie C, Huson DH, Reuter K, Drost HG [4] [5] 2015/2021
FASTA Local search with fast k-tuple heuristic, slower but more sensitive than BLASTBoth
GGSEARCH, GLSEARCHGlobal:Global (GG), Global:Local (GL) alignment with statisticsProtein
Genome MagicianSoftware for ultra-fast local DNA sequence motif search and pairwise alignment for NGS data (FASTA, FASTQ).DNAHepperle D (www.sequentix.de)2020
GenoogleGenoogle uses indexing and parallel processing techniques to search DNA and protein sequences. It is written in Java and is open source.BothAlbrecht F2015
HMMER Local and global search with profile Hidden Markov models, more sensitive than PSI-BLASTBoth Durbin R, Eddy SR, Krogh A, Mitchison G [6] 1998
HH-suite Pairwise comparison of profile Hidden Markov models; very sensitiveProteinSöding J [7] [8] 2005/2012
IDFInverse Document FrequencyBoth
InfernalProfile SCFG searchRNA Eddy S
KLASTHigh-performance general purpose sequence similarity search toolBoth2009/2014
LAMBDAHigh-performance local aligner compatible with BLAST, but much faster; supports SAM/BAMProteinHannes Hauswedell, Jochen Singer, Knut Reinert [9] 2014
MMseqs2Software suite to search and cluster huge sequence sets. Similar sensitivity to BLAST and PSI-BLAST but orders of magnitude fasterProteinSteinegger M, Mirdita M, Galiez C, Söding J [10] 2017
USEARCHUltra-fast sequence analysis toolBothEdgar RC ("Search and clustering orders of magnitude faster than BLAST", Bioinformatics 26 (19): 2460–2461, 2010; doi:10.1093/bioinformatics/btq461; PMID 20709691)2010
OSWALDOpenCL Smith-Waterman on Altera's FPGA for Large Protein DatabasesProteinRucci E, García C, Botella G, De Giusti A, Naiouf M, Prieto-Matías M [11] 2016
parasailFast Smith-Waterman search using SIMD parallelizationBothDaily J2015
PSI-BLAST Position-specific iterative BLAST, local search with position-specific scoring matrices, much more sensitive than BLASTProtein Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ [12] 1997
PSI-SearchCombines the Smith-Waterman search algorithm with the PSI-BLAST profile construction strategy to find distantly related protein sequences while preventing homologous over-extension errors.ProteinLi W, McWilliam H, Goujon M, Cowley A, Lopez R, Pearson WR [13] 2012
R&RRetrieve and Relate (R&R) is a high-performance yet sensitive multi-database search engine capable of searching DNA, RNA and protein sequences in parallel.Both2019
ScalaBLASTHighly parallel Scalable BLASTBothOehmen et al. [14] 2011
SequilabLinking and profiling sequence alignment data from NCBI-BLAST results with major sequence analysis servers/servicesNucleotide, peptide2010
SAMLocal and global search with profile Hidden Markov models, more sensitive than PSI-BLASTBoth Karplus K, Krogh A [15] 1999
SSEARCHSmith-Waterman search, slower but more sensitive than FASTABoth
SWAPHI First parallelized algorithm employing the emerging Intel Xeon Phis to accelerate Smith-Waterman protein database searchProteinLiu Y and Schmidt B2014
SWAPHI-LS First parallel Smith-Waterman algorithm exploiting Intel Xeon Phi clusters to accelerate the alignment of long DNA sequencesDNALiu Y, Tran TT, Lauenroth F, Schmidt B2014
SWIMMSmith-Waterman implementation for Intel Multicore and Manycore architecturesProteinRucci E, García C, Botella G, De Giusti A, Naiouf M and Prieto-Matías M [16] 2015
SWIMM2.0Enhanced Smith-Waterman on Intel's Multicore and Manycore architectures based on AVX-512 vector extensionsProteinRucci E, García C, Botella G, De Giusti A, Naiouf M and Prieto-Matías M [17] 2018
SWIPEFast Smith-Waterman search using SIMD parallelizationBothRognes T2011

*Sequence type: protein or nucleotide
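Several entries above, including BLAST and FASTA, are described as using a fast k-tuple heuristic. The sketch below shows only the seeding idea and makes simplifying assumptions. It indexes every exact k-mer of a query in a hash table, reports word hits in a target, and counts hits per diagonal, roughly as FASTA does before extending promising regions. It does not model BLAST's scored neighbourhood words, its two-hit trigger, or gapped extension. The function names are hypothetical.

```python
from collections import defaultdict


def kmer_index(query: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word of the query to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return dict(index)


def word_hits(query: str, target: str, k: int) -> list[tuple[int, int]]:
    """Return (query_pos, target_pos) pairs where an exact k-mer is shared."""
    index = kmer_index(query, k)
    hits = []
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], []):
            hits.append((i, j))
    return hits


def hits_by_diagonal(hits: list[tuple[int, int]]) -> dict[int, int]:
    """Count hits per diagonal (target_pos - query_pos).

    Many hits on one diagonal suggest an ungapped match worth extending.
    """
    counts: dict[int, int] = defaultdict(int)
    for i, j in hits:
        counts[j - i] += 1
    return dict(counts)
```

For example, `word_hits("ACGTACGT", "TTACGTACGTTT", 4)` returns eight hits. Five of them fall on diagonal 2, which is the offset at which the query occurs inside the target.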

Pairwise alignment

Name | Description | Sequence type* | Alignment type** | Author | Year
ACANAFast heuristic anchor based pairwise alignmentBothBothHuang, Umbach, Li2005
AlignMeAlignments for membrane protein sequencesProteinBothM. Stamm, K. Khafizov, R. Staritzbichler, L.R. Forrest2013
ALLALIGNFor DNA, RNA and protein molecules up to 32MB, aligns all sequences of size K or greater. Similar alignments are grouped together for analysis. Automatic repetitive sequence filter.BothLocalE. Wachtel2017
Bioconductor Biostrings::pairwiseAlignmentDynamic programmingBothBoth + Ends-freeP. Aboyoun2008
BioPerl dpAlignDynamic programmingBothBoth + Ends-freeY. M. Chan2003
BLASTZ, LASTZSeeded pattern-matchingNucleotideLocalSchwartz et al. [18] [19] 2004,2009
CUDAlignDNA sequence alignment of unrestricted size in single or multiple GPUsNucleotideLocal, SemiGlobal, GlobalE. Sandes [20] [21] [22] 2011-2015
DNADotWeb-based dot-plot toolNucleotideGlobalR. Bowen1998
DOTLETJava-based dot-plot toolBothGlobalM. Pagni and T. Junier1998
FEASTPosterior based local extension with descriptive evolution modelNucleotideLocalA. K. Hudek and D. G. Brown2010
Genome CompilerAligns chromatogram files (.ab1, .scf) against a template sequence, locates errors, and corrects them instantly.NucleotideLocalGenome Compiler Corporation2014
G-PASGPU-based dynamic programming with backtrackingBothLocal, SemiGlobal, GlobalW. Frohmberg, M. Kierzynka et al.2011
GapMisPairwise sequence alignment allowing a single gapBothSemiGlobalK. Frousios, T. Flouri, C. S. Iliopoulos, K. Park, S. P. Pissis, G. Tischler2012
Genome MagicianSoftware for ultra-fast local DNA sequence motif search and pairwise alignment for NGS data (FASTA, FASTQ).DNALocal, SemiGlobal, GlobalHepperle D (www.sequentix.de)2020
GGSEARCH, GLSEARCHGlobal:Global (GG), Global:Local (GL) alignment with statisticsProteinGlobal in queryW. Pearson2007
JAligner Java open-source implementation of Smith-WatermanBothLocalA. Moustafa2005
K*SyncProtein sequence to structure alignment that includes secondary structure, structural conservation, structure-derived sequence profiles, and consensus alignment scoresProteinBothD. Chivian & D. Baker [23] 2003
LALIGNMultiple, non-overlapping, local similarity (same algorithm as SIM)BothLocal non-overlappingW. Pearson1991 (algorithm)
NW-alignStandard Needleman-Wunsch dynamic programming algorithmProteinGlobalY Zhang2012
mAlignmodelling alignment; models the information content of the sequencesNucleotideBothD. Powell, L. Allison and T. I. Dix2004
matcherWaterman-Eggert local alignment (based on LALIGN)BothLocalI. Longden (modified from W. Pearson)1999
MCALIGN2explicit models of indel evolutionDNAGlobalJ. Wang et al.2006
MegAlign Pro (Lasergene Molecular Biology)Software to align DNA, RNA, protein, or DNA + protein sequences via pairwise and multiple sequence alignment algorithms including MUSCLE, Mauve, MAFFT, Clustal Omega, Jotun Hein, Wilbur-Lipman, Martinez Needleman-Wunsch, Lipman-Pearson and Dotplot analysis.BothBoth DNASTAR 1993-2016
MUMmer suffix tree basedNucleotideGlobalS. Kurtz et al.2004
needle Needleman-Wunsch dynamic programmingBothSemiGlobalA. Bleasby1999
Ngilalogarithmic and affine gap costs and explicit models of indel evolutionBothGlobalR. Cartwright2007
NW Needleman-Wunsch dynamic programmingBothGlobalA.C.R. Martin1990-2015
parasailC/C++/Python/Java SIMD dynamic programming library for SSE, AVX2BothGlobal, Ends-free, LocalJ. Daily2015
Path Smith-Waterman on protein back-translation graph (detects frameshifts at protein level)ProteinLocalM. Gîrdea et al. [24] 2009
PatternHunter Seeded pattern-matchingNucleotideLocalB. Ma et al. [25] [26] 2002–2004
ProbA (also propA)Stochastic partition function sampling via dynamic programming BothGlobalU. Mückstein2002
PyMOL"align" command aligns sequence & applies it to structureProteinGlobal (by selection)W. L. DeLano2007
REPuter suffix tree basedNucleotideLocalS. Kurtz et al.2001
SABERTOOTHAlignment using predicted Connectivity ProfilesProteinGlobalF. Teichert, J. Minning, U. Bastolla, and M. Porto2009
SatsumaParallel whole-genome synteny alignmentsDNALocalM.G. Grabherr et al.2010
SEQALNVarious dynamic programmingBothLocal or globalM.S. Waterman and P. Hardy1996
SIM, GAP, NAP, LAPLocal similarity with varying gap treatmentsBothLocal or globalX. Huang and W. Miller1990–1996
SIMLocal similarityBothLocalX. Huang and W. Miller1991
SPA: Super pairwise alignmentFast pairwise global alignmentNucleotideGlobalShen, Yang, Yao, Hwang2002
SSEARCHLocal (Smith-Waterman) alignment with statisticsProteinLocalW. Pearson1981 (Algorithm)
Sequences StudioJava applet demonstrating various algorithms from [27] Generic sequenceLocal and globalA.Meskauskas1997 (reference book)
SWIFOLDSmith-Waterman Acceleration on Intel's FPGA with OpenCL for Long DNA SequencesNucleotideLocalE. Rucci [28] [29] 2017-2018
SWIFT suiteFast Local Alignment SearchingDNALocalK. Rasmussen, [30] W. Gerlach2005, 2008
stretcherMemory-optimized Needleman-Wunsch dynamic programmingBothGlobalI. Longden (modified from G. Myers and W. Miller)1999
tranalignAligns nucleic acid sequences given a protein alignmentNucleotideNAG. Williams (modified from B. Pearson)2002
UGENEOpen-source Smith-Waterman for SSE/CUDA, suffix-array-based repeat finder and dot plotBothBothUniPro2010
waterSmith-Waterman dynamic programmingBothLocalA. Bleasby1999
wordmatchk-tuple pairwise matchBothNAI. Longden1998
YASS Seeded pattern-matchingNucleotideLocalL. Noe and G. Kucherov [31] 2004

*Sequence type: protein or nucleotide. **Alignment type: local or global
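Many pairwise tools above rely on closely related dynamic-programming recurrences. NW, needle and stretcher use them for global alignment, and water, SSEARCH and JAligner for local alignment. The sketch below computes the score only. It assumes a fixed match/mismatch score and a linear gap penalty, with arbitrary default values. The real programs use substitution matrices such as BLOSUM62 and affine gap costs, and they run a traceback to recover the alignment itself.

```python
def dp_align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                   gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score of a and b under a linear gap penalty.

    local=False: Needleman-Wunsch global alignment (answer in the last cell).
    local=True:  Smith-Waterman local alignment (cells floored at 0; the
                 best cell anywhere in the matrix is the answer).
    Only two rows of the matrix are kept, so memory is O(len(b)).
    """
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)
                best = max(best, score)
            curr[j] = score
        prev = curr
    return best if local else prev[-1]
```

Keeping two rows is enough for the score. Memory-optimized tools such as stretcher also recover the alignment in linear space, which requires a divide-and-conquer scheme that is not shown here.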

Multiple sequence alignment

Name | Description | Sequence type* | Alignment type** | Author | Year | License
ABAA-Bruijn alignmentProteinGlobalB.Raphael et al.2004 Proprietary, freeware for education, research, nonprofit
ALEManual alignment; some software assistanceNucleotidesLocalJ. Blandy and K. Fogel1994 (latest version 2007)Free, GPL 2
ALLALIGNFor DNA, RNA and protein molecules up to 32MB, aligns all sequences of size K or greater, MSA or within a single molecule. Similar alignments are grouped together for analysis. Automatic repetitive sequence filter.BothLocalE. Wachtel2017Free
AMAP Sequence annealingBothGlobalA. Schwartz and L. Pachter 2006
anon.fast, optimal alignment of three sequences using linear gap costsNucleotidesGlobalD. Powell, L. Allison and T. I. Dix2000
BAli-Phy Tree+multi-alignment; probabilistic-Bayesian; joint estimationBoth + CodonsGlobalBD Redelings and MA Suchard2005 (latest version 2018)Free, GPL
Base-By-BaseJava-based multiple sequence alignment editor with integrated analysis toolsBothLocal or globalR. Brodie et al.2004 Proprietary, freeware, must register
CHAOS, DIALIGNIterative alignmentBothLocal (preferred)M. Brudno and B. Morgenstern2003
ClustalWProgressive alignmentBothLocal or globalThompson et al.1994Free, LGPL
CodonCode Aligner Multi-alignment; ClustalW & Phrap supportNucleotidesLocal or globalP. Richterich et al.2003 (latest version 2009)
Compass COmparison of Multiple Protein sequence Alignments with assessment of Statistical SignificanceProteinGlobalR.I. Sadreyev, et al.2009
DECIPHER Progressive-iterative alignmentBothGlobalErik S. Wright2014Free, GPL
DIALIGN-TX and DIALIGN-TSegment-based methodBothLocal (preferred) or GlobalA.R.Subramanian2005 (latest version 2008)
DNA AlignmentSegment-based method for intraspecific alignmentsBothLocal (preferred) or GlobalA.Roehl2005 (latest version 2008)
DNA Baser Sequence AssemblerMulti-alignment; fully automatic sequence alignment; automatic ambiguity correction; internal base caller; command-line sequence alignmentNucleotidesLocal or globalHeracle BioSoft SRL2006 (latest version 2018)Commercial (some modules are freeware)
DNADynamo linked DNA to Protein multiple alignment with MUSCLE, Clustal and Smith-WatermanBothLocal or globalDNADynamo2004 (newest version 2017)
EDNAEnergy Based Multiple Sequence Alignment for DNA Binding SitesNucleotidesLocal or globalSalama, RA. et al.2013
FAMSAProgressive alignment for extremely large protein families (hundreds of thousands of members)ProteinGlobalDeorowicz et al.2016Free, GPL 3
FSA Sequence annealingBothGlobalR. K. Bradley et al.2008
Geneious Progressive-Iterative alignment; ClustalW pluginBothLocal or globalA.J. Drummond et al.2005 (latest version 2017)
GUIDANCEQuality control and filtering of multiple sequence alignmentsBothLocal or globalO. Penn et al.2010 (latest version 2015)
KalignProgressive alignmentBothGlobalT. Lassmann2005
MACSEProgressive-iterative alignment. Multiple alignment of coding sequences accounting for frameshifts and stop codons.NucleotidesGlobalV. Ranwez et al.2011 (latest version, v2.07 2023)
MAFFT Progressive-iterative alignmentBothLocal or globalK. Katoh et al.2005Free, BSD
MARNAMulti-alignment of RNAsRNALocalS. Siebert et al.2005
MAVID Progressive alignmentBothGlobalN. Bray and L. Pachter 2004
MegAlign Pro (Lasergene Molecular Biology)Software to align DNA, RNA, protein, or DNA + protein sequences via pairwise and multiple sequence alignment algorithms including MUSCLE, Mauve, MAFFT, Clustal Omega, Jotun Hein, Wilbur-Lipman, Martinez Needleman-Wunsch, Lipman-Pearson and Dotplot analysis.BothLocal or global DNASTAR 1993-2023
MSADynamic programmingBothLocal or globalD.J. Lipman et al.1989 (modified 1995)
MSAProbsDynamic programmingProteinGlobalY. Liu, B. Schmidt, D. Maskell2010
MULTALINDynamic programming-clusteringBothLocal or globalF. Corpet1988
Multi-LAGANProgressive dynamic programming alignmentBothGlobalM. Brudno et al.2003
MUSCLE Progressive-iterative alignmentBothLocal or globalR. Edgar2004
OpalProgressive-iterative alignmentBothLocal or globalT. Wheeler and J. Kececioglu2007 (latest stable 2013, latest beta 2016)
PecanProbabilistic-consistencyDNAGlobalB. Paten et al.2008
Phylo A human-computing framework for comparative genomics, used to solve multiple alignment problemsNucleotidesLocal or globalMcGill Bioinformatics2010
PMFastRProgressive structure aware alignmentRNAGlobalD. DeBlasio, J Braund, S Zhang2009
PralineProgressive-iterative-consistency-homology-extended alignment with preprofiling and secondary structure predictionProteinGlobalJ. Heringa1999 (latest version 2009)
PicXAANonprogressive, maximum expected accuracy alignmentBothGlobalS.M.E. Sahraeian and B.J. Yoon2010
POAPartial order/hidden Markov modelProteinLocal or globalC. Lee2002
Probalign Probabilistic/consistency with partition function probabilitiesProteinGlobalRoshan and Livesay2006Free, public domain
ProbCons Probabilistic/consistencyProteinLocal or globalC. Do et al.2005Free, public domain
PROMALS3DProgressive alignment/hidden Markov model/Secondary structure/3D structureProteinGlobalJ. Pei et al.2008
PRRN/PRRPIterative alignment (especially refinement)ProteinLocal or globalY. Totoki (based on O. Gotoh)1991 and later
PSAlignAlignment preserving non-heuristicBothLocal or globalS.H. Sze, Y. Lu, Q. Yang.2006
RevTransCombines DNA and Protein alignment, by back translating the protein alignment to DNA.DNA/Protein (special)Local or globalWernersson and Pedersen2003 (newest version 2005)
SAGASequence alignment by genetic algorithmProteinLocal or globalC. Notredame et al.1996 (new version 1998)
SAMHidden Markov modelProteinLocal or globalA. Krogh et al.1994 (most recent version 2002)
Se-AlManual alignmentBothLocalA. Rambaut2002
StatAlignBayesian co-estimation of alignment and phylogeny (MCMC)BothGlobalA. Novak et al.2008
Stemloc Multiple alignment and secondary structure predictionRNALocal or globalI. Holmes2005Free, GPL 3 (part of DART)
T-Coffee More sensitive progressive alignmentBothLocal or globalC. Notredame et al.2000 (newest version 2008)Free, GPL 2
UGENE Supports multiple alignment with MUSCLE, KAlign, Clustal and MAFFT pluginsBothLocal or globalUGENE team2010 (newest version 2020)Free, GPL 2
VectorFriendsVectorFriends Aligner, MUSCLE plugin, and ClustalW pluginBothLocal or globalBioFriends team2013 Proprietary, freeware for academic use
GLProbsAdaptive pair-Hidden Markov Model based approachProteinGlobalY. Ye et al.2013

*Sequence type: protein or nucleotide. **Alignment type: local or global
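Several tools above score or optimize a multiple alignment through its pairwise projections. The MSA program, for example, seeks an alignment that optimizes the sum-of-pairs (SP) score, in which every column contributes the score of each pair of rows. The sketch below computes SP scoring under stated assumptions. It uses a simple match/mismatch/gap scheme with arbitrary default values, and it follows the common convention that gap/gap pairs score zero. It scores an existing alignment and does not construct one.

```python
from itertools import combinations


def sum_of_pairs(alignment: list[str], match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> int:
    """Sum-of-pairs score of a gapped multiple alignment ('-' marks a gap).

    Every column contributes the score of each pair of rows:
    residue/residue -> match or mismatch, residue/gap -> gap,
    gap/gap -> 0 (the usual convention, so such pairs are skipped).
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                total += gap
            else:
                total += match if x == y else mismatch
    return total
```

For example, `sum_of_pairs(["ACGT", "ACGT", "AC-T"])` gives 6. Three fully matching columns contribute 3 each, and the column with one gap contributes 1 - 2 - 2 = -3.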

Genomics analysis

Name | Description | Sequence type*
EAGLE [32] An ultra-fast tool to find relative absent words in genomic dataNucleotide
ACT (Artemis Comparison Tool)Synteny and comparative genomicsNucleotide
AVIDPairwise global alignment with whole genomesNucleotide
BLATAlignment of cDNA sequences to a genome.Nucleotide
DECIPHER Alignment of rearranged genomes using 6 frame translationNucleotide
FLAKFuzzy whole genome alignment and analysisNucleotide
GMAPAlignment of cDNA sequences to a genome. Identifies splice site junctions with high accuracy.Nucleotide
SplignAlignment of cDNA sequences to a genome. Identifies splice site junctions with high accuracy. Able to recognize and separate gene duplications.Nucleotide
MauveMultiple alignment of rearranged genomesNucleotide
MGAMultiple Genome AlignerNucleotide
MulanLocal multiple alignments of genome-length sequencesNucleotide
MultizMultiple alignment of genomesNucleotide
PLAST-ncRNASearch for ncRNAs in genomes by partition function local alignmentNucleotide
Sequerome Profiling sequence alignment data with major servers/servicesNucleotide, peptide
SequilabProfiling sequence alignment data from NCBI-BLAST results with major servers/servicesNucleotide, peptide
Shuffle-LAGANPairwise global alignment of completed genome regionsNucleotide
SIBsim4, Sim4 A program designed to align an expressed DNA sequence with a genomic sequence, allowing for intronsNucleotide
SLAMGene finding, alignment, annotation (human-mouse homology identification)Nucleotide
SRPRISMAn efficient aligner for assemblies with explicit guarantees, aligning reads without splicesNucleotide

*Sequence type: protein or nucleotide
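DECIPHER is listed above as aligning rearranged genomes using six-frame translation. The sketch below illustrates that preprocessing step only. It is not DECIPHER's own code, which is an R/Bioconductor package. It translates a DNA string in the three forward and three reverse-complement reading frames using the standard genetic code (NCBI translation table 1). In the output:

- Stop codons appear as `*`.
- Codons containing characters other than A, C, G or T become `X`.
- Trailing incomplete codons are dropped.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# last base varying fastest (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; codons not in the table become X."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    """Protein translation of frames +1..+3 (forward) and -1..-3 (reverse)."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For example, `six_frame_translation("ATGGCCTAA")` returns "MA*" for frame +1 and "LGH" for frame -1.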


Motif finding

Name | Description | Sequence type*
PMSMotif search and discoveryBoth
FMMMotif search and discovery (can also take positive and negative sequences as input for enriched motif search)Nucleotide
BLOCKSUngapped motif identification from BLOCKS databaseBoth
eMOTIFExtraction and identification of shorter motifsBoth
Gibbs motif samplerStochastic motif extraction by statistical likelihoodBoth
HMMTOPPrediction of transmembrane helices and topology of proteinsProtein
I-sitesLocal structure motif libraryProtein
JCoilsPrediction of coiled coils and leucine zippersProtein
MEME/MASTMotif discovery and searchBoth
CUDA-MEMEGPU accelerated MEME (v4.4.0) algorithm for GPU clustersBoth
MERCIDiscriminative motif discovery and searchBoth
PHI-BlastMotif search and alignment toolBoth
Phyloscan Motif search toolNucleotide
PRATTPattern generation for use with ScanPrositeProtein
ScanPrositeMotif database search toolProtein
TEIRESIASMotif extraction and database searchBoth
BASALTMultiple motif and regular expression searchBoth

*Sequence type: protein or nucleotide
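MEME/MAST and the Gibbs motif sampler, among others, represent motifs as position-specific probability matrices. The sketch below illustrates that representation. It builds a log2-odds position weight matrix from equal-length aligned sites and then scores every window of a sequence. It makes three illustrative assumptions, none of which are the defaults of any listed tool:

- The input is DNA containing only A, C, G and T.
- The background distribution is uniform.
- The pseudocount is 1.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log2-odds PWM from equal-length aligned sites vs. a uniform background."""
    width = len(sites[0])
    background = 1.0 / len(ALPHABET)
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        total = len(column) + pseudocount * len(ALPHABET)
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pwm


def scan(sequence: str, pwm: list[dict[str, float]]) -> list[tuple[int, float]]:
    """Score every window of the sequence; returns (start, score), best first."""
    width = len(pwm)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        scores.append((start, sum(col[b] for col, b in zip(pwm, window))))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

A positive window score means the window is more likely under the motif model than under the background. Motif tools typically set a threshold or compute p-values on such scores rather than taking only the top hit.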


Benchmarking

Name | Authors
PFAM 30.0 (2016)
SMART (2015)Letunic, Copley, Schmidt, Ciccarelli, Doerks, Schultz, Ponting, Bork
BAliBASE 3 (2015)Thompson, Plewniak, Poch
Oxbench (2011)Raghava, Searle, Audley, Barber, Barton
Benchmark collection (2009)Edgar
HOMSTRAD (2005)Mizuguchi
PREFAB 4.0 (2005)Edgar
SABmark (2004)Van Walle, Lasters, Wyns
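Benchmarks such as BAliBASE, PREFAB and SABmark are typically used by realigning their sequences with the tool under test and comparing the result to the curated reference alignment. The sketch below computes two widely used accuracy measures from their usual definitions. It is not any benchmark's own scoring program.

- The SP or Q score is the fraction of reference residue pairs reproduced in the test alignment.
- The TC score is the fraction of reference columns reproduced exactly.

The sketch makes several assumptions:

- Both alignments hold the same sequences in the same row order, with `-` as the gap character.
- All columns are scored; BAliBASE, for instance, can restrict scoring to core blocks.
- Only reference columns with at least two residues count toward TC.

The function names are hypothetical.

```python
from itertools import combinations

Cell = tuple[int, int]  # (row, residue index within that row)


def residue_columns(alignment: list[str]) -> list[frozenset[Cell]]:
    """Describe each column as the set of residues it aligns, ignoring gaps,
    so that columns can be compared between two alignments."""
    counters = [0] * len(alignment)
    columns = []
    for column in zip(*alignment):
        cells = set()
        for row, char in enumerate(column):
            if char != "-":
                cells.add((row, counters[row]))
                counters[row] += 1
        columns.append(frozenset(cells))
    return columns


def aligned_pairs(columns: list[frozenset[Cell]]) -> set[tuple[Cell, Cell]]:
    """All residue pairs placed in the same column."""
    pairs: set[tuple[Cell, Cell]] = set()
    for cells in columns:
        pairs.update(combinations(sorted(cells), 2))
    return pairs


def q_and_tc(test: list[str], reference: list[str]) -> tuple[float, float]:
    """(fraction of reference residue pairs found in test,
        fraction of reference columns with >= 2 residues reproduced exactly)"""
    ref_cols = residue_columns(reference)
    test_cols = residue_columns(test)
    ref_pairs = aligned_pairs(ref_cols)
    scored = [c for c in ref_cols if len(c) >= 2]
    if not ref_pairs or not scored:
        raise ValueError("reference alignment has no aligned residue pairs")
    q = len(ref_pairs & aligned_pairs(test_cols)) / len(ref_pairs)
    test_set = set(test_cols)
    tc = sum(c in test_set for c in scored) / len(scored)
    return q, tc
```

For example, the test alignment `["ACG-T", "ACAGT"]` and the reference `["AC-GT", "ACAGT"]` differ only in where the gap is placed. The misplaced gap breaks one of the four reference pairs and one of the four scored columns, so both scores are 0.75.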

Alignment viewers, editors

Please see List of alignment visualization software.

Short-read sequence alignment

Name | Description | Paired-end option | Use FASTQ quality | Gapped | Multi-threaded | License | Reference | Year
AriocComputes Smith-Waterman gapped alignments and mapping qualities on one or more GPUs. Supports BS-seq alignments. Processes 100,000 to 500,000 reads per second (varies with data, hardware, and configured sensitivity).YesNoYesYesFree, BSD [33] 2015
BarraCUDAA GPGPU accelerated Burrows–Wheeler transform (FM-index) short read alignment program based on BWA, supports alignment of indels with gap openings and extensions.YesNoYesYes, POSIX Threads and CUDA Free, GPL
BBMapUses short k-mers to rapidly index the genome; no size or scaffold count limit. Higher sensitivity and specificity than Burrows–Wheeler aligners, with similar or greater speed. Performs affine-transform-optimized global alignment, which is slower but more accurate than Smith-Waterman. Handles Illumina, 454, PacBio, Sanger, and Ion Torrent data. Splice-aware; capable of processing long indels and RNA-seq. Pure Java; runs on any platform. Used by the Joint Genome Institute.YesYesYesYesFree, BSD 2010
BFAST Explicit time and accuracy tradeoff with an a priori accuracy estimation, supported by indexing the reference sequences. Optimally compresses indexes. Can handle billions of short reads. Can handle insertions, deletions, SNPs, and color errors (can map ABI SOLiD color space reads). Performs a full Smith-Waterman alignment.Yes, POSIX Threads Free, GPL [34] 2009
BigBWARuns the Burrows–Wheeler Aligner (BWA) on a Hadoop cluster. It supports the BWA-MEM, BWA-ALN, and BWA-SW algorithms, with paired and single reads. Running on a Hadoop cluster substantially reduces computation time and adds scalability and fault tolerance.YesLow quality bases trimmingYesYesFree, GPL 3 [35] 2015
BLASTNBLAST's nucleotide alignment program, slow and not accurate for short reads, and uses a sequence database (EST, Sanger sequence) rather than a reference genome.
BLAT Made by Jim Kent. Can handle one mismatch in initial alignment step.Yes, client-server Proprietary, freeware for academic and noncommercial use [36] 2002
Bowtie Uses a Burrows–Wheeler transform to create a permanent, reusable index of the genome; 1.3 GB memory footprint for human genome. Aligns more than 25 million Illumina reads in 1 CPU hour. Supports Maq-like and SOAP-like alignment policiesYesYesNoYes, POSIX Threads Free, Artistic [37] 2009
BWAUses a Burrows–Wheeler transform to create an index of the genome. It is somewhat slower than Bowtie but allows indels in alignments.YesLow quality bases trimmingYesYesFree, GPL [38] 2009
BWA-PSSMA probabilistic short read aligner based on the use of position specific scoring matrices (PSSM). The aligner is adaptable in the sense that it can take into account the quality scores of the reads and models of data specific biases, such as those observed in Ancient DNA, PAR-CLIP data or genomes with biased nucleotide compositions. [39] YesYesYesYesFree, GPL [39] 2014
CASHXQuantify and manage large quantities of short-read sequence data. CASHX pipeline contains a set of tools that can be used together, or separately as modules. This algorithm is very accurate for perfect hits to a reference genome.No Proprietary, freeware for academic and noncommercial use
CloudburstShort-read mapping using Hadoop MapReduceYes, Hadoop MapReduce Free, Artistic
CUDA-ECShort-read alignment error correction using GPUs.Yes, GPU enabled
CUSHAWA CUDA compatible short read aligner to large genomes based on Burrows–Wheeler transformYesYesNoYes (GPU enabled)Free, GPL [40] 2012
CUSHAW2Gapped short-read and long-read alignment based on maximal exact match seeds. This aligner supports both base-space (e.g. from Illumina, 454, Ion Torrent and PacBio sequencers) and ABI SOLiD color-space read alignments.YesNoYesYesFree, GPL 2014
CUSHAW2-GPUGPU-accelerated CUSHAW2 short-read aligner.YesNoYesYesFree, GPL
CUSHAW3Sensitive and accurate base-space and color-space short-read alignment with hybrid seedingYesNoYesYesFree, GPL [41] 2012
drFASTRead mapping alignment software that implements cache obliviousness to minimize main/cache memory transfers like mrFAST and mrsFAST, however designed for the SOLiD sequencing platform (color space reads). It also returns all possible map locations for improved structural variation discovery.YesYes, for structural variationYesNoFree, BSD
ELANDImplemented by Illumina. Includes ungapped alignment with a finite read length.
ERNEExtended Randomized Numerical alignEr for accurate alignment of NGS reads. It can map bisulfite-treated reads.YesLow quality bases trimmingYesMultithreading and MPI-enabledFree, GPL 3
GASSSTFinds global alignments of short DNA sequences against large DNA banksMultithreading CeCILL version 2 License. [42] 2011
GEM High-quality alignment engine (exhaustive mapping with substitutions and indels). More accurate and several times faster than BWA or Bowtie 1/2. Many standalone biological applications (mapper, split mapper, mappability, and other) provided.YesYesYesYesFree, GPL3 [43] 2012
Genalice MAPUltra fast and comprehensive NGS read aligner with high precision and small storage footprint.YesLow quality bases trimmingYesYes Proprietary, commercial
Geneious AssemblerFast, accurate overlap assembler with the ability to handle any combination of sequencing technology, read length, any pairing orientations, with any spacer size for the pairing, with or without a reference genome.Yes Proprietary, commercial
GensearchNGSComplete framework with a user-friendly GUI to analyse NGS data. It integrates a proprietary high-quality alignment algorithm and can plug in various public aligners, within a framework for importing short reads, aligning them, detecting variants, and generating reports. It is made for resequencing projects, notably in diagnostic settings.YesNoYesYes Proprietary, commercial
GMAP and GSNAPRobust, fast short-read alignment. GMAP: longer reads, with multiple indels and splices (see entry above under Genomics analysis); GSNAP: shorter reads, with one indel or up to two splices per read. Useful for digital gene expression, SNP and indel genotyping. Developed by Thomas Wu at Genentech. Used by the National Center for Genome Resources (NCGR) in Alpheus.YesYesYesYes Proprietary, freeware for academic and noncommercial use
GNUMAPAccurately performs gapped alignment of sequence data obtained from next-generation sequencing machines (specifically of Solexa-Illumina) back to a genome of any size. Includes adaptor trimming, SNP calling and Bisulfite sequence analysis.Yes, also supports Illumina *_int.txt and *_prb.txt files with all 4 quality scores for each baseMultithreading and MPI-enabled [44] 2009
HIVE-hexagonUses a hash table and bloom matrix to create and filter potential positions on the genome. For higher efficiency, it uses cross-similarity between short reads and avoids realigning non-unique redundant sequences. It is faster than Bowtie and BWA and allows indels and divergent sensitive alignments on viruses, bacteria, and more conservative eukaryotic alignments.YesYesYesYes Proprietary, freeware for academic and noncommercial users registered to a HIVE deployment instance [45] 2014
IMOSImproved Meta-aligner and Minimap2 On Spark. A long read distributed aligner on Apache Spark platform with linear scalability w.r.t. single node execution.YesYesYesFree
IsaacFully uses all the computing power available on one server node; thus, it scales well over a broad range of hardware architectures, and alignment performance improves with hardware abilitiesYesYesYesYesFree, GPL
LASTUses adaptive seeds and copes more efficiently with repeat-rich sequences (e.g. genomes); for example, it can align reads to genomes without repeat-masking and without becoming overwhelmed by repetitive hits.YesYesYesYesFree, GPL [46] 2011
MAQUngapped alignment that takes into account quality scores for each base.Free, GPL
mrFAST, mrsFASTGapped (mrFAST) and ungapped (mrsFAST) alignment software that implements cache obliviousness to minimize main/cache memory transfers. They are designed for the Illumina sequencing platform and they can return all possible map locations for improved structural variation discovery.YesYes, for structural variationYesNoFree, BSD
MOMMOM or maximum oligonucleotide mapping is a query matching tool that captures a maximal length match within the short read.Yes
MOSAIK Fast gapped aligner and reference-guided assembler. Aligns reads using a banded Smith-Waterman algorithm seeded by results from a k-mer hashing scheme. Supports reads ranging in size from very short to very long.Yes
MPscanFast aligner based on a filtration strategy (no indexing, use q-grams and Backward Nondeterministic DAWG Matching) [47] 2009
Novoalign & NovoalignCSGapped alignment of single end and paired end Illumina GA I & II, ABI Colour space & ION Torrent reads. High sensitivity and specificity, using base qualities at all steps in the alignment. Includes adapter trimming, base quality calibration, Bi-Seq alignment, and options for reporting multiple alignments per read. Use of ambiguous IUPAC codes in reference for common SNPs can improve SNP recall and remove allelic bias.YesYesYesMulti-threading and MPI versions available with paid license Proprietary, freeware single threaded version for academic and noncommercial use
NextGENeDeveloped for use by biologists performing analysis of next generation sequencing data from Roche Genome Sequencer FLX, Illumina GA/HiSeq, Life Technologies Applied BioSystems’ SOLiD System, PacBio and Ion Torrent platforms.YesYesYesYes Proprietary, commercial
NextGenMapFlexible and fast read mapping program (twice as fast as BWA), achieves a mapping sensitivity comparable to Stampy. Internally uses a memory-efficient index structure (hash table) to store positions of all 13-mers present in the reference genome. Mapping regions where pairwise alignments are required are dynamically determined for each read. Uses fast SIMD instructions (SSE) to accelerate alignment calculations on the CPU. If available, alignments are computed on the GPU (using OpenCL/CUDA), further reducing runtime by 20–50%.YesNoYesYes, POSIX Threads, OpenCL/CUDA, SSEFree [48] 2013
Omixon Variant ToolkitIncludes highly sensitive and highly accurate tools for detecting SNPs and indels. It offers a solution to map NGS short reads with a moderate distance (up to 30% sequence divergence) from reference genomes. It poses no restrictions on the size of the reference, which, combined with its high sensitivity, makes the Variant Toolkit well-suited for targeted sequencing projects and diagnostics.YesYesYesYes Proprietary, commercial
PALMapperEfficiently computes both spliced and unspliced alignments at high accuracy. Relying on a machine learning strategy combined with a fast mapping based on a banded Smith-Waterman-like algorithm, it aligns around 7 million reads per hour on one CPU. It refines the originally proposed QPALMA approach.YesFree, GPL
Partek FlowFor use by biologists and bioinformaticians. It supports ungapped, gapped and splice-junction alignment from single and paired-end reads from Illumina, Life Technologies SOLiD, Roche 454 and Ion Torrent raw data (with or without quality information). It integrates quality control at the FASTQ/Qual level and on aligned data. Additional functionality includes trimming and filtering of raw reads, SNP and InDel detection, mRNA and microRNA quantification and fusion gene detection.YesYesYesMultiprocessor-core, client-server installation possible Proprietary, commercial, free trial version
PASSIndexes the genome, then extends seeds using pre-computed alignments of words. Works with base space, color space (SOLID), and can align genomic and spliced RNA-seq reads.YesYesYesYes Proprietary, freeware for academic and noncommercial use
PerMIndexes the genome with periodic seeds to quickly find alignments with full sensitivity up to four mismatches. It can map Illumina and SOLiD reads. Unlike most mapping programs, speed increases for longer read lengths.YesFree, GPL [49]
PRIMEXIndexes the genome with a k-mer lookup table with full sensitivity up to an adjustable number of mismatches. It is best for mapping 15-60 bp sequences to a genome.NoNoYesNo, multiple processes per search 2003
QPalmaCan use quality scores, intron lengths, and computational splice site predictions to perform an unbiased alignment. Can be trained to the specifics of an RNA-seq experiment and genome. Useful for splice site/intron discovery and for gene model building. (See PALMapper for a faster version).Yes, client-serverFree, GPL 2
RazerSNo read length limit. Hamming or edit distance mapping with configurable error rates. Configurable and predictable sensitivity (runtime/sensitivity tradeoff). Supports paired-end read mapping.Free, LGPL
REAL, cREALREAL is an efficient, accurate, and sensitive tool for aligning short reads obtained from next-generation sequencing. The programme can handle very large numbers of single-end reads generated by the next-generation Illumina/Solexa Genome Analyzer. cREAL is a simple extension of REAL for aligning short reads obtained from next-generation sequencing to a genome with circular structure.YesYesFree, GPL
RMAPCan map reads with or without error probability information (quality scores) and supports paired-end reads or bisulfite-treated read mapping. There are no limitations on read length or number of mismatches.YesYesYesFree, GPL 3
rNAA randomized Numerical Aligner for Accurate alignment of NGS readsYesLow quality bases trimmingYesMultithreading and MPI-enabledFree, GPL 3
RTG InvestigatorExtremely fast, tolerant to high indel and substitution counts. Includes full read alignment. Product includes comprehensive pipelines for variant detection and metagenomic analysis with any combination of Illumina, Complete Genomics and Roche 454 data.YesYes, for variant callingYesYes Proprietary, freeware for individual investigator use
SegemehlCan handle insertions, deletions, mismatches; uses enhanced suffix arraysYesNoYesYes Proprietary, freeware for noncommercial use [50] 2009
SeqMapUp to 5 mixed substitutions and insertions-deletions; various tuning options and input-output formats Proprietary, freeware for academic and noncommercial use
ShrecShort read error correction with a suffix tree data structureYes, Java
SHRiMPIndexes the reference genome as of version 2. Uses masks to generate possible keys. Can map ABI SOLiD color space reads.YesYesYesYes, OpenMP Free, BSD derivative [51] [52] 2009-2011
SLIDERSlider is an application for the Illumina Sequence Analyzer output that uses the "probability" files instead of the sequence files as an input for alignment to a reference sequence or a set of reference sequences.YesYesNoNo [53] [54] 2009-2010
SOAP, SOAP2, SOAP3, SOAP3-dpSOAP: robust with a small (1-3) number of gaps and mismatches. Speed improvement over BLAT; uses a 12-letter hash table. SOAP2: uses a bidirectional BWT to build the index of the reference and is much faster than the first version. SOAP3: GPU-accelerated version that can find all 4-mismatch alignments in tens of seconds per million reads. SOAP3-dp, also GPU-accelerated, supports an arbitrary number of mismatches and gaps according to affine gap penalty scores.YesNoYes, SOAP3-dpYes, POSIX Threads; SOAP3, SOAP3-dp need GPU with CUDA supportFree, GPL [55] [56]
SOCSFor ABI SOLiD technologies. Significant increase in time to map reads with mismatches (or color errors). Uses an iterative version of the Rabin-Karp string search algorithm.YesFree, GPL
SparkBWAIntegrates the Burrows–Wheeler Aligner (BWA) on an Apache Spark framework running atop Hadoop. Version 0.2 (October 2016) supports the algorithms BWA-MEM, BWA-backtrack, and BWA-ALN. All of them work with single-end and paired-end reads.YesLow quality bases trimmingYesYesFree, GPL 3 [57] 2016
SSAHA, SSAHA2Fast for a small number of variants Proprietary, freeware for academic and noncommercial use
StampyFor Illumina reads. High specificity, and sensitive for reads with indels, structural variants, or many SNPs. Slow, but speed increases dramatically when BWA is used for the first alignment pass.YesYesYesNo Proprietary, freeware for academic and noncommercial use [58] 2010
SToRMFor Illumina or ABI SOLiD reads, with SAM native output. Highly sensitive for reads with many errors, indels (full from 0 to 15, extended support otherwise). Uses spaced seeds (single hit) and a very fast SSE-SSE2-AVX2-AVX-512 banded alignment filter. For fixed-length reads only; the authors recommend SHRiMP2 otherwise.NoYesYesYes, OpenMP Free [59] 2010
Subread, SubjuncSuperfast and accurate read aligners. Subread can be used to map both gDNA-seq and RNA-seq reads. Subjunc detects exon-exon junctions and maps RNA-seq reads. They employ a novel mapping paradigm named seed-and-vote.YesYesYesYesFree, GPL 3
TaipanDe-novo assembler for Illumina reads Proprietary, freeware for academic and noncommercial use
UGENE Visual interface both for Bowtie and BWA, and an embedded alignerYesYesYesYesFree, GPL
VelociMapperFPGA-accelerated reference sequence alignment mapping tool from TimeLogic. Faster than Burrows–Wheeler transform-based algorithms like BWA and Bowtie. Supports up to 7 mismatches and/or indels with no performance penalty. Produces sensitive Smith–Waterman gapped alignments.YesYesYesYes Proprietary, commercial
XpressAlignFPGA-based sliding window short read aligner which exploits the embarrassingly parallel property of short read alignment. Performance scales linearly with the number of transistors on a chip (i.e. performance guaranteed to double with each iteration of Moore's Law without modification to the algorithm). Low power consumption is useful for datacentre equipment. Predictable runtime. Better price/performance than software sliding window aligners on current hardware, but currently not better than software BWT-based aligners. Can manage large numbers (>2) of mismatches. Will find all hit positions for all seeds. Single-FPGA experimental version; needs work to develop it into a multi-FPGA production version. Proprietary, freeware for academic and noncommercial use
ZOOM100% sensitivity for reads between 15 and 240 bp with practical mismatches. Very fast. Supports insertions and deletions. Works with Illumina & SOLiD instruments, not 454.Yes (GUI), no (CLI) Proprietary, commercial [60]
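
Several mappers in the table above, such as NextGenMap and PRIMEX, are described as indexing the reference with a k-mer hash or lookup table and then checking candidate positions against a mismatch limit. The Python sketch below illustrates that general seed-and-verify idea under simplifying assumptions: substitutions only (Hamming distance, no indels), a single forward strand, and hypothetical function names (`build_kmer_index`, `map_read`). It is not the algorithm of any listed tool.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str, index: dict[str, list[int]],
             k: int, max_mismatches: int) -> list[int]:
    """Return sorted reference offsets where the read fits with at most max_mismatches substitutions."""
    candidates = set()
    # Seed: look up each k-mer of the read and infer where the whole read would start.
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], []):
            start = pos - j
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    # Verify: compare the full read against each candidate window.
    return sorted(s for s in candidates
                  if hamming(read, reference[s:s + len(read)]) <= max_mismatches)


reference = "AAACCCGGGTTTACGTACGA"
index = build_kmer_index(reference, k=4)
print(map_read("CCGAGTTT", reference, index, k=4, max_mismatches=1))  # [4]
```

By the pigeonhole principle, a read with at most `max_mismatches` substitutions is guaranteed to share at least one exact k-mer with its true location when its length is at least `(max_mismatches + 1) * k`; with shorter reads or larger mismatch budgets this sketch can miss valid hits.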

See also

Related Research Articles

In bioinformatics, BLAST is an algorithm and program for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences. A BLAST search enables a researcher to compare a query protein or nucleotide sequence with a library or database of sequences, and identify database sequences that resemble the query sequence above a certain threshold. For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on similarity of sequence.

In bioinformatics and biochemistry, the FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes.
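
A minimal FASTA reader makes the format concrete: each record begins with a header line starting with `>`, followed by one or more sequence lines that are concatenated into a single sequence. The sketch below assumes plain-text input, skips blank lines, and does not validate the residue alphabet or handle legacy `;` comment lines; `read_fasta` is a hypothetical helper name, not a library function.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; the header is returned without the leading '>'."""
    header = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # blank lines carry no sequence data
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # wrapped sequence lines belong to the current record
    if header is not None:
        yield header, "".join(chunks)


text = ">seq1 example DNA\nACGTACGT\nTTGA\n>seq2 example protein\nMKVLA\n"
for name, seq in read_fasta(io.StringIO(text)):
    print(name, len(seq))
# seq1 example DNA 12
# seq2 example protein 5
```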

In bioinformatics, sequence clustering algorithms attempt to group biological sequences that are somehow related. The sequences can be either of genomic, "transcriptomic" (ESTs) or protein origin. For proteins, homologous sequences are typically grouped into families. For EST data, clustering is important to group sequences originating from the same gene before the ESTs are assembled to reconstruct the original mRNA.

In molecular biology, open reading frames (ORFs) are defined as spans of DNA sequence between the start and stop codons. Usually, this is considered within a studied region of a prokaryotic DNA sequence, where only one of the six possible reading frames will be "open". Such an ORF may contain a start codon and by definition cannot extend beyond a stop codon. That start codon indicates where translation may start. The transcription termination site is located after the ORF, beyond the translation stop codon. If transcription were to cease before the stop codon, an incomplete protein would be made during translation.
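
The definition above can be turned into a scan of the six reading frames (three on each strand). The sketch below assumes the standard genetic code (start codon ATG; stop codons TAA, TAG and TGA), reports each ORF from the first ATG after the previous in-frame stop up to and including the next stop, and ignores nested start codons and ORFs that run off the end of the sequence. `find_orfs` and its `min_codons` cutoff (codons counted including the stop) are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, to read the opposite strand 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[str, int, str]]:
    """Return (strand, frame, orf) for ATG...stop spans in all six reading frames."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # index of the first ATG seen since the last in-frame stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orf = s[start:i + 3]  # include the stop codon
                    if len(orf) // 3 >= min_codons:
                        orfs.append((strand, frame, orf))
                    start = None
    return orfs


print(find_orfs("CCATGAAATAGCC"))  # [('+', 2, 'ATGAAATAG')]
```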

Pfam: Database of protein families

Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models. Pfam 36.0, released in September 2023, contains 20,795 families.

Dot plot (bioinformatics)

In bioinformatics a dot plot is a graphical method for comparing two biological sequences and identifying regions of close similarity after sequence alignment. It is a type of recurrence plot.
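
A dot plot can be computed by marking every cell (i, j) at which the two sequences agree. The sketch below additionally assumes a simple sliding-window filter (a cell is marked when at least `threshold` of the `window` positions starting at i and j are identical), which is one common way to suppress isolated chance matches; `dot_plot` and `render` are hypothetical names.

```python
def dot_plot(a: str, b: str, window: int = 1, threshold: int = 1) -> list[list[bool]]:
    """Return a len(a) x len(b) grid; cell (i, j) is True when the windows
    starting at a[i] and b[j] share at least `threshold` identical positions."""
    grid = []
    for i in range(len(a)):
        row = []
        for j in range(len(b)):
            # Windows near the end of either sequence are shorter; only the overlap is compared.
            matches = sum(x == y for x, y in zip(a[i:i + window], b[j:j + window]))
            row.append(matches >= threshold)
        grid.append(row)
    return grid


def render(grid: list[list[bool]]) -> str:
    """Draw marked cells as '*' and empty cells as '.'."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)


print(render(dot_plot("ACGTT", "ACGTT")))
# *....
# .*...
# ..*..
# ...**
# ...**
```

The main diagonal shows the identity of the two sequences; the small square at the bottom right comes from the repeated T.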

SUPERFAMILY is a database and search platform of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. Domains are functional, structural, and evolutionary units that form proteins. Domains of common ancestry are grouped into superfamilies. The domains and domain superfamilies are defined and described in SCOP. Superfamilies are groups of proteins which have structural evidence to support a common evolutionary ancestor but may not have detectable sequence homology.

Richard M. Durbin: British computational biologist

Richard Michael Durbin is a British computational biologist and Al-Kindi Professor of Genetics at the University of Cambridge. He also serves as an associate faculty member at the Wellcome Sanger Institute where he was previously a senior group leader.

SOAP is a suite of bioinformatics software tools from the BGI Bioinformatics department enabling the assembly, alignment, and analysis of next generation DNA sequencing data. It is particularly suited to short read sequencing data.

David T. Jones (scientist): British bioinformatician

David Tudor Jones is a Professor of Bioinformatics and Head of the Bioinformatics Group at University College London. He is also the director of the Bloomsbury Center for Bioinformatics, a joint research centre between UCL and Birkbeck, University of London, which also provides bioinformatics training and support services to biomedical researchers. As of 2013, he was a member of the editorial boards of PLoS ONE, BioData Mining, Advanced Bioinformatics, Chemical Biology & Drug Design, and Proteins: Structure, Function and Bioinformatics.

In metagenomics, binning is the process of grouping reads or contigs and assigning them to individual genomes. Binning methods can be based on compositional features, on alignment (similarity), or on both.

In bioinformatics, alignment-free sequence analysis approaches to molecular sequence and structure data provide alternatives to alignment-based approaches.

Bowtie is a software package commonly used for sequence alignment and sequence analysis in bioinformatics. The source code for the package is distributed freely and compiled binaries are available for Linux, macOS and Windows platforms. As of 2017, the Genome Biology paper describing the original Bowtie method has been cited more than 11,000 times. Bowtie is open-source software and is currently maintained by Johns Hopkins University.

De novo sequence assemblers are a type of program that assembles short nucleotide sequences into longer ones without the use of a reference genome. These are most commonly used in bioinformatic studies to assemble genomes or transcriptomes. Two common types of de novo assemblers are greedy algorithm assemblers and De Bruijn graph assemblers.
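
The De Bruijn approach can be sketched in a few lines: every distinct k-mer in the reads becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and an Eulerian path (one that uses every edge exactly once) spells out a candidate sequence. This toy version assumes error-free reads from a single strand and that an Eulerian path exists, and it makes no attempt to resolve the ambiguity caused by repeats; real assemblers must also deal with sequencing errors, coverage and reverse complements. All function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer in the reads adds an edge prefix -> suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph: dict[str, list[str]] = defaultdict(list)
    for kmer in sorted(kmers):  # sorting only makes the output deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def eulerian_path(graph: dict[str, list[str]]) -> list[str]:
    """Hierholzer's algorithm: walk the graph, using every edge exactly once."""
    if not graph:
        return []
    in_deg: dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Begin at a node with one more outgoing than incoming edge, if there is one.
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1), next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # no unused edges left: node is finished
    return path[::-1]


def assemble(reads: list[str], k: int) -> str:
    path = eulerian_path(de_bruijn_graph(reads, k))
    if not path:
        return ""
    # Consecutive (k-1)-mer nodes overlap by k-2 characters, so each adds one new base.
    return path[0] + "".join(node[-1] for node in path[1:])


print(assemble(["ACGTC", "GTCAT", "CATTG"], k=3))  # ACGTCATTG
```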

Bloom filters are space-efficient probabilistic data structures used to test whether an element is part of a set. Bloom filters require much less space than other data structures for representing sets; the downside is that queries have a false positive rate. Because different elements may share hash values under the hash functions used, a query for an element that was never added may return a positive if all of its hashed positions have already been set by other elements. Assuming that each hash function has an equal probability of selecting any index of the Bloom filter, the false positive rate of a query is a function of the number of bits, the number of hash functions and the number of elements in the Bloom filter. This allows the user to manage the risk of getting a false positive by trading away some of the space savings of the Bloom filter.
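
The trade-off described above can be illustrated with a small Bloom filter over k-mer strings. This sketch assumes one common way of deriving the index positions (double hashing, position_i = (h1 + i*h2) mod m, with h1 and h2 taken from a single SHA-256 digest) and uses the standard approximation (1 - e^(-kn/m))^k for the false positive rate with m bits, k hash functions and n inserted elements. The class and function names are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # m bits, all initially 0

    def _positions(self, item: str) -> list[int]:
        # Double hashing: derive k indices from two 64-bit halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 odd so it is never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def false_positive_rate(m: int, k: int, n: int) -> float:
    """Approximate false positive rate for m bits, k hash functions, n inserted elements."""
    return (1 - math.exp(-k * n / m)) ** k


bf = BloomFilter(num_bits=1024, num_hashes=3)
for kmer in ("ACGTA", "CGTAC", "GTACG"):
    bf.add(kmer)
print("ACGTA" in bf)  # True: inserted elements are never reported absent
print(false_positive_rate(1024, 3, 3))  # roughly 6.7e-07
```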

In bioinformatics, a spaced seed is a pattern of relevant and irrelevant positions in a biosequence and a method of approximate string matching that allows for substitutions. Spaced seeds are a straightforward modification of the earliest heuristic-based alignment efforts that allows for minor differences between the sequences of interest. Spaced seeds have been used in homology search, alignment, assembly, and metagenomics. They are usually represented as a sequence of zeroes and ones, where a one indicates relevance and a zero indicates irrelevance at the given position. Some visual representations use pound signs for relevant and dashes or asterisks for irrelevant positions.
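
Using the zero/one notation above, a spaced seed can be applied by keeping only the characters at the "1" positions of each window as a lookup key, so that a substitution under a "0" position does not prevent a seed hit. The sketch below assumes exact key matching on one strand and uses hypothetical function names; the seed `11011` is an arbitrary illustration, not a recommended pattern.

```python
def spaced_keys(seq: str, seed: str) -> dict[str, list[int]]:
    """Map each spaced-seed key to its start positions ('1' = must match, '0' = ignored)."""
    care = [i for i, c in enumerate(seed) if c == "1"]
    keys: dict[str, list[int]] = {}
    for start in range(len(seq) - len(seed) + 1):
        key = "".join(seq[start + i] for i in care)
        keys.setdefault(key, []).append(start)
    return keys


def seed_hits(a: str, b: str, seed: str) -> list[tuple[int, int]]:
    """Pairs (i, j) where the windows a[i:] and b[j:] agree on every '1' position of the seed."""
    index = spaced_keys(a, seed)
    hits = []
    for key, js in spaced_keys(b, seed).items():
        for i in index.get(key, []):
            hits.extend((i, j) for j in js)
    return sorted(hits)


# The two sequences differ only at position 2.
print(seed_hits("ACGTT", "ACCTT", "11011"))  # [(0, 0)]: the mismatch falls under the '0'
print(seed_hits("ACGTT", "ACCTT", "1111"))   # []: no contiguous 4-mer is shared
```

Both seeds have the same weight (four relevant positions), so they are equally selective per hit, yet only the spaced seed tolerates this particular substitution.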

References

  1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (October 1990). "Basic local alignment search tool". Journal of Molecular Biology. 215 (3): 403–10. doi:10.1016/S0022-2836(05)80360-2. PMID   2231712. S2CID   14441902.
  2. HPC-BLAST code repository https://github.com/UTennessee-JICS/HPC-BLAST
  3. Angermüller, C.; Biegert, A.; Söding, J. (Dec 2012). "Discriminative modelling of context-specific amino acid substitution probabilities". Bioinformatics. 28 (24): 3240–7. doi: 10.1093/bioinformatics/bts622 . hdl: 11858/00-001M-0000-0015-8D22-F . PMID   23080114.
  4. Buchfink, Xie and Huson (2015). "Fast and sensitive protein alignment using DIAMOND". Nature Methods. 12 (1): 59–60. doi:10.1038/nmeth.3176. PMID   25402007. S2CID   5346781.
  5. B Buchfink, K Reuter and HG Drost (2021). "Sensitive protein alignments at tree-of-life scale using DIAMOND". Nature Methods. 18 (4): 366–368. doi: 10.1038/s41592-021-01101-x . PMC   8026399 . PMID   33828273.
  6. Durbin, Richard; Eddy, Sean R.; Krogh, Anders; Mitchison, Graeme, eds. (1998). Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge, UK: Cambridge University Press. ISBN   978-0-521-62971-3.
  7. Söding J (April 2005). "Protein homology detection by HMM-HMM comparison". Bioinformatics. 21 (7): 951–60. doi: 10.1093/bioinformatics/bti125 . hdl: 11858/00-001M-0000-0017-EC7A-F . PMID   15531603.
  8. Remmert, Michael; Biegert, Andreas; Hauser, Andreas; Söding, Johannes (2011-12-25). "HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment". Nature Methods. 9 (2): 173–175. doi:10.1038/nmeth.1818. hdl: 11858/00-001M-0000-0015-8D56-A . ISSN   1548-7105. PMID   22198341. S2CID   205420247.
  9. Hauswedell H, Singer J, Reinert K (2014-09-01). "Lambda: the local aligner for massive biological data". Bioinformatics. 30 (17): 349–355. doi:10.1093/bioinformatics/btu439. PMC   4147892 . PMID   25161219.
  10. Steinegger, Martin; Soeding, Johannes (2017-10-16). "MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets". Nature Biotechnology. 35 (11): 1026–1028. doi:10.1038/nbt.3988. hdl: 11858/00-001M-0000-002E-1967-3 . PMID   29035372. S2CID   402352.
  11. Rucci, Enzo; Garcia, Carlos; Botella, Guillermo; Giusti, Armando E. De; Naiouf, Marcelo; Prieto-Matias, Manuel (2016-06-30). "OSWALD: OpenCL Smith–Waterman on Altera's FPGA for Large Protein Databases". International Journal of High Performance Computing Applications. 32 (3): 337–350. doi:10.1177/1094342016654215. ISSN   1094-3420. S2CID   212680914.
  12. Altschul SF, Madden TL, Schäffer AA, et al. (September 1997). "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs". Nucleic Acids Research. 25 (17): 3389–402. doi:10.1093/nar/25.17.3389. PMC   146917 . PMID   9254694.
  13. Li W, McWilliam H, Goujon M, et al. (June 2012). "PSI-Search: iterative HOE-reduced profile SSEARCH searching". Bioinformatics. 28 (12): 1650–1651. doi:10.1093/bioinformatics/bts240. PMC   3371869 . PMID   22539666.
  14. Oehmen, C.; Nieplocha, J. (August 2006). "ScalaBLAST: A scalable implementation of BLAST for high-performance data-intensive bioinformatics analysis". IEEE Transactions on Parallel and Distributed Systems. 17 (8): 740–749. doi:10.1109/TPDS.2006.112. S2CID   11122366.
  15. Hughey, R.; Karplus, K.; Krogh, A. (2003). SAM: sequence alignment and modeling software system. Technical report UCSC-CRL-99-11 (Report). University of California, Santa Cruz, CA.
  16. Rucci, Enzo; García, Carlos; Botella, Guillermo; De Giusti, Armando; Naiouf, Marcelo; Prieto-Matías, Manuel (2015-12-25). "An energy-aware performance analysis of SWIMM: Smith–Waterman implementation on Intel's Multicore and Manycore architectures". Concurrency and Computation: Practice and Experience. 27 (18): 5517–5537. doi:10.1002/cpe.3598. hdl: 11336/53930 . ISSN   1532-0634. S2CID   42945406.
  17. Rucci, Enzo; García, Carlos; Botella, Guillermo; De Giusti, Armando; Naiouf, Marcelo; Prieto-Matías, Manuel (2015-12-25). "SWIMM 2.0: enhanced Smith-Waterman on Intel's Multicore and Manycore architectures based on AVX-512 vector extensions". International Journal of Parallel Programming. 47 (2): 296–317. doi:10.1007/s10766-018-0585-7. ISSN   1573-7640. S2CID   49670113.
  18. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W (2003). "Human-mouse alignments with BLASTZ". Genome Research. 13 (1): 103–107. doi:10.1101/gr.809403. PMC   430961 . PMID   12529312.
  19. Harris R S (2007). Improved pairwise alignment of genomic DNA (Thesis).
  20. Sandes, Edans F. de O.; de Melo, Alba Cristina M.A. (May 2013). "Retrieving Smith-Waterman Alignments with Optimizations for Megabase Biological Sequences Using GPU". IEEE Transactions on Parallel and Distributed Systems. 24 (5): 1009–1021. doi:10.1109/TPDS.2012.194.
  21. Sandes, Edans F. de O.; Miranda, G.; De Melo, A.C.M.A.; Martorell, X.; Ayguade, E. (May 2014). CUDAlign 3.0: Parallel Biological Sequence Comparison in Large GPU Clusters. Cluster, Cloud and Grid Computing (CCGrid), 2014 14th IEEE/ACM International Symposium on. p. 160. doi:10.1109/CCGrid.2014.18.
  22. Sandes, Edans F. de O.; Miranda, G.; De Melo, A.C.M.A.; Martorell, X.; Ayguade, E. (August 2014). Fine-grain Parallel Megabase Sequence Comparison with Multiple Heterogeneous GPUs. Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. pp. 383–384. doi:10.1145/2555243.2555280.
  23. Chivian, D; Baker, D (2006). "Homology modeling using parametric alignment ensemble generation with consensus and energy-based model selection". Nucleic Acids Research. 34 (17): e112. doi:10.1093/nar/gkl480. PMC   1635247 . PMID   16971460.
  24. Girdea, M; Noe, L; Kucherov, G (January 2010). "Back-translation for discovering distant protein homologies in the presence of frameshift mutations". Algorithms for Molecular Biology. 5 (6): 6. doi: 10.1186/1748-7188-5-6 . PMC   2821327 . PMID   20047662.
  25. Ma, B.; Tromp, J.; Li, M. (2002). "PatternHunter: faster and more sensitive homology search". Bioinformatics. 18 (3): 440–445. doi: 10.1093/bioinformatics/18.3.440 . PMID   11934743.
  26. Li, M.; Ma, B.; Kisman, D.; Tromp, J. (2004). "Patternhunter II: highly sensitive and fast homology search". Journal of Bioinformatics and Computational Biology. 2 (3): 417–439. CiteSeerX   10.1.1.1.2393 . doi:10.1142/S0219720004000661. PMID   15359419.
  27. Gusfield, Dan (1997). Algorithms on strings, trees and sequences. Cambridge University Press. ISBN   978-0-521-58519-4.
  28. Rucci, Enzo; Garcia, Carlos; Botella, Guillermo; Naiouf, Marcelo; De Giusti, Armando; Prieto-Matias, Manuel (2018). "SWIFOLD: Smith-Waterman implementation on FPGA with OpenCL for long DNA sequences". BMC Systems Biology. 12 (Suppl 5): 96. doi: 10.1186/s12918-018-0614-6 . PMC   6245597 . PMID   30458766.
  29. Rucci, Enzo; Garcia, Carlos; Botella, Guillermo; Naiouf, Marcelo; De Giusti, Armando; Prieto-Matias, Manuel. Accelerating Smith-Waterman Alignment of Long DNA Sequences with OpenCL on FPGA. 5th International Work-Conference on Bioinformatics and Biomedical Engineering. pp. 500–511. doi:10.1007/978-3-319-56154-7_45.
  30. Rasmussen K, Stoye J, Myers EW (2006). "Efficient q-Gram Filters for Finding All epsilon-Matches over a Given Length". Journal of Computational Biology. 13 (2): 296–308. CiteSeerX   10.1.1.465.2084 . doi:10.1089/cmb.2006.13.296. PMID   16597241.
  31. Noe L, Kucherov G (2005). "YASS: enhancing the sensitivity of DNA similarity search". Nucleic Acids Research. 33 (suppl_2): W540–W543. doi:10.1093/nar/gki478. PMC   1160238 . PMID   15980530.
  32. Pratas, Diogo; Silva, Jorge (2020). "Persistent minimal sequences of SARS-CoV-2". Bioinformatics. 36 (21): 5129–5132. doi: 10.1093/bioinformatics/btaa686 . PMC   7559010 . PMID   32730589.
  33. Wilton, Richard; Budavari, Tamas; Langmead, Ben; Wheelan, Sarah J.; Salzberg, Steven L.; Szalay, Alexander S. (2015). "Arioc: high-throughput read alignment with GPU-accelerated exploration of the seed-and-extend search space". PeerJ. 3: e808. doi: 10.7717/peerj.808 . PMC   4358639 . PMID   25780763.
  34. Homer, Nils; Merriman, Barry; Nelson, Stanley F. (2009). "BFAST: An Alignment Tool for Large Scale Genome Resequencing". PLOS ONE. 4 (11): e7767. Bibcode:2009PLoSO...4.7767H. doi: 10.1371/journal.pone.0007767 . PMC   2770639 . PMID   19907642.
  35. Abuín, J.M.; Pichel, J.C.; Pena, T.F.; Amigo, J. (2015). "BigBWA: approaching the Burrows–Wheeler aligner to Big Data technologies". Bioinformatics. 31 (24): 4003–5. doi: 10.1093/bioinformatics/btv506 . PMID   26323715.
  36. Kent, W. J. (2002). "BLAT---The BLAST-Like Alignment Tool". Genome Research. 12 (4): 656–664. doi:10.1101/gr.229202. ISSN   1088-9051. PMC   187518 . PMID   11932250.
  37. Langmead, Ben; Trapnell, Cole; Pop, Mihai; Salzberg, Steven L (2009). "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome". Genome Biology. 10 (3): R25. doi: 10.1186/gb-2009-10-3-r25 . ISSN   1465-6906. PMC   2690996 . PMID   19261174.
  38. Li, H.; Durbin, R. (2009). "Fast and accurate short read alignment with Burrows–Wheeler transform". Bioinformatics. 25 (14): 1754–1760. doi:10.1093/bioinformatics/btp324. ISSN   1367-4803. PMC   2705234 . PMID   19451168.
  39. Kerpedjiev, Peter; Frellsen, Jes; Lindgreen, Stinus; Krogh, Anders (2014). "Adaptable probabilistic mapping of short reads using position specific scoring matrices". BMC Bioinformatics. 15 (1): 100. doi: 10.1186/1471-2105-15-100 . ISSN   1471-2105. PMC   4021105 . PMID   24717095.
  40. Liu, Y.; Schmidt, B.; Maskell, D. L. (2012). "CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows–Wheeler transform". Bioinformatics. 28 (14): 1830–1837. doi: 10.1093/bioinformatics/bts276 . ISSN   1367-4803. PMID   22576173.
  41. Liu, Y.; Schmidt, B. (2012). "Long read alignment based on maximal exact match seeds". Bioinformatics. 28 (18): i318–i324. doi:10.1093/bioinformatics/bts414. ISSN   1367-4803. PMC   3436841 . PMID   22962447.
  42. Rizk, Guillaume; Lavenier, Dominique (2010). "GASSST: global alignment short sequence search tool". Bioinformatics. 26 (20): 2534–2540. doi:10.1093/bioinformatics/btq485. PMC   2951093 . PMID   20739310.
  43. Marco-Sola, Santiago; Sammeth, Michael; Guigó, Roderic; Ribeca, Paolo (2012). "The GEM mapper: fast, accurate and versatile alignment by filtration". Nature Methods. 9 (12): 1185–1188. doi:10.1038/nmeth.2221. ISSN   1548-7091. PMID   23103880. S2CID   2004416.
  44. Clement, N. L.; Snell, Q.; Clement, M. J.; Hollenhorst, P. C.; Purwar, J.; Graves, B. J.; Cairns, B. R.; Johnson, W. E. (2009). "The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing". Bioinformatics. 26 (1): 38–45. doi:10.1093/bioinformatics/btp614. ISSN   1367-4803. PMC   6276904 . PMID   19861355.
  45. Santana-Quintero, Luis; Dingerdissen, Hayley; Thierry-Mieg, Jean; Mazumder, Raja; Simonyan, Vahan (2014). "HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis". PLOS ONE. 9 (6): e99033. Bibcode:2014PLoSO...999033S. doi: 10.1371/journal.pone.0099033 . PMC   4053384 . PMID   24918764.
  46. Kielbasa, S.M.; Wan, R.; Sato, K.; Horton, P.; Frith, M.C. (2011). "Adaptive seeds tame genomic sequence comparison". Genome Research. 21 (3): 487–493. doi:10.1101/gr.113985.110. PMC   3044862 . PMID   21209072.
  47. Rivals, Eric; Salmela, Leena; Kiiskinen, Petteri; Kalsi, Petri; Tarhio, Jorma (2009). "Mpscan: Fast Localisation of Multiple Reads in Genomes". Algorithms in Bioinformatics. Lecture Notes in Computer Science. Vol. 5724. pp. 246–260. Bibcode:2009LNCS.5724..246R. CiteSeerX   10.1.1.156.928 . doi:10.1007/978-3-642-04241-6_21. ISBN   978-3-642-04240-9. S2CID   17187140.
  48. Sedlazeck, Fritz J.; Rescheneder, Philipp; von Haeseler, Arndt (2013). "NextGenMap: fast and accurate read mapping in highly polymorphic genomes". Bioinformatics. 29 (21): 2790–2791. doi: 10.1093/bioinformatics/btt468 . PMID   23975764.
  49. Chen, Yangho; Souaiaia, Tade; Chen, Ting (2009). "PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds". Bioinformatics. 25 (19): 2514–2521. doi:10.1093/bioinformatics/btp486. PMC   2752623 . PMID   19675096.
  50. Hoffmann, Steve; Otto, Christian; Kurtz, Stefan; Sharma, Cynthia M.; Khaitovich, Philipp; Vogel, Jörg; Stadler, Peter F.; Hackermüller, Jörg (2009). "Fast Mapping of Short Sequences with Mismatches, Insertions and Deletions Using Index Structures". PLOS Computational Biology. 5 (9): e1000502. Bibcode:2009PLSCB...5E0502H. doi: 10.1371/journal.pcbi.1000502 . ISSN   1553-7358. PMC   2730575 . PMID   19750212.
  51. Rumble, Stephen M.; Lacroute, Phil; Dalca, Adrian V.; Fiume, Marc; Sidow, Arend; Brudno, Michael (2009). "SHRiMP: Accurate Mapping of Short Color-space Reads". PLOS Computational Biology. 5 (5): e1000386. Bibcode:2009PLSCB...5E0386R. doi: 10.1371/journal.pcbi.1000386 . PMC   2678294 . PMID   19461883.
  52. David, Matei; Dzamba, Misko; Lister, Dan; Ilie, Lucian; Brudno, Michael (2011). "SHRiMP2: Sensitive yet Practical Short Read Mapping". Bioinformatics. 27 (7): 1011–1012. doi: 10.1093/bioinformatics/btr046 . PMID   21278192.
  53. Malhis, Nawar; Butterfield, Yaron S. N.; Ester, Martin; Jones, Steven J. M. (2009). "Slider – Maximum use of probability information for alignment of short sequence reads and SNP detection". Bioinformatics. 25 (1): 6–13. doi:10.1093/bioinformatics/btn565. PMC   2638935 . PMID   18974170.
  54. Malhis, Nawar; Jones, Steven J. M. (2010). "High Quality SNP Calling Using Illumina Data at Shallow Coverage". Bioinformatics. 26 (8): 1029–1035. doi:10.1093/bioinformatics/btq092. PMID   20190250.
  55. Li, R.; Li, Y.; Kristiansen, K.; Wang, J. (2008). "SOAP: short oligonucleotide alignment program". Bioinformatics. 24 (5): 713–714. doi: 10.1093/bioinformatics/btn025 . ISSN   1367-4803. PMID   18227114.
  56. Li, R.; Yu, C.; Li, Y.; Lam, T.-W.; Yiu, S.-M.; Kristiansen, K.; Wang, J. (2009). "SOAP2: an improved ultrafast tool for short read alignment". Bioinformatics. 25 (15): 1966–1967. doi:10.1093/bioinformatics/btp336. ISSN   1367-4803. PMID   19497933.
  57. Abuín, José M.; Pichel, Juan C.; Pena, Tomás F.; Amigo, Jorge (2016-05-16). "SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data". PLOS ONE. 11 (5): e0155461. Bibcode:2016PLoSO..1155461A. doi: 10.1371/journal.pone.0155461 . ISSN   1932-6203. PMC   4868289 . PMID   27182962.
  58. Lunter, G.; Goodson, M. (2010). "Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads". Genome Research. 21 (6): 936–939. doi:10.1101/gr.111120.110. ISSN   1088-9051. PMC   3106326 . PMID   20980556.
  59. Noe, L.; Girdea, M.; Kucherov, G. (2010). "Designing efficient spaced seeds for SOLiD read mapping". Advances in Bioinformatics. 2010: 708501. doi: 10.1155/2010/708501 . PMC   2945724 . PMID   20936175.
  60. Lin, H.; Zhang, Z.; Zhang, M.Q.; Ma, B.; Li, M. (2008). "ZOOM! Zillions of oligos mapped". Bioinformatics. 24 (21): 2431–2437. doi:10.1093/bioinformatics/btn416. PMC   2732274 . PMID   18684737.