Short Oligonucleotide Analysis Package

Last updated February 24, 2025

SOAP (Short Oligonucleotide Analysis Package) is a suite of bioinformatics software tools from the BGI Bioinformatics department enabling the assembly, alignment, and analysis of next generation DNA sequencing data. It is particularly suited to short read sequencing data.

Functionality

The SOAP suite includes tools for the following tasks:

Sequence Alignment

SOAPaligner (SOAP2) is specifically designed for fast alignment of short reads and performs favorably with respect to similar alignment tools such as Bowtie and MAQ.^[1]
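The SOAP2 publication describes indexing the reference with a two-way Burrows-Wheeler Transform (BWT). This kind of index lets an aligner find exact matches without scanning the whole genome. The Python sketch below shows only the basic idea: one-directional backward search over a toy FM-index, for exact matches only. It is not SOAP2's implementation and does not handle mismatches. The names `build_fm_index` and `backward_search` are hypothetical.

```python
def build_fm_index(text):
    """Build a toy FM-index (suffix array, C table, Occ table) for text ending in '$'."""
    assert text.endswith("$"), "text must end with a unique '$' sentinel"
    # Naive suffix array: sort suffix start positions lexicographically.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT[i] is the character preceding the i-th smallest suffix
    # (text[-1] == '$' when the suffix starts at position 0).
    bwt = "".join(text[i - 1] for i in sa)
    # C[c]: number of characters in text that sort strictly before c.
    C, total = {}, 0
    for c in sorted(set(text)):
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in C}
    for i, ch in enumerate(bwt):
        for c in C:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of exact occurrences of pattern."""
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # extend the match one character leftwards
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_fm_index("ACGTACGTTACG$")
print(backward_search("ACG", *index))  # [0, 4, 9]
```

Each step narrows a range of sorted suffixes, so the cost of a lookup depends on the read length rather than the genome length once the index is built. The naive suffix-array construction here is only suitable for toy inputs.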

Genome Assembly

SOAPdenovo is a de novo assembler for short reads that uses De Bruijn graph construction. It is optimized for short reads such as those generated by Illumina sequencers and can assemble large genomes such as the human genome.^[2] SOAPdenovo was used to assemble the genome of the giant panda.^[3] It was succeeded by SOAPdenovo2, which was optimized for large genomes and included the widely used GapCloser module.^[4]
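In a De Bruijn graph assembler, reads are broken into overlapping k-mers. Each k-mer links its (k-1)-base prefix to its (k-1)-base suffix, and contigs are read off unbranched paths in the resulting graph. The following Python sketch assumes error-free reads from a single strand, with no repeat resolution or scaffolding. The names `de_bruijn_graph` and `contigs` are hypothetical and not taken from SOAPdenovo.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer in the reads adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def contigs(graph):
    """Spell out maximal non-branching paths (isolated cycles are ignored)."""
    indeg = defaultdict(int)
    for succs in graph.values():
        for v in succs:
            indeg[v] += 1
    nodes = set(graph) | set(indeg)

    def one_in_one_out(n):
        return indeg[n] == 1 and len(graph.get(n, ())) == 1

    result = []
    for u in nodes:
        if one_in_one_out(u):
            continue  # interior node: covered when walking from a branch point
        for v in graph.get(u, ()):
            path = u + v[-1]
            while one_in_one_out(v):  # extend through unbranched nodes
                v = next(iter(graph[v]))
                path += v[-1]
            result.append(path)
    return sorted(result)


reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
print(contigs(de_bruijn_graph(reads, 4)))  # ['ATGGCGTGCA']
```

The graph stores distinct k-mers, not whole reads. Overlapping reads therefore merge automatically. Branch points, such as those caused by repeats or sequencing errors, end a contig.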

Transcriptome Assembly

SOAPdenovo-Trans is a de novo transcriptome assembler designed specifically for RNA-Seq that was created for the 1000 Plant Genomes project.^[5]

Indel Discovery

SOAPindel is a tool to find insertions and deletions from next generation paired-end sequencing data, providing a list of candidate indels with quality scores.^[6]

SNP Discovery

SOAPsnp is a consensus sequence builder. It uses the output of SOAPaligner to build a consensus sequence, from which SNPs can be called for a newly sequenced individual.
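The Python sketch below shows how a consensus built from aligned reads exposes SNPs. It piles up ungapped alignments, takes a majority vote at each reference position, and reports positions where the vote disagrees with the reference. The plain majority vote is a deliberate simplification, not SOAPsnp's own calling model. The `(start, sequence)` alignment format and the function names are hypothetical and do not reflect SOAPaligner's output format.

```python
from collections import Counter


def consensus(reference, alignments, min_depth=2):
    """Majority-vote consensus over ungapped forward-strand alignments.

    alignments: iterable of (start, read_sequence) pairs, 0-based (hypothetical format).
    Positions covered by fewer than min_depth reads are reported as 'N'.
    """
    pileup = [Counter() for _ in reference]
    for start, seq in alignments:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1
    calls = []
    for counts in pileup:
        if sum(counts.values()) < min_depth:
            calls.append("N")
        else:
            # Ties go to the base seen first at this position.
            calls.append(counts.most_common(1)[0][0])
    return "".join(calls)


def candidate_snps(reference, consensus_seq):
    """List (position, reference_base, consensus_base) where the two differ."""
    return [(i, r, c) for i, (r, c) in enumerate(zip(reference, consensus_seq))
            if c != "N" and c != r]


ref = "ACGTACGT"
alns = [(0, "ACGA"), (0, "ACGA"), (2, "GAAC"), (4, "ACGT"), (4, "ACGT")]
cons = consensus(ref, alns)
print(cons)                       # ACGAACGT
print(candidate_snps(ref, cons))  # [(3, 'T', 'A')]
```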

Structural Variation Discovery

SOAPsv is a tool to find structural variations using whole-genome de novo assembly.^[7]

Quality Control and Preprocessing

SOAPnuke is a tool for integrated quality control and preprocessing of datasets from genomic, small RNA, Digital Gene Expression, and metagenomic experiments.^[8]
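Read-level quality control typically discards reads that contain many ambiguous bases or low base qualities. The following Python sketch shows such a filter for FASTQ quality strings in Phred+33 encoding. The thresholds (`max_n_frac`, `low_q`, `max_low_q_frac`) and the function names are hypothetical and do not correspond to SOAPnuke's actual options.

```python
def phred33(qual):
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(ch) - 33 for ch in qual]


def passes_qc(seq, qual, max_n_frac=0.1, low_q=20, max_low_q_frac=0.5):
    """Hypothetical read filter: reject reads with too many Ns or low-quality bases."""
    if not seq or len(seq) != len(qual):
        return False  # empty or malformed record
    n_frac = seq.upper().count("N") / len(seq)
    low_q_frac = sum(q < low_q for q in phred33(qual)) / len(qual)
    return n_frac <= max_n_frac and low_q_frac <= max_low_q_frac


def filter_reads(records, **thresholds):
    """Keep (name, seq, qual) records that pass the filter."""
    return [r for r in records if passes_qc(r[1], r[2], **thresholds)]


records = [
    ("r1", "ACGTACGTAC", "IIIIIIIIII"),  # 'I' = Q40
    ("r2", "ACGTNNNNAC", "IIIIIIIIII"),  # 40% N
    ("r3", "ACGTACGTAC", "##########"),  # '#' = Q2
]
print([name for name, _, _ in filter_reads(records)])  # ['r1']
```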

History

SOAP v1

The first release of SOAP consisted only of the sequence alignment tool SOAPaligner.^[9]

SOAP v2

SOAP v2^[1] improved substantially on SOAP v1, chiefly by speeding up SOAPaligner. Alignment time was reduced by a factor of 20–30, and memory usage by a factor of 3. Support for compressed file formats was also added.

The suite was then expanded to include new tools: SOAPdenovo (versions 1 and 2), SOAPindel, SOAPsnp, and SOAPsv.

SOAP v3

SOAP v3 (SOAP3) extended the aligner to run on graphics processing units (GPUs), making it the first short-read alignment tool to use GPUs.^[10] As a result, the aligner was significantly faster than the competing aligners Bowtie and BWA.


References

1. Li, R.; Yu, C.; Li, Y.; Lam, T.-W.; Yiu, S.-M.; Kristiansen, K.; Wang, J. (2009). "SOAP2: an improved ultrafast tool for short read alignment". Bioinformatics. 25 (15): 1966–1967. doi:10.1093/bioinformatics/btp336. ISSN 1367-4803. PMID 19497933.
2. Li, R.; Zhu, H.; Ruan, J.; Qian, W.; Fang, X.; Shi, Z.; Li, Y.; Li, S.; Shan, G.; Kristiansen, K.; Li, S.; Yang, H.; Wang, J.; Wang, J. (2009). "De novo assembly of human genomes with massively parallel short read sequencing". Genome Research. 20 (2): 265–272. doi:10.1101/gr.097261.109. ISSN 1088-9051. PMC 2813482. PMID 20019144.
3. Li, Ruiqiang; Fan, Wei; Tian, Geng; Zhu, Hongmei; He, Lin; Cai, Jing; Huang, Quanfei; Cai, Qingle; Li, Bo; Bai, Yinqi; Zhang, Zhihe; Zhang, Yaping; Wang, Wen; Li, Jun; Wei, Fuwen; Li, Heng; Jian, Min; Li, Jianwen; Zhang, Zhaolei; Nielsen, Rasmus; Li, Dawei; Gu, Wanjun; Yang, Zhentao; Xuan, Zhaoling; Ryder, Oliver A.; Leung, Frederick Chi-Ching; Zhou, Yan; Cao, Jianjun; Sun, Xiao; et al. (2009). "The sequence and de novo assembly of the giant panda genome". Nature. 463 (7279): 311–317. Bibcode:2010Natur.463..311L. doi:10.1038/nature08696. ISSN 0028-0836. PMC 3951497. PMID 20010809.
4. Luo, Ruibang; Liu, Binghang; Xie, Yinlong; Li, Zhenyu; Huang, Weihua; Yuan, Jianying; He, Guangzhu; Chen, Yanxiang; Pan, Qi; Liu, Yunjie; Tang, Jingbo (2012-12-01). "SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler". GigaScience. 1 (1): 18. doi:10.1186/2047-217X-1-18. PMC 3626529. PMID 23587118.
5. Xie, Yinlong; Wu, Gengxiong; Tang, Jingbo; Luo, Ruibang; Patterson, Jordan; Liu, Shanlin; Huang, Weihua; He, Guangzhu; Gu, Shengchang; Li, Shengkang; Zhou, Xin (2014-06-15). "SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads". Bioinformatics. 30 (12): 1660–1666. arXiv:1305.6760. doi:10.1093/bioinformatics/btu077. ISSN 1367-4803. PMID 24532719.
6. Li, Shengting; Li, Ruiqiang; Li, Heng; Lu, Jianliang; Li, Yingrui; Bolund, Lars; Schierup, Mikkel H.; Wang, Jun (2013-01-01). "SOAPindel: Efficient identification of indels from short paired reads". Genome Research. 23 (1): 195–200. doi:10.1101/gr.132480.111. ISSN 1088-9051. PMC 3530679. PMID 22972939.
7. Li, Yingrui; Zheng, Hancheng; Luo, Ruibang; Wu, Honglong; Zhu, Hongmei; Li, Ruiqiang; Cao, Hongzhi; Wu, Boxin; Huang, Shujia; Shao, Haojing; Ma, Hanzhou (August 2011). "Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly". Nature Biotechnology. 29 (8): 723–730. doi:10.1038/nbt.1904. ISSN 1546-1696. PMID 21785424.
8. Chen, Yuxin; Chen, Yongsheng; Shi, Chunmei; Huang, Zhibo; Zhang, Yong; Li, Shengkang; Li, Yan; Ye, Jia; Yu, Chang; Li, Zhuo; Zhang, Xiuqing (2018-01-01). "SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data". GigaScience. 7 (1): 1–6. doi:10.1093/gigascience/gix120. PMC 5788068. PMID 29220494.
9. Li, R.; Li, Y.; Kristiansen, K.; Wang, J. (2008). "SOAP: short oligonucleotide alignment program". Bioinformatics. 24 (5): 713–714. doi:10.1093/bioinformatics/btn025. ISSN 1367-4803. PMID 18227114.
10. Liu, C.-M.; Wong, T.; Wu, E.; Luo, R.; Yiu, S.-M.; Li, Y.; Wang, B.; Yu, C.; Chu, X.; Zhao, K.; Li, R.; Lam, T.-W. (2012). "SOAP3: ultra-fast GPU-based parallel alignment tool for short reads". Bioinformatics. 28 (6): 878–879. doi:10.1093/bioinformatics/bts061. ISSN 1367-4803. PMID 22285832.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.

