Original author(s) | Paolo Di Tommaso
---|---
Developer(s) | Seqera Labs, Centre for Genomic Regulation
Initial release | April 9, 2013
Stable release | v23.10.1 / January 12, 2024
Preview release | v24.02.0-edge / March 9, 2024
Repository | https://github.com/nextflow-io/nextflow
Written in | Groovy, Java
Operating system | Linux, macOS, WSL
Type | Scientific workflow system, dataflow programming, big data
License | Apache License 2.0
Website | nextflow.io
Nextflow is a scientific workflow system predominantly used for bioinformatic data analysis. It establishes standards for programmatically creating a series of dependent computational steps and facilitates their execution on various local and cloud resources. [1] [2]
Many scientific data analyses require a significant number of sequential processing steps. Custom scripts may suffice when developing new methods or running particular analyses infrequently, but they scale poorly to complex successions of tasks or to many samples. [3] [4] [5]
Scientific workflow systems like Nextflow allow an analysis to be formalized as a data analysis pipeline. Pipelines, also known as workflows, specify the order and conditions of computing steps. They are executed by special-purpose programs, so-called workflow executors, which ensure predictable and reproducible behavior in various computing environments. [3] [6] [7] [8]
Workflow systems also provide built-in solutions to common challenges of workflow development, such as applying a pipeline to many samples, validating input and intermediate results, executing steps conditionally, handling errors, and generating reports. Advanced features of workflow systems may also include scheduling capabilities, graphical user interfaces for monitoring workflow executions, and dependency management through containerizing the whole workflow or its components. [9] [10]
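In Nextflow, several of these concerns map to short configuration directives. A minimal, illustrative sketch of built-in error handling (the values shown are arbitrary, not from the original article):

// nextflow.config: retry failing tasks before aborting the run
process {
    errorStrategy = 'retry'  // re-submit a failed task instead of stopping the workflow
    maxRetries    = 3        // give up after three attempts
}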
Typically, scientific workflow systems initially present a steep learning curve, as their features and abstractions must be mastered in addition to the actual analysis. However, the standards and abstraction imposed by workflow systems ultimately improve the traceability of analysis steps, which is particularly relevant when collaborating on pipeline development, as is customary in scientific settings. [11]
In Nextflow, pipelines are constructed from individual processes that work in parallel to perform computational tasks. Each process is defined with input requirements and output declarations. Instead of running in a fixed sequence, a process starts executing when all its input requirements are fulfilled. By specifying the output of one process as the input of another, a logical and sequential connection between processes is established. [12]
This reactive implementation is a key design pattern of Nextflow and is also known as the functional dataflow model. [13]
Processes and entire workflows are written in a domain-specific language (DSL) provided by Nextflow, which is based on Apache Groovy. [14] While Nextflow's DSL is used to declare the workflow logic, developers can use their scripting language of choice within a process and mix multiple languages in a workflow. It is also possible to port existing scripts and workflows to Nextflow. Supported scripting languages include Bash, csh, ksh, Python, Ruby, and R; any scripting language that uses the standard Unix shebang declaration (e.g. #!/bin/bash) is compatible with Nextflow.
Below is an example of a workflow consisting of only one process:
process hello_world {
    input:
    val greeting

    output:
    path "${greeting}.txt"

    script:
    """
    echo "${greeting} World!" > ${greeting}.txt
    """
}

workflow {
    Channel.of("Hello", "Ciao", "Hola", "Bonjour") | hello_world
}
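Processes can be chained by piping one process's output channel into another, and a process script may use a language other than shell via the shebang declaration. The following sketch, which is not from the original article (the process name and logic are illustrative), extends the example above with a second process that counts the characters of each greeting file using Python; its workflow block replaces the one above:

process count_chars {
    input:
    path greeting_file

    output:
    path "${greeting_file.baseName}.count"

    script:
    """
    #!/usr/bin/env python3
    # Nextflow interpolates the file names before this script runs
    with open("${greeting_file}") as f:
        text = f.read()
    with open("${greeting_file.baseName}.count", "w") as out:
        out.write(str(len(text)))
    """
}

workflow {
    // files produced by hello_world flow into count_chars as they become available
    Channel.of("Hello", "Ciao", "Hola", "Bonjour") | hello_world | count_chars
}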
To enable easy collaboration on workflows, Nextflow natively supports source-code management systems and DevOps platforms, including GitHub, GitLab, and others. [15]
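For instance, a pipeline hosted on GitHub can be pulled and executed by its repository name alone; nextflow-io/hello is the project's own public demonstration pipeline, and the -r option pins a revision for reproducibility:

# Fetch (if necessary) and run a pipeline directly from GitHub at a pinned revision
nextflow run nextflow-io/hello -r master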
Nextflow's DSL allows workflows to be deployed and run across different computing environments without modifying the pipeline code. Nextflow comes with specific executors for various platforms, ranging from local execution and HPC schedulers such as Slurm, SGE, LSF, and PBS to Kubernetes and cloud batch services such as AWS Batch, Azure Batch, and Google Cloud Batch. [16]
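As a sketch of this portability, the execution platform is typically selected in a configuration file rather than in the pipeline itself (the queue name and resource values below are illustrative, not from the original article):

// nextflow.config: select where tasks run without touching pipeline code
process {
    executor = 'slurm'     // alternatives include 'local', 'awsbatch', 'google-batch'
    queue    = 'standard'  // illustrative scheduler queue name
    cpus     = 4
}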
Nextflow integrates tightly with software containers. Workflows and individual processes can run inside containers across different computing environments, eliminating the need for complex installation and configuration routines. [3] [20]
Nextflow supports container frameworks such as Docker, Singularity, Charliecloud, Podman, and Shifter. Container images can be retrieved automatically from external registries when a pipeline is executed. Additionally, it was announced at the Nextflow Summit 2022 that future versions of Nextflow will support a dedicated container provisioning service for better integration of customized containers into workflows. [21] [22]
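A minimal, illustrative configuration sketch: enabling Docker and assigning a default container image (the image shown is an arbitrary example from a public registry) makes every task of the pipeline run inside that container:

// nextflow.config: run all processes inside a container
docker.enabled    = true
process.container = 'quay.io/biocontainers/samtools:1.17'  // illustrative image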
Nextflow was originally developed at the Centre for Genomic Regulation in Spain and released as an open-source project on GitHub in July 2013. [23] In October 2018, the project license for Nextflow was changed from GPLv3 to Apache 2.0. [24]
In July 2018, Seqera Labs was launched as a spin-off from the Centre for Genomic Regulation. [20] The company employs many of Nextflow's core developers and maintainers and provides commercial services and consulting with a focus on Nextflow. [25]
In July 2020, a major extension and revision of Nextflow's domain-specific language was introduced to allow for sub-workflows and additional improvements. [26] In the same year, monthly downloads of Nextflow reached approximately 55,000. [20]
Nextflow has been adopted by several sequencing facilities, including the Centre for Genomic Regulation, [27] the Quantitative Biology Center in Tübingen, the Francis Crick Institute, the A*STAR Genome Institute of Singapore, and the Swedish National Genomics Infrastructure, as their preferred scientific workflow system. [20] These facilities have collaborated to share, harmonize, and curate bioinformatic pipelines, [28] [29] [30] [31] leading to the creation of the nf-core project. [32] Led by Phil Ewels, then at the Swedish National Genomics Infrastructure, [33] [34] nf-core focuses on ensuring the reproducibility and portability of pipelines across different hardware, operating systems, and software versions. In July 2020, Nextflow and nf-core received a grant from the Chan Zuckerberg Initiative in recognition of their importance as open-source software. [35] As of 2024, the nf-core organization hosts 117 Nextflow pipelines for the biosciences and more than 1,382 process modules. With more than 1,200 developers and scientists involved, it is the largest collaborative effort and community for developing bioinformatic data analysis pipelines. [36]
By domain and research subject, Nextflow is predominantly used for processing sequencing data and conducting genomic data analysis. Over the past five years, numerous pipelines have been published for various applications and analyses in the genomics field.
One notable use case is its role in pathogen surveillance during the COVID-19 pandemic. [37] Swift and highly automated processing of raw data, variant analysis, and lineage designation were essential for monitoring the emergence of new virus variants and tracing their global spread. Nextflow-enabled pipelines played a crucial role in this effort. [38] [39] [40] [41] [42] [43] [44]
Nextflow also plays a significant role at the non-profit plasmid repository Addgene, which uses it to confirm the integrity of all deposited plasmids. [45]
In addition to genomics, Nextflow is gaining popularity in other domains of biomedical data processing that require complex workflows on large amounts of primary data. These domains include drug screening, [46] diffusion magnetic resonance imaging (dMRI) in radiology, [47] and mass spectrometry data processing, [48] [49] [50] the latter with a particular focus on proteomics. [51] [52] [53] [54] [55] [56] [57] [58]
Related topics: bioinformatics, genomics, sequence analysis, pan-genomics, and bioinformatics workflow management systems. Related software and platforms include Orange, the Generic Model Organism Database (GMOD) toolkit, GenePattern, Galaxy, UGENE, OpenMS, LabKey Server, Anduril, Integromics, BioCompute Objects, de.NBI, and Nvidia Parabricks.