This article has multiple issues. Please help improve it or discuss these issues on the talk page . (Learn how and when to remove these template messages)
|
The Common Workflow Language standards | |
Abbreviation | CWL |
---|---|
Status | Published |
Year started | 10 July 2014 |
Latest version | 1.2 7 August 2020 |
Related standards | BioCompute Object |
License | Apache 2.0 |
Website | commonwl |
The Common Workflow Language (CWL) is a standard for describing computational data-analysis workflows. [1] Development of CWL is focused particularly on serving the data-intensive sciences, such as bioinformatics, [2] medical imaging, astronomy, physics, and chemistry.
A key goal of the CWL is to allow the creation of a workflow that is portable and thus may be run reproducibly in different computational environments. [3]
The CWL originated from discussions in 2014 between Peter Amstutz, John Chilton, Nebojša Tijanić, and Michael R. Crusoe (at that time their respective affiliations were: Galaxy, Arvados, Seven Bridges, and Michigan State University) at the Open Bioinformatics Foundation BOSC 2014 codefest.
CWL is supported by multiple analysis runners and platforms [4] such as Apache Airflow (via CWL-Airflow [5] ), Arvados, Rabix, [6] Cromwell workflow engine, Toil, REANA - Reusable Analyses and CWLEXEC for IBM Spectrum LSF, and was identified in 2017 as one of the future trends for bioinformatics pipeline development. [2] Several additional analysis environments are currently implementing support for CWL including Pegasus [7] and Galaxy. [8]
The CWL Project [9] is a multi-stakeholder working group consisting of both organizations and individuals. A member project of Software Freedom Conservancy, it publishes the CWL standards freely available via its GitHub repository under a permissive Apache License 2.0.
Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.
Orange is an open-source data visualization, machine learning and data mining toolkit. It features a visual programming front-end for explorative qualitative data analysis and interactive data visualization.
CellProfiler is free, open-source software designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automatically. Advanced algorithms for image analysis are available as individual modules that can be placed in sequential order together to form a pipeline; the pipeline is then used to identify and measure biological objects and features in images, particularly those obtained through fluorescence microscopy.
The Chemistry Development Kit (CDK) is computer software, a library in the programming language Java, for chemoinformatics and bioinformatics. It is available for Windows, Linux, Unix, and macOS. It is free and open-source software distributed under the GNU Lesser General Public License (LGPL) 2.0.
The Open Biological and Biomedical Ontologies (OBO) Foundry is a group of people dedicated to build and maintain ontologies related to the life sciences. The OBO Foundry establishes a set of principles for ontology development for creating a suite of interoperable reference ontologies in the biomedical domain. Currently, there are more than a hundred ontologies that follow the OBO Foundry principles.
GenePattern is a freely available computational biology open-source software package originally created and developed at the Broad Institute for the analysis of genomic data. Designed to enable researchers to develop, capture, and reproduce genomic analysis methodologies, GenePattern was first released in 2004. GenePattern is currently developed at the University of California, San Diego.
Apache Taverna was an open source software tool for designing and executing workflows, initially created by the myGrid project under the name Taverna Workbench, then a project under the Apache incubator. Taverna allowed users to integrate many different software components, including WSDL SOAP or REST Web services, such as those provided by the National Center for Biotechnology Information, the European Bioinformatics Institute, the DNA Databank of Japan (DDBJ), SoapLab, BioMOBY and EMBOSS. The set of available services was not finite and users could import new service descriptions into the Taverna Workbench.
Galaxy is a scientific workflow, data integration, and data and analysis persistence and publishing platform that aims to make computational biology accessible to research scientists that do not have computer programming or systems administration experience. Although it was initially developed for genomics research, it is largely domain agnostic and is now used as a general bioinformatics workflow management system.
Fiji is an open source image processing package based on ImageJ2.
OpenMS is an open-source project for data analysis and processing in mass spectrometry and is released under the 3-clause BSD licence. It supports most common operating systems including Microsoft Windows, MacOS and Linux.
Avogadro is a molecule editor and visualizer designed for cross-platform use in computational chemistry, molecular modeling, bioinformatics, materials science, and related areas. It is extensible via a plugin architecture.
A scientific workflow system is a specialized form of a workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or workflow, in a scientific application.
A bioinformatics workflow management system is a specialized form of workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or a workflow, that relate to bioinformatics.
Figshare is an online open access repository where researchers can preserve and share their research outputs, including figures, datasets, images, and videos. It is free to upload content and free to access, in adherence to the principle of open data. Figshare is one of a number of portfolio businesses supported by Digital Science, a subsidiary of Springer Nature.
PrecisionFDA is a secure, collaborative, high-performance computing platform that has established a growing community of experts around the analysis of biological datasets in order to advance precision medicine, inform regulatory science, and enable improvements in health outcomes. This cloud-based platform is developed and served by the United States Food and Drug Administration (FDA). PrecisionFDA connects experts, citizen scientists, and scholars from around the world and provides them with a library of computational tools, workflow features, and reference data. The platform allows researchers to upload and compare data against reference genomes, and execute bioinformatic pipelines. The variant call file (VCF) comparator tool also enables users to compare their genetic test results to reference genomes. The platform's code is open source and available on GitHub. The platform also features a crowdsourcing model to sponsor community challenges in order to stimulate the development of innovative analytics that inform precision medicine and regulatory science. Community members from around the world come together to participate in scientific challenges, solving problems that demonstrate the effectiveness of their tools, testing the capabilities of the platform, sharing their results, and engaging the community in discussions. Globally, precisionFDA has more than 5,000 users.
spaCy is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython. The library is published under the MIT license and its main developers are Matthew Honnibal and Ines Montani, the founders of the software company Explosion.
The BioCompute Object (BCO) project is a community-driven initiative to build a framework for standardizing and sharing computations and analyses generated from High-throughput sequencing. The project has since been standardized as IEEE 2791-2020, and the project files are maintained in an open source repository. The July 22nd, 2020 edition of the Federal Register announced that the FDA now supports the use of BioCompute in regulatory submissions, and the inclusion of the standard in the Data Standards Catalog for the submission of HTS data in NDAs, ANDAs, BLAs, and INDs to CBER, CDER, and CFSAN.
Originally started as a collaborative contract between the George Washington University and the Food and Drug Administration, the project has grown to include over 20 universities, biotechnology companies, public-private partnerships and pharmaceutical companies including Seven Bridges and Harvard Medical School. The BCO aims to ease the exchange of HTS workflows between various organizations, such as the FDA, pharmaceutical companies, contract research organizations, bioinformatic platform providers, and academic researchers. Due to the sensitive nature of regulatory filings, few direct references to material can be published. However, the project is currently funded to train FDA Reviewers and administrators to read and interpret BCOs, and currently has 4 publications either submitted or nearly submitted.
Jennifer "Jenny" Bryan is a data scientist and an associate professor of statistics at the University of British Columbia where she developed the Master in Data Science Program. She is a statistician and software engineer at RStudio from Vancouver, Canada and is known for creating open source tools which connect R to Google Sheets and Google Drive.
Nextflow is a scientific workflow system predominantly used for bioinformatic data analyses. It imposes standards on how to programmatically author a sequence of dependent compute steps and enables their execution on various local and cloud resources. Nextflow was conceived at the Centre for Genomic Regulation in Barcelona, Spain, but has since found world-wide adoption in biomedical and genomics research facilities and laboratories.
{{cite journal}}
: CS1 maint: multiple names: authors list (link)