Apache Taverna

Developer(s): Apache Software Foundation (myGrid for 2.x)
Stable release: 3.1 / July 1, 2016
Written in: Java
Operating system: Linux, Mac OS X, Microsoft Windows
Type: Scientific workflow system
License: Apache License 2.0 (LGPL for 2.x)
Website: taverna.incubator.apache.org

Apache Taverna was an open-source software tool for designing and executing workflows. It was initially created by the myGrid project under the name Taverna Workbench and later became a project in the Apache Incubator. Taverna allowed users to integrate many different software components, including WSDL/SOAP or REST Web services such as those provided by the National Center for Biotechnology Information, the European Bioinformatics Institute, the DNA Databank of Japan (DDBJ), SoapLab, BioMOBY and EMBOSS. The set of available services was not fixed: users could import new service descriptions into the Taverna Workbench. [1] [2] [3] [4] [5] [6] [7] [8]

Taverna Workbench provided a desktop authoring environment and enactment engine for scientific workflows. The Taverna workflow engine was also available separately, as a Java API, command line tool or as a server.

Taverna was used in many domains, such as bioinformatics, [9] [10] cheminformatics, [11] medicine, astronomy, [12] social science, music, and digital preservation. [13]

Some of the services used in Taverna workflows could be discovered through the BioCatalogue, a public, centralised and curated registry of Life Science Web services. Taverna workflows could also be shared with other people through the myExperiment social web site for scientists. [14] BioCatalogue and myExperiment are two other products of the myGrid consortium.

Taverna was used in over 350 organizations around the world, both academic and commercial. By 2011, Taverna had been downloaded over 80,000 times across its different versions.

On February 20, 2020, Apache Incubator retired the project and removed the code from its website. [15]

Capabilities

Taverna workflows can invoke general SOAP/WSDL or REST Web services, and more specific SADI, BioMart, BioMoby and SoapLab Web services. They can also invoke R statistical services, local Java code and external tools on local and remote machines (via ssh), perform XPath queries and other text manipulation, import spreadsheets and include sub-workflows.
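As a rough illustration of the kind of XPath-style text extraction such a workflow step performs on a service response, the following minimal Python sketch pulls values out of an XML document. It is not Taverna code: the XML content, the element names and the `extract` helper are all hypothetical, and the standard-library `xml.etree.ElementTree` module supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML, standing in for what a Web service might return.
RESPONSE = """
<results>
  <entry id="P1"><name>alpha</name><length>120</length></entry>
  <entry id="P2"><name>beta</name><length>85</length></entry>
</results>
""".strip()


def extract(xml_text, path):
    """Return the text of every element matching a (limited) XPath expression."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall(path)]


# Select all entry names, in document order.
names = extract(RESPONSE, "./entry/name")

# Select the ids of entries whose length exceeds 100; the numeric filter is
# done in Python because ElementTree's XPath subset has no comparisons.
long_ids = [
    entry.get("id")
    for entry in ET.fromstring(RESPONSE).findall("./entry")
    if int(entry.findtext("length")) > 100
]

print(names)
print(long_ids)
```

In a workflow, the extracted list would typically be passed on as the input of the next service.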

Taverna Workbench includes the ability to monitor the running of a workflow and to examine the provenance of the data produced, exposing details of the workflow run as a W3C PROV-O RDF provenance graph, [16] within a structured Research Object bundle [17] ZIP file that includes inputs, outputs, intermediate values and the executed workflow definition; together this format is called TavernaProv. [18]
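The general idea of packaging a run as a ZIP file that holds inputs, outputs, intermediate values, the workflow definition and an RDF provenance graph can be sketched as follows. This is only an illustration under stated assumptions: the file paths, the `build_bundle` helper and the `ex:` resource names are hypothetical and do not reproduce the actual TavernaProv or Research Object bundle layout. The `prov:` terms used (`prov:Entity`, `prov:Activity`, `prov:used`, `prov:wasGeneratedBy`) are part of the W3C PROV-O vocabulary.

```python
import io
import zipfile

# A tiny provenance graph in Turtle. The prov: terms are real PROV-O
# vocabulary; the ex: names are hypothetical placeholders for one run.
PROV_TTL = """\
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/run1/> .

ex:input1  a prov:Entity .
ex:step1   a prov:Activity ; prov:used ex:input1 .
ex:output1 a prov:Entity ; prov:wasGeneratedBy ex:step1 .
"""


def build_bundle(inputs, intermediates, outputs, workflow_definition, prov_ttl):
    """Pack one workflow run into an in-memory ZIP (hypothetical layout)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        for folder, values in (
            ("inputs", inputs),
            ("intermediates", intermediates),
            ("outputs", outputs),
        ):
            for name, value in values.items():
                bundle.writestr(f"{folder}/{name}.txt", value)
        bundle.writestr("workflow.xml", workflow_definition)
        bundle.writestr("provenance.ttl", prov_ttl)
    return buffer.getvalue()


data = build_bundle(
    inputs={"input1": "ACGT"},
    intermediates={"step1_result": "TGCA"},
    outputs={"output1": "tgca"},
    workflow_definition="<workflow/>",  # placeholder, not a real definition
    prov_ttl=PROV_TTL,
)

# Re-open the bundle and inspect what it contains.
with zipfile.ZipFile(io.BytesIO(data)) as bundle:
    names = sorted(bundle.namelist())
    stored_prov = bundle.read("provenance.ttl").decode("utf-8")

print(names)
```

Keeping the data values and the provenance graph in one archive means a recipient can trace each output back to the step and inputs that produced it without access to the original run.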

Taverna includes the ability to search for services described in BioCatalogue to invoke from workflows. However, services do not need to be described within BioCatalogue to be included in workflows as they can be added from a WSDL Web Service description or entered as a REST URI pattern.
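To make the idea of a REST URI pattern concrete, the sketch below treats a pattern as a URL with named placeholders that are filled in with workflow inputs at run time. It is an illustration only: the curly-brace placeholder syntax, the `example.org` URL and the helper names `pattern_inputs` and `expand` are assumptions made for this example, not Taverna's implementation.

```python
import re
from urllib.parse import quote

# Assumed placeholder syntax for this sketch: {name}
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def pattern_inputs(pattern):
    """List the placeholder names; each would become a service input."""
    return PLACEHOLDER.findall(pattern)


def expand(pattern, values):
    """Fill every placeholder with its percent-encoded value."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder '{name}'")
        return quote(str(values[name]), safe="")

    return PLACEHOLDER.sub(substitute, pattern)


# Hypothetical service URL, used only for illustration.
pattern = "http://example.org/genes/{organism}/{gene_id}.xml"

inputs = pattern_inputs(pattern)
url = expand(pattern, {"organism": "homo sapiens", "gene_id": "BRCA2"})

print(inputs)
print(url)
```

Percent-encoding each value before substitution keeps spaces and reserved characters in the input from changing the structure of the resulting URL.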

Taverna also includes the capability to search for workflows on myExperiment. The Taverna Workbench can download, modify and run workflows discovered on myExperiment, and also upload created workflows in order to share them with others using the social aspects of myExperiment.

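Taverna workflows do not need to be executed within the Taverna Workbench. Because the workflow engine is also available separately, workflows can be run with the command line tool, through the Java API or on a server.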

Taverna allows pipelining and streaming of data. [19] This means that services downstream in the workflow can start as soon as the first data item is received, without waiting for the whole data list to become available from upstream services and iterations. Taverna services execute in parallel when possible, as Taverna workflows are primarily data-driven rather than control-driven. [20]
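The pipelining behaviour described above can be illustrated with Python generators, where a downstream step receives each item as soon as the upstream step produces it. This is a single-threaded sketch of the data-driven idea only; it does not show parallel execution, and the step names `fetch_sequences` and `align` and their data are hypothetical.

```python
# Records the order in which the two steps do their work.
events = []


def fetch_sequences(ids):
    """Upstream step: yields each result as soon as it is ready."""
    for seq_id in ids:
        events.append(f"fetch {seq_id}")
        yield f"SEQ-{seq_id}"


def align(sequences):
    """Downstream step: handles items one at a time, as they arrive."""
    for seq in sequences:
        events.append(f"align {seq}")
        yield seq.lower()


# Chain the steps; nothing runs until results are requested.
results = list(align(fetch_sequences(["a", "b"])))

print(events)
print(results)
```

The event log interleaves the two steps: the first item is aligned before the second is fetched, rather than all fetches completing first. A non-streaming design would instead build the full list upstream and only then hand it to the downstream step.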

Taverna Workbench 2.1 splash screen

Open source community

Taverna has been an open-source project since 2003, [21] with contributors from multiple academic and industry institutions. In October 2014 Taverna became an independent Apache Incubator project [15] and changed its name to Apache Taverna (incubating). The project went on to develop Apache Taverna 3.x, [22] whose license changed from LGPL 2.1 to Apache License 2.0.

References

  1. Belhajjame K, Wolstencroft K, Corcho O, Oinn T, Tanoh F, William A, Goble C (2008). "Metadata Management in the Taverna Workflow System". 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID). pp. 651–656. doi:10.1109/CCGRID.2008.17. ISBN 9780769531564. S2CID 9996653.
  2. Li P, Castrillo JI, Velarde G, Wassink I, Soiland-Reyes S, Owen S, et al. (August 2008). "Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data". BMC Bioinformatics. 9: 334. doi:10.1186/1471-2105-9-334. PMC 2528018. PMID 18687127.
  3. Oinn T, Addis M, Ferris J, Marvin D, Senger M, Greenwood M, et al. (November 2004). "Taverna: a tool for the composition and enactment of bioinformatics workflows". Bioinformatics. 20 (17): 3045–54. doi:10.1093/bioinformatics/bth361. PMID 15201187.
  4. Oinn T, Greenwood M, Addis M, Alpdemir MN, Ferris J, Glover K, et al. (2006). "Taverna: Lessons in creating a workflow environment for the life sciences" (PDF). Concurrency and Computation: Practice and Experience. 18 (10): 1067–1100. doi:10.1002/cpe.993. S2CID 10219281.
  5. Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T (July 2006). "Taverna: a tool for building and running workflows of services". Nucleic Acids Research. 34 (Web Server issue): W729-32. doi:10.1093/nar/gkl320. PMC 1538887. PMID 16845108.
  6. Kawas E, Senger M, Wilkinson MD (November 2006). "BioMoby extensions to the Taverna workflow management and enactment software". BMC Bioinformatics. 7: 523. doi:10.1186/1471-2105-7-523. PMC 1693925. PMID 17137515.
  7. Sroka J, Kaczor G, Tyszkiewicz J, Kierzek AM (May 2006). "XQTav: an XQuery processor for Taverna environment". Bioinformatics. 22 (10): 1280–1. doi:10.1093/bioinformatics/btl101. PMID 16551662.
  8. Wolstencroft K, Haines R, Fellows D, Williams A, Withers D, Owen S, et al. (July 2013). "The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud". Nucleic Acids Research. 41 (Web Server issue): W557-61. doi:10.1093/nar/gkt328. PMC 3692062. PMID 23640334.
  9. Stevens RD, Robinson AJ, Goble CA (2003). "myGrid: personalised bioinformatics on the information grid". Bioinformatics. 19 (Suppl 1): i302-4. doi:10.1093/bioinformatics/btg1041. PMID 12855473.
  10. Stevens RD, Tipney HJ, Wroe CJ, Oinn TM, Senger M, Lord PW, et al. (August 2004). "Exploring Williams-Beuren syndrome using myGrid". Bioinformatics. 20 (Suppl 1): i303-10. doi:10.1093/bioinformatics/bth944. PMID 15262813.
  11. Truszkowski A, Jayaseelan KV, Neumann S, Willighagen EL, Zielesny A, Steinbeck C (December 2011). "New developments on the cheminformatics open workflow environment CDK-Taverna". Journal of Cheminformatics. 3: 54. doi:10.1186/1758-2946-3-54. PMC 3292505. PMID 22166170.
  12. Hook RN, Romaniello M, Ullgrén M, Järveläinen P, Maisala S, Oittinen T, et al. (2008). "ESO Reflex: A Graphical Workflow Engine for Running Recipes". The 2007 ESO Instrument Calibration Workshop. ESO Astrophysics Symposia European Southern Observatory. pp. 169–175. doi:10.1007/978-3-540-76963-7_23. ISBN 978-3-540-76962-0.
  13. Raditsch M, Schlarb S, Møldrup-Dalum P, Medjkoune L (2012). "Web content executable workflows for experimental executio" (PDF).
  14. Goble CA, Bhagat J, Aleksejevs S, Cruickshank D, Michaelides D, Newman D, et al. (July 2010). "myExperiment: a repository and social network for the sharing of bioinformatics workflows". Nucleic Acids Research. 38 (Web Server issue): W677-82. doi:10.1093/nar/gkq429. PMC 2896080. PMID 20501605.
  15. "Taverna Project Incubation Status". Apache Incubator. Apache Software Foundation. Retrieved 28 January 2015.
  16. Belhajjame K, Zhao J, Garijo D, Garrido A, Soiland-Reyes S, Alper P, Corcho O (2013). "A workflow PROV-corpus based on Taverna and Wings". Proceedings of the Joint EDBT/ICDT 2013 Workshops on - EDBT '13. p. 331. doi:10.1145/2457317.2457376. ISBN 9781450315999.
  17. Soiland-Reyes S, Gamble M, Haines R (5 November 2014). "Research Object Bundle 1.0" (Specification). researchobject.org. doi:10.5281/zenodo.12586. Retrieved 28 January 2015.
  18. Soiland-Reyes S, Alper P, Goble C (11 May 2016). "Tracking workflow execution with TavernaProv". zenodo.org. doi:10.5281/zenodo.51314.
  19. "Implicit iteration". Taverna 2.5 User Manual. myGrid. 9 September 2014. Retrieved 28 January 2015.
  20. Soiland-Reyes S (13 December 2010). "Parallel service invocations". The Taverna Knowledge Blog. knowledgeblog.org. Retrieved 28 January 2015.
  21. Soiland-Reyes S, Sufi S, Seaborne S (23 September 2014). "Taverna Proposal". Incubator Wiki. Apache Software Foundation. Retrieved 28 January 2015.
  22. "Download Apache Taverna". Apache Software Foundation. Retrieved 28 January 2015.