Taverna Workbench | |
---|---|
Developer(s) | Apache Software Foundation (myGrid for 2.x) |
Stable release | 3.1 / July 1, 2016 |
Repository | |
Written in | Java |
Operating system | Linux, Mac OS X, Microsoft Windows |
Type | Scientific workflow system |
License | Apache License 2.0 (LGPL for 2.x) |
Website | taverna |
Apache Taverna was an open source software tool for designing and executing workflows, initially created by the myGrid project under the name Taverna Workbench, then a project under the Apache incubator. Taverna allowed users to integrate many different software components, including WSDL SOAP or REST Web services, such as those provided by the National Center for Biotechnology Information, the European Bioinformatics Institute, the DNA Databank of Japan (DDBJ), SoapLab, BioMOBY and EMBOSS. The set of available services was not finite and users could import new service descriptions into the Taverna Workbench. [1] [2] [3] [4] [5] [6] [7] [8]
Taverna Workbench provided a desktop authoring environment and enactment engine for scientific workflows. The Taverna workflow engine was also available separately, as a Java API, command line tool or as a server.
Taverna was used in many domains, such as bioinformatics, [9] [10] cheminformatics, [11] medicine, astronomy, [12] social science, music, and digital preservation. [13]
Some of the services for use in Taverna workflows could be discovered through the BioCatalogue, a public, centralised and curated registry of Life Science Web services. Taverna workflows could also be shared with other people through the myExperiment social web site for scientists. [14] BioCatalogue and myExperiment are two other products from the myGrid consortium.
Taverna was used in over 350 organizations around the world, both academic and commercial. As of 2011, there had been over 80,000 downloads of Taverna across different versions.
On February 20, 2020, Apache Incubator retired the project and removed the code from its website. [15]
Taverna workflows can invoke general SOAP/WSDL or REST Web services, and more specific SADI, BioMart, BioMoby and SoapLab Web services. They can also invoke R statistical services, local Java code and external tools on local and remote machines (via ssh), perform XPath and other text manipulation, import spreadsheets and include sub-workflows.
Taverna Workbench includes the ability to monitor the running of a workflow and to examine the provenance of the data produced, exposing details of the workflow run as a W3C PROV-O RDF provenance graph, [16] within a structured Research Object bundle [17] ZIP file that includes inputs, outputs, intermediate values and the executed workflow definition; together this format is called TavernaProv. [18]
Taverna includes the ability to search for services described in BioCatalogue to invoke from workflows. However, services do not need to be described within BioCatalogue to be included in workflows as they can be added from a WSDL Web Service description or entered as a REST URI pattern.
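The idea of a REST URI pattern can be illustrated with a minimal Python sketch. It is not Taverna code: the `{name}` placeholder syntax, the function `expand_uri_pattern` and the example URL are all assumptions made for illustration. The sketch substitutes percent-encoded input values into the placeholders of a pattern to produce a concrete request URI.

```python
import re
from urllib.parse import quote


def expand_uri_pattern(pattern: str, values: dict) -> str:
    """Fill {name} placeholders in a URI pattern with percent-encoded values.

    Hypothetical helper for illustration only; not part of Taverna.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder '{name}'")
        # Encode every reserved character so the value cannot alter the URI structure.
        return quote(str(values[name]), safe="")

    return re.sub(r"\{([^{}]+)\}", substitute, pattern)


# Example with a made-up service address.
uri = expand_uri_pattern(
    "http://example.org/genes/{id}/sequence?format={fmt}",
    {"id": "BRCA 1", "fmt": "fasta"},
)
```

In this sketch each placeholder behaves like an input of the service, so a workflow would supply one value per placeholder before the request is sent.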
Taverna also includes the capability to search for workflows on myExperiment. The Taverna Workbench can download, modify and run workflows discovered on myExperiment, and also upload created workflows in order to share them with others using the social aspects of myExperiment.
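Taverna workflows do not need to be executed within the Taverna Workbench. The workflow engine is also available separately, so workflows can be run through its Java API, its command line tool or a server.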
Taverna allows pipelining and streaming of data. [19] This means that services downstream in the workflow can start as soon as the first data item is received, without waiting for the whole data list to become available from upstream services and iterations. Taverna services execute in parallel when possible, as Taverna workflows are primarily data-driven rather than control-driven. [20]
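The streaming behaviour described above can be sketched with Python generators. This is an illustration of the general technique under stated assumptions, not Taverna's engine or API: the names `upstream`, `downstream` and `run_pipeline` are hypothetical, and the sketch runs in a single thread, so it shows pipelining (item-by-item hand-over) but not parallel execution.

```python
from typing import Iterable, Iterator, List, Tuple


def upstream(items: Iterable[str], log: List[str]) -> Iterator[str]:
    """Hypothetical upstream service that emits one result at a time."""
    for item in items:
        log.append(f"upstream produced {item}")
        yield item


def downstream(stream: Iterable[str], log: List[str]) -> Iterator[str]:
    """Hypothetical downstream service that handles each item as it arrives."""
    for item in stream:
        log.append(f"downstream consumed {item}")
        yield item.upper()


def run_pipeline(items: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Connect the two services and return the results plus the event log."""
    log: List[str] = []
    results = list(downstream(upstream(items, log), log))
    return results, log


results, log = run_pipeline(["a", "b"])
```

The event log interleaves production and consumption: the downstream step handles the first item before the upstream step has produced the second, instead of waiting for the complete list.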
Taverna has been an open-source project since 2003, [21] with contributors from multiple academic and industry institutions. In October 2014 Taverna became an independent Apache incubator project, [15] and changed its name to Apache Taverna (incubating). The project developed Apache Taverna 3.x, [22] whose license changed from LGPL 2.1 to Apache License 2.0.
Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.
The cancer Biomedical Informatics Grid (caBIG) was a US government program to develop an open-source, open access information network called caGrid for secure data exchange on cancer research. The initiative was developed by the National Cancer Institute and was maintained by the Center for Biomedical Informatics and Information Technology (CBIIT) and program managed by Booz Allen Hamilton. In 2011 a report on caBIG raised significant questions about effectiveness and oversight, and its budget and scope were significantly trimmed. In May 2012, the National Cancer Informatics Program (NCIP) was created as caBIG's successor program.
The Chemistry Development Kit (CDK) is computer software, a library in the programming language Java, for chemoinformatics and bioinformatics. It is available for Windows, Linux, Unix, and macOS. It is free and open-source software distributed under the GNU Lesser General Public License (LGPL) 2.0.
BioMOBY is a registry of web services used in bioinformatics. It allows interoperability between biological data hosts and analytical services by annotating services with terms taken from standard ontologies. BioMOBY is released under the Artistic License.
Carole Anne Goble is a British academic who is Professor of Computer Science at the University of Manchester. She is principal investigator (PI) of the myGrid, BioCatalogue and myExperiment projects and co-leads the Information Management Group (IMG) with Norman Paton.
The myGrid consortium produces and uses a suite of tools designed to “help e-Scientists get on with science and get on with scientists”. The tools support the creation of e-laboratories and have been used in domains as diverse as systems biology, social science, music, astronomy, multimedia and chemistry.
Robert David Stevens is a professor of bio-health informatics and former Head of the Department of Computer Science at The University of Manchester.
Galaxy is a scientific workflow, data integration, and data and analysis persistence and publishing platform that aims to make computational biology accessible to research scientists who do not have computer programming or systems administration experience. Although it was initially developed for genomics research, it is largely domain agnostic and is now used as a general bioinformatics workflow management system.
myExperiment is a social web site for researchers sharing research objects such as scientific workflows.
taveRNA is a software suite for RNA/DNA secondary structure. It is developed in the laboratories for computational biology of the School of Computing Science at Simon Fraser University. The suite is composed of alteRNA, for RNA density fold computing; inteRNA, for RNA-RNA interaction prediction; piRNA, for predicting the joint partition function, equilibrium concentration, ensemble energy, and melting temperature for two RNA sequences; pRuNA, a sequence-based pruning RNA interaction search engine; and smyRNA, a novel ab initio ncRNA finder written as a platform-independent C program.
The BioCatalogue is a curated catalogue of Life Science Web Services. The BioCatalogue was launched in June 2009 at the Intelligent Systems for Molecular Biology Conference. The project is a collaboration between the myGrid project at the University of Manchester led by Carole Goble and the European Bioinformatics Institute led by Rodrigo Lopez. It is funded by the Biotechnology and Biological Sciences Research Council.
Discovery Net is one of the earliest examples of a scientific workflow system allowing users to coordinate the execution of remote services based on Web service and Grid Services standards. The system was designed and implemented at Imperial College London as part of the Discovery Net pilot project funded by the UK e-Science Programme. Many of the concepts pioneered by Discovery Net were later incorporated into a variety of other scientific workflow systems.
A scientific workflow system is a specialized form of a workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or workflow, in a scientific application.
ChEMBL or ChEMBLdb is a manually curated chemical database of bioactive molecules with drug-like properties. It is maintained by the European Bioinformatics Institute (EBI), of the European Molecular Biology Laboratory (EMBL), based at the Wellcome Trust Genome Campus, Hinxton, UK.
BioMart is a community-driven project to provide a single point of access to distributed research data. The BioMart project contributes open source software and data services to the international scientific community. Although the BioMart software is primarily used by the biomedical research community, it is designed in such a way that any type of data can be incorporated into the BioMart framework. The BioMart project originated at the European Bioinformatics Institute as a data management solution for the Human Genome Project. Since then, BioMart has grown to become a multi-institute collaboration involving various database projects on five continents.
A bioinformatics workflow management system is a specialized form of workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or a workflow, that relate to bioinformatics.
OnlineHPC was a free public web service that supplied tools for working with high performance computers, together with an online workflow editor. OnlineHPC allowed users to design and execute workflows using the online workflow designer and to work with high performance computers, both clusters and clouds. Access to high performance resources was available both directly from the service user interface and from workflow components. The workflow engine of the OnlineHPC service was Taverna, which was traditionally used for scientific workflow execution in domains such as bioinformatics, cheminformatics, medicine, astronomy, social science, music, and digital preservation.
The BioCompute Object (BCO) project is a community-driven initiative to build a framework for standardizing and sharing computations and analyses generated from high-throughput sequencing (HTS). The project has since been standardized as IEEE 2791-2020, and the project files are maintained in an open source repository. The July 22, 2020 edition of the Federal Register announced that the FDA now supports the use of BioCompute in regulatory submissions, and the inclusion of the standard in the Data Standards Catalog for the submission of HTS data in NDAs, ANDAs, BLAs, and INDs to CBER, CDER, and CFSAN.
Originally started as a collaborative contract between the George Washington University and the Food and Drug Administration, the project has grown to include over 20 universities, biotechnology companies, public-private partnerships and pharmaceutical companies including Seven Bridges and Harvard Medical School. The BCO aims to ease the exchange of HTS workflows between various organizations, such as the FDA, pharmaceutical companies, contract research organizations, bioinformatic platform providers, and academic researchers. Due to the sensitive nature of regulatory filings, few direct references to material can be published. However, the project is currently funded to train FDA Reviewers and administrators to read and interpret BCOs, and currently has 4 publications either submitted or nearly submitted.
The Common Workflow Language (CWL) is a standard for describing computational data-analysis workflows. Development of CWL is focused particularly on serving the data-intensive sciences, such as bioinformatics, medical imaging, astronomy, physics, and chemistry.