VisTrails

Developer(s): University of Utah, NYU-Poly
Final release: 2.2.4 / May 3, 2016
Repository: https://github.com/VisTrails/VisTrails
Written in: Python
Operating system: Cross-platform
Type: Scientific workflow management; Scientific visualization
License: BSD License 3-clause [1]
Website: www.vistrails.org

VisTrails is a scientific workflow management system developed at the Scientific Computing and Imaging Institute at the University of Utah that supports data exploration and visualization. It is written in Python and uses Qt through the PyQt bindings. The system is open source, released under the BSD 3-clause license. [1] The pre-compiled versions for Windows, Mac OS X, and Linux come with an installer and several packages, including VTK, matplotlib, and ImageMagick. VisTrails also supports user-defined packages.

Overview

VisTrails is a system that provides provenance management support for exploratory computational tasks. It combines features of workflow systems and visualization systems. Like workflow systems, it allows loosely coupled resources, specialized libraries, and grid and Web services to be combined. Like some visualization systems, it provides a mechanism for parameter exploration and for comparing different results. Unlike these other systems, however, VisTrails was designed to manage exploratory processes in which computational tasks evolve over time as a user iteratively formulates and tests hypotheses. A key distinguishing feature of VisTrails is its comprehensive provenance infrastructure, which maintains detailed history information about the steps followed in the course of an exploratory task. VisTrails uses this information to provide operations and user interfaces that streamline this process.
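As a rough illustration of keeping history for a task that evolves over time, the sketch below records each change to a workflow as a node in a version tree, so that any earlier version can be rebuilt by replaying changes from the root. It is not VisTrails code and does not use the VisTrails API. The names `Version`, `VersionTree`, `add_version`, and `materialize` are hypothetical, and a workflow is simplified here to a dictionary of parameters.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Version:
    """One node in the history: a single change applied to a parent version."""
    id: int
    parent: Optional[int]
    change: Tuple[str, object]  # (parameter name, new value); hypothetical format
    note: str = ""


@dataclass
class VersionTree:
    """Hypothetical change-based history; an illustration, not the VisTrails API."""
    versions: Dict[int, Version] = field(default_factory=dict)

    def add_version(self, parent: Optional[int], change: Tuple[str, object],
                    note: str = "") -> int:
        """Record a change on top of `parent` and return the new version id."""
        if parent is not None and parent not in self.versions:
            raise KeyError(f"unknown parent version {parent}")
        new_id = len(self.versions) + 1
        self.versions[new_id] = Version(new_id, parent, change, note)
        return new_id

    def path_to_root(self, version_id: int) -> List[int]:
        """Return the version ids from the root down to `version_id`."""
        path: List[int] = []
        current: Optional[int] = version_id
        while current is not None:
            path.append(current)
            current = self.versions[current].parent
        return list(reversed(path))

    def materialize(self, version_id: int) -> Dict[str, object]:
        """Rebuild a workflow by replaying every change along the path."""
        workflow: Dict[str, object] = {}
        for vid in self.path_to_root(version_id):
            name, value = self.versions[vid].change
            workflow[name] = value
        return workflow


tree = VersionTree()
v1 = tree.add_version(None, ("dataset", "head.vtk"), "load data")
v2 = tree.add_version(v1, ("isovalue", 50), "first guess")
v3 = tree.add_version(v1, ("isovalue", 80), "alternative branch")
v4 = tree.add_version(v3, ("colormap", "viridis"), "refine the alternative")
```

Because nothing is overwritten, the abandoned branch `v2` stays available next to `v4`, which is the property that makes it possible to return to and compare earlier steps of an exploration.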

VisTrails was developed for exploratory visualization, [2] but the system is general and supports other exploratory computational tasks.

History

VisTrails is the result of a collaborative effort between computer scientists Cláudio Silva and Juliana Freire. Graduate students at the University of Utah began initial development in 2004. Although the first prototypes were implemented in C++, the current version of VisTrails is written in Python. The first public release was in September 2007.

Functionality

A common use for VisTrails is scientific visualization. Visualizations generated as part of a workflow are rendered in a spreadsheet-style interface, allowing multiple visualizations from different versions of a workflow to be viewed and compared simultaneously. The VisTrails spreadsheet currently supports VTK and HTML rendering.
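The side-by-side comparison described above can be pictured as filling a grid of cells, one per parameter combination. The sketch below is a plain-Python illustration and not the VisTrails spreadsheet API. `run_workflow`, `explore`, and the parameter names are hypothetical stand-ins, and the workflow returns a string rather than rendering an image.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple


def run_workflow(isovalue: int, opacity: float) -> str:
    """Hypothetical stand-in for executing a visualization workflow."""
    return f"iso={isovalue}, opacity={opacity}"


def explore(workflow: Callable[[int, float], str],
            rows: Sequence[int],
            cols: Sequence[float]) -> Dict[Tuple[int, int], str]:
    """Run the workflow for every (row, column) pair, keyed by cell position."""
    grid: Dict[Tuple[int, int], str] = {}
    for (r, row_value), (c, col_value) in product(enumerate(rows), enumerate(cols)):
        grid[(r, c)] = workflow(row_value, col_value)
    return grid


# Two isovalues down the rows, two opacities across the columns.
cells = explore(run_workflow, rows=[50, 80], cols=[0.3, 1.0])
for (r, c), result in sorted(cells.items()):
    print(f"cell ({r}, {c}): {result}")
```

Keying results by cell position keeps each output tied to the parameter values that produced it, so neighbouring cells can be compared directly.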

VisTrails supports four basic modes, or views. Each view interacts with the underlying workflow in a different way.

Commercial variants

In 2007, the University of Utah formed VisTrails, Inc., a spinoff company intended to commercialize VisTrails technology. Development of the free version of VisTrails is currently funded by the University of Utah and VisTrails, Inc. The company's first product is a plugin for the 3D modeling software Maya. [8] While the main VisTrails distribution is free software, the VisTrails plugin for Maya is distributed under a closed-source, proprietary license.

Version release dates history

See also

References

  1. "LICENSE file in code repository". github.com.
  2. Cláudio T. Silva, Juliana Freire, and Steven Callahan. "Provenance for Visualizations: Reproducibility and Beyond" (PDF). Computing in Science & Engineering, 9(5), pp. 82-90, 2007.
  3. Juliana Freire, David Koop, Emanuele Santos, and Cláudio T. Silva. "Provenance for Computational Tasks: A Survey" (PDF). Computing in Science & Engineering, 10(3), pp. 11-21, 2008.
  4. Carlos E. Scheidegger, David Koop, Emanuele Santos, Huy T. Vo, Steven P. Callahan, Juliana Freire, and Cláudio T. Silva. "Tackling the Provenance Challenge one layer at a time" (PDF). Concurrency and Computation: Practice and Experience, 20(5), pp. 473-483, 2008.
  5. Carlos E. Scheidegger, Huy T. Vo, David Koop, Juliana Freire, and Cláudio T. Silva. "Querying and Creating Visualizations by Analogy" (PDF). IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1560-1567, 2007.
  6. Tommy Ellkvist, David Koop, Erik Anderson, Juliana Freire, and Cláudio T. Silva. "Using Provenance to Support Real-Time Collaborative Design of Workflows" (PDF). Proceedings of International Provenance and Annotation Workshop (IPAW), 2008.
  7. Louis Bavoil, Steven P. Callahan, Patricia J. Crossno, Juliana Freire, Carlos E. Scheidegger, Cláudio T. Silva, and Huy T. Vo. "VisTrails: Enabling Interactive Multiple-View Visualizations" (PDF). Proceedings of IEEE Visualization, pp. 135-142, 2005.
  8. "Announcement on VisTrails, Inc. website". www.vistrails.com.