Parallel Virtual Machine

Last updated
Parallel Virtual Machine
Original author(s) Oak Ridge National Laboratory
Developer(s) University of Tennessee
Initial release1989
Stable release
3.4.6 / February 2, 2009;14 years ago (2009-02-02) [1]
Written in C
Operating system Windows and Unix
License BSD, GPL
Website https://www.epm.ornl.gov/pvm/pvm_home.html

Parallel Virtual Machine (PVM) is a software tool for parallel networking of computers. It is designed to allow a network of heterogeneous Unix and/or Windows machines to be used as a single distributed parallel processor. Thus large computational problems can be solved more cost effectively by using the aggregate power and memory of many computers. The software is very portable; the source code, available free through netlib, has been compiled on everything from laptops to Crays. [2]

Contents

PVM enables users to exploit their existing computer hardware to solve much larger problems at less additional cost. PVM has been used as an educational tool to teach parallel programming but has also been used to solve important practical problems. [2] It was developed by the University of Tennessee, Oak Ridge National Laboratory and Emory University. The first version was written at ORNL in 1989, and after being rewritten by University of Tennessee, version 2 was released in March 1991. Version 3 was released in March 1993, and supported fault tolerance and better portability.

PVM was a step towards modern trends in distributed processing and grid computing but has, since the mid-1990s, largely been supplanted by the much more successful MPI standard for message passing on parallel machines. PVM is free software, released under both the BSD License and the GNU General Public License.

Design

PVM is a software system that enables a collection of heterogeneous computers to be used as a coherent and flexible concurrent computational resource, or a "parallel virtual machine".

The individual computers may be shared-memory or local-memory multiprocessors, vector supercomputers, specialized graphics engines, or scalar workstations and PCs, that may be interconnected by a variety of networks, such as Ethernet or FDDI.

PVM consists of a run-time environment and library for message passing, task and resource management, and fault notification. While PVM will not automatically make a commercial software package run faster, it does provide a powerful set of functions for manually parallelizing an existing source program, or for writing new parallel/distributed programs.

The PVM software must be specifically installed on every machine that is to be used in a given "virtual machine". There is no "automatic" installation of executables onto remote machines in PVM, although simply copying the pvm3/lib and pvm3/bin directories to another similar machine (and setting $PVM_ROOT and $PVM_ARCH) is sufficient for running PVM programs. Compiling or building PVM programs requires the full PVM installation.

User programs written in C, C++, or Fortran can access PVM through provided library routines.

PVM also supports broadcasting (PVM_bcast) which sends to all processes in a group and multicasting (PVM_mcast) which sends to a specific list of processes.

See also

Related Research Articles

A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another. Distributed computing is a field of computer science that studies distributed systems.

<span class="mw-page-title-main">Beowulf cluster</span> Type of computing cluster

A Beowulf cluster is a computer cluster of what are normally identical, commodity-grade computers networked into a small local area network with libraries and programs installed which allow processing to be shared among them. The result is a high-performance parallel computing cluster from inexpensive personal computer hardware.

<span class="mw-page-title-main">Parallel computing</span> Programming paradigm in which many processes are executed simultaneously

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.

Message Passing Interface (MPI) is a standardized and portable message-passing standard designed to function on parallel computing architectures. The MPI standard defines the syntax and semantics of library routines that are useful to a wide range of users writing portable message-passing programs in C, C++, and Fortran. There are several open-source MPI implementations, which fostered the development of a parallel software industry, and encouraged development of portable and scalable large-scale parallel applications.

<span class="mw-page-title-main">Theoretical computer science</span> Subfield of computer science and mathematics

Theoretical computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, formal language theory, the lambda calculus and type theory.

Checkpointing is a technique that provides fault tolerance for computing systems. It basically consists of saving a snapshot of the application's state, so that applications can restart from that point in case of failure. This is particularly important for long running applications that are executed in failure-prone computing systems.

<span class="mw-page-title-main">Jack Dongarra</span> American computer scientist (born 1950)

Jack Joseph Dongarra is an American computer scientist and mathematician. He is the American University Distinguished Professor of Computer Science in the Electrical Engineering and Computer Science Department at the University of Tennessee. He holds the position of a Distinguished Research Staff member in the Computer Science and Mathematics Division at Oak Ridge National Laboratory, Turing Fellowship in the School of Mathematics at the University of Manchester, and is an adjunct professor and teacher in the Computer Science Department at Rice University. He served as a faculty fellow at the Texas A&M University Institute for Advanced Study (2014–2018). Dongarra is the founding director of the Innovative Computing Laboratory at the University of Tennessee. He was the recipient of the Turing Award in 2021.

In computing, single program, multiple data (SPMD) is a term that has been used to refer to computational models for exploiting parallelism where-by multiple processors cooperate in the execution of a program in order to obtain results faster.

In computer science, message passing is a technique for invoking behavior on a computer. The invoking program sends a message to a process and relies on that process and its supporting infrastructure to then select and run some appropriate code. Message passing differs from conventional programming where a process, subroutine, or function is directly invoked by name. Message passing is key to some models of concurrency and object-oriented programming.

The actor model in computer science is a mathematical model of concurrent computation that treats an actor as the basic building block of concurrent computation. In response to a message it receives, an actor can: make local decisions, create more actors, send more messages, and determine how to respond to the next message received. Actors may modify their own private state, but can only affect each other indirectly through messaging.

In computing, a parallel programming model is an abstraction of parallel computer architecture, with which it is convenient to express algorithms and their composition in programs. The value of a programming model can be judged on its generality: how well a range of different problems can be expressed for a variety of different architectures, and its performance: how efficiently the compiled programs can execute. The implementation of a parallel programming model can take the form of a library invoked from a sequential language, as an extension to an existing language, or as an entirely new language.

Concurrent computing is a form of computing in which several computations are executed concurrently—during overlapping time periods—instead of sequentially—with one completing before the next starts.

The Parallel Multithreaded Machine (PM2) is a software for parallel networking of computers.

The ScaLAPACK library includes a subset of LAPACK routines redesigned for distributed memory MIMD parallel computers. It is currently written in a Single-Program-Multiple-Data style using explicit message passing for interprocessor communication. It assumes matrices are laid out in a two-dimensional block cyclic decomposition.

The bulk synchronous parallel (BSP) abstract computer is a bridging model for designing parallel algorithms. It is similar to the parallel random access machine (PRAM) model, but unlike PRAM, BSP does not take communication and synchronization for granted. In fact, quantifying the requisite synchronization and communication is an important part of analyzing a BSP algorithm.

<span class="mw-page-title-main">Computer cluster</span> Set of computers configured in a distributed computing system

A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software.

ISP is a tool for the formal verification of MPI programs developed within the School of Computing at the University of Utah. Like model checkers, such as SPIN, ISP verifies the complete state space of a system for a set of safety properties. However, unlike model checkers, ISP performs code level verification. This means that the tool verifies all relevant interleavings of a concurrent program by replaying the actual program code without building verification models. This idea was pioneered in a number of tools, notably by Godefroid, in his VeriSoft tool. Other recent tools of this genre include the Java Pathfinder, Microsoft's CHESS tool, and MODIST. Relevant interleavings are computed using a customized dynamic partial order reduction algorithm called POE.

<span class="mw-page-title-main">Róbert Lovas</span> Hungarian computer scientist

Róbert Lovas is a Hungarian computer scientist at SZTAKI, Budapest, Hungary.

<span class="mw-page-title-main">Message passing in computer clusters</span> Aspect of computer clusters

Message passing is an inherent element of all computer clusters. All computer clusters, ranging from homemade Beowulfs to some of the fastest supercomputers in the world, rely on message passing to coordinate the activities of the many nodes they encompass. Message passing in computer clusters built with commodity servers and switches is used by virtually every internet service.

References

  1. Release Notes
  2. 1 2 "Parallel Virtual Machine (PVM) Homepage".