Python SCOOP (software)

Last updated
Python SCOOP
Original author(s) Marc Parizeau and Yannick Hold
Developer(s) Yannick Hold and Olivier Gagnon
Stable release
0.7.1 / March 17, 2014;10 years ago (2014-03-17)
Repository
Written in Python
Operating system POSIX-compliant
Platform Cross-platform
Type Distributed computing framework
License LGPL
Website www.pyscoop.org

SCOOP (Scalable Concurrent Operations in Python) is a Python software module for distributing concurrent tasks on various environments, from heterogeneous grids of workstations to supercomputers.

It uses ØMQ and the Greenlet package as building blocks to encapsulate and distribute tasks (named a Future) between processes and/or systems. Its interface is inspired from the PEP-3148 proposal.

SCOOP is targeted towards scientific applications that require execution of many loosely coupled tasks using all available hardware resources. These resources need to be accessible through SSH.

History

SCOOP was initiated by Yannick Hold and Marc Parizeau at the Computer Vision and Systems Laboratory of Université Laval. It is an iterative step over the now deprecated DTM module of the DEAP framework for developing evolutionary algorithm. While DTM used MPI for its communications, SCOOP uses instead ØMQ.

Network topology

SCOOP uses the Broker Architecture [1] to distribute its Futures. It is based on a central element, called the Broker, that dispatches work to its workers. The main difference between this pattern and a Master/slave topology reside in the Future origin. In the Broker architecture, the Futures emanate from a worker, which is located on the periphery of the topology, instead of the master in the Master/slave architecture. This allows higher reliability in regard to worker faults and generally better performances due to the generic function of the Broker. Since he doesn't need to serialize nor deserialize any Future to route them around the grid, his workload consists of networking or interprocess I/O and almost no CPU processing time. This lowers the bottleneck of the Broker topology.

The Broker architecture won't stress the networking fabric and elements as much as a totally distributed topology since there is only one connection required on every worker.

Example

An introductory parallel "Hello, world!" example is implemented this way:

fromscoopimportfuturesdefhello_world(value)->str:return"Hello World from Future #{}".format(value)if__name__=="__main__":return_values=futures.map(hello_world,range(16))print("\n".join(return_values))

Related Research Articles

<span class="mw-page-title-main">Erlang (programming language)</span> Programming language

Erlang is a general-purpose, concurrent, functional high-level programming language, and a garbage-collected runtime system. The term Erlang is used interchangeably with Erlang/OTP, or Open Telecom Platform (OTP), which consists of the Erlang runtime system, several ready-to-use components (OTP) mainly written in Erlang, and a set of design principles for Erlang programs.

Message Passing Interface (MPI) is a standardized and portable message-passing standard designed to function on parallel computing architectures. The MPI standard defines the syntax and semantics of library routines that are useful to a wide range of users writing portable message-passing programs in C, C++, and Fortran. There are several open-source MPI implementations, which fostered the development of a parallel software industry, and encouraged development of portable and scalable large-scale parallel applications.

<span class="mw-page-title-main">Modula-3</span>

Modula-3 is a programming language conceived as a successor to an upgraded version of Modula-2 known as Modula-2+. While it has been influential in research circles it has not been adopted widely in industry. It was designed by Luca Cardelli, James Donahue, Lucille Glassman, Mick Jordan, Bill Kalsow and Greg Nelson at the Digital Equipment Corporation (DEC) Systems Research Center (SRC) and the Olivetti Research Center (ORC) in the late 1980s.

Message-oriented middleware (MOM) is software or hardware infrastructure supporting sending and receiving messages between distributed systems. MOM allows application modules to be distributed over heterogeneous platforms and reduces the complexity of developing applications that span multiple operating systems and network protocols. The middleware creates a distributed communications layer that insulates the application developer from the details of the various operating systems and network interfaces. APIs that extend across diverse platforms and networks are typically provided by MOM.

HTCondor is an open-source high-throughput computing software framework for coarse-grained distributed parallelization of computationally intensive tasks. It can be used to manage workload on a dedicated cluster of computers, or to farm out work to idle desktop computers – so-called cycle scavenging. HTCondor runs on Linux, Unix, Mac OS X, FreeBSD, and Microsoft Windows operating systems. HTCondor can integrate both dedicated resources and non-dedicated desktop machines into one computing environment.

A tuple space is an implementation of the associative memory paradigm for parallel/distributed computing. It provides a repository of tuples that can be accessed concurrently. As an illustrative example, consider that there are a group of processors that produce pieces of data and a group of processors that use the data. Producers post their data as tuples in the space, and the consumers then retrieve data from the space that match a certain pattern. This is also known as the blackboard metaphor. Tuple space may be thought as a form of distributed shared memory.

In computer science, future, promise, delay, and deferred refer to constructs used for synchronizing program execution in some concurrent programming languages. They describe an object that acts as a proxy for a result that is initially unknown, usually because the computation of its value is not yet complete.

A problem solving environment (PSE) is a completed, integrated and specialised computer software for solving one class of problems, combining automated problem-solving methods with human-oriented tools for guiding the problem resolution. A PSE may also assist users in formulating problem resolution. A PSE may also assist users in formulating problems, selecting algorithm, simulating numerical value and viewing and analysing results.

Concurrent computing is a form of computing in which several computations are executed concurrently—during overlapping time periods—instead of sequentially—with one completing before the next starts.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

OurGrid is an open-source grid middleware based on a peer-to-peer architecture. OurGrid was mainly developed at the Federal University of Campina Grande (Brazil), which has run an OurGrid instance named "OurGrid" since December 2004. Anyone can freely join it to gain access to large amount of computational power and run parallel applications. This computational power is provided by the idle resources of all participants, and is shared in a way that makes those who contribute more get more when they need. Currently, the platform can be used to run any application whose tasks do not communicate among themselves during execution, like most simulations, data mining and searching.

ProActive Parallel Suite is an open-source software for enterprise workload orchestration, part of the OW2 community. A workflow model allows a set of executables or scripts, written in any language, to be defined along with their dependencies, so ProActive Parallel Suite can schedule and orchestrate executions while optimising the use of computational resources.

Web2py is an open-source web application framework written in the Python programming language. Web2py allows web developers to program dynamic web content using Python. Web2py is designed to help reduce tedious web development tasks, such as developing web forms from scratch, although a web developer may build a form from scratch if required.

<span class="mw-page-title-main">Cython</span> Programming language

Cython is a superset of the programming language Python, which allows developers to write Python code that yields performance comparable to that of C.

<span class="mw-page-title-main">Shinken (software)</span> Network monitoring software

Shinken is an open source computer system and network monitoring software application compatible with Nagios. It watches hosts and services, gathers performance data and alerts users when error conditions occur and again when the conditions clear.

<span class="mw-page-title-main">DIET</span>

DIET is a software for grid-computing. As middleware, DIET sits between the operating system and the application software. DIET was created in 2000. It was designed for high-performance computing. It is currently developed by INRIA, École Normale Supérieure de Lyon, CNRS, Claude Bernard University Lyon 1, SysFera. It is open-source software released under the CeCILL license.

<span class="mw-page-title-main">Data grid</span> Set of services used to access, modify and transfer geographical data

A data grid is an architecture or set of services that gives individuals or groups of users the ability to access, modify and transfer extremely large amounts of geographically distributed data for research purposes. Data grids make this possible through a host of middleware applications and services that pull together data and resources from multiple administrative domains and then present it to users upon request. The data in a data grid can be located at a single site or multiple sites where each site can be its own administrative domain governed by a set of security restrictions as to who may access the data. Likewise, multiple replicas of the data may be distributed throughout the grid outside their original administrative domain and the security restrictions placed on the original data for who may access it must be equally applied to the replicas. Specifically developed data grid middleware is what handles the integration between users and the data they request by controlling access while making it available as efficiently as possible. The adjacent diagram depicts a high level view of a data grid.

Elixir is a functional, concurrent, high-level general-purpose programming language that runs on the BEAM virtual machine, which is also used to implement the Erlang programming language. Elixir builds on top of Erlang and shares the same abstractions for building distributed, fault-tolerant applications. Elixir also provides tooling and an extensible design. The latter is supported by compile-time metaprogramming with macros and polymorphism via protocols.

In computer programming, the async/await pattern is a syntactic feature of many programming languages that allows an asynchronous, non-blocking function to be structured in a way similar to an ordinary synchronous function. It is semantically related to the concept of a coroutine and is often implemented using similar techniques, and is primarily intended to provide opportunities for the program to execute other code while waiting for a long-running, asynchronous task to complete, usually represented by promises or similar data structures. The feature is found in C#, C++, Python, F#, Hack, Julia, Dart, Kotlin, Rust, Nim, JavaScript, Swift and Zig.

<span class="mw-page-title-main">Nim (programming language)</span> Programming language

Nim is a general-purpose, multi-paradigm, statically typed, compiled high-level systems programming language, designed and developed by a team around Andreas Rumpf. Nim is designed to be "efficient, expressive, and elegant", supporting metaprogramming, functional, message passing, procedural, and object-oriented programming styles by providing several features such as compile time code generation, algebraic data types, a foreign function interface (FFI) with C, C++, Objective-C, and JavaScript, and supporting compiling to those same languages as intermediate representations.

References

  1. "ZeroMQ - The Guide, Pieter Hintjens". iMatix Corporation. Retrieved 4 October 2012.