ProActive

Last updated
ProActive Parallel Suite
Developer(s) ActiveEon, Inria, OW2 Consortium
Stable release
10.0 (F-Zero) / July 12, 2019;4 years ago (2019-07-12)
Written in Java
Operating system Cross-platform
Type Job Scheduler
License AGPL
Website www.activeeon.com

ProActive Parallel Suite is an open-source software for enterprise workload orchestration, part of the OW2 community. A workflow model allows a set of executables or scripts, written in any language, to be defined along with their dependencies, so ProActive Parallel Suite can schedule and orchestrate executions while optimising the use of computational resources.

Contents

ProActive Parallel Suite is based on the "active object" design pattern (see Active objects) to optimise task distribution and fault-tolerance.

ProActive Parallel Suite key features

ProActive Java framework and Programming model

The model was created by Denis Caromel, professor at University of Nice Sophia Antipolis. [1] Several extensions of the model were made later on by members of the OASIS team at INRIA. [2] The book A Theory of Distributed Objects presents the ASP calculus that formalizes ProActive features, and provides formal semantics to the calculus, together with properties of ProActive program execution. [3]

Active objects

Active objects are the basic units of activity and distribution used for building concurrent applications using ProActive. An active object runs with its own thread. This thread only executes the methods invoked on this active object by other active objects, and those of the passive objects of the subsystem that belongs to this active object. With ProActive, the programmer does not have to explicitly manipulate Thread objects, unlike in standard Java.

Active objects can be created on any of the hosts involved in the computation. Once an active object is created, its activity (the fact that it runs with its own thread) and its location (local or remote) are perfectly transparent. Any active object can be manipulated as if it were a passive instance of the same class.

An active object is composed of two objects: a body, and a standard Java object. The body is not visible from the outside of the active object.

The body is responsible for receiving calls (or requests) on the active object and storing them in a queue of pending calls. It executes these calls in an order specified by a synchronization policy. If a synchronization policy is not specified, calls are managed in a "First in, first out" (FIFO) manner.

The thread of an active object then chooses a method in the queue of pending requests and executes it. No parallelism is provided inside an active object; this is an important decision in ProActive's design, enabling the use of "pre-post" conditions and class invariants.

On the side of the subsystem that sends a call to an active object, the active object is represented by a proxy. The proxy generates future objects for representing future values, transforms calls into Request objects (in terms of metaobject, this is a reification) and performs deep copies of passive objects passed as parameters.

Active object basis

ProActive is a library designed for developing applications in the model introduced by Eiffel//, a parallel extension of the Eiffel programming language.

In this model, the application is structured in subsystems. There is one active object (and therefore one thread) for each subsystem, and one subsystem for each active object (or thread). Each subsystem is thus composed of one active object and any number of passive objects—possibly no passive objects. The thread of one subsystem only executes methods in the objects of this subsystem. There are no "shared passive objects" between subsystems.

These features impact the application's topology. Of all the objects that make up a subsystem—the active object and the passive objects—only the active object is known to objects outside of the subsystem. All objects, both active and passive, may have references onto active objects. If an object o1 has a reference onto a passive object o2, then o1 and o2 are part of the same subsystem.

The model: Sequential, multithreaded, distributed Sequential 2 Distributed.png
The model: Sequential, multithreaded, distributed

This has also consequences on the semantics of message-passing between subsystems. When an object in a subsystem calls a method on an active object, the parameters of the call may be references on passive objects of the subsystem, which would lead to shared passive objects. This is why passive objects passed as parameters of calls on active objects are always passed by deep-copy. Active objects, on the other hand, are always passed by reference. Symmetrically, this also applies to objects returned from methods called on active objects.

Thanks to the concepts of asynchronous calls, futures, and no data sharing, an application written with ProActive doesn't need any structural change—actually, hardly any change at all—whether it runs in a sequential, multi-threaded, or distributed environment.

Asynchronous calls and futures

Whenever possible, a method call on an active object is reified as an asynchronous request. If not possible, the call is synchronous, and blocks until the reply is received. If the request is asynchronous, it immediately returns a future object.

The future object acts as a placeholder for the result of the not-yet-performed method invocation. As a consequence, the calling thread can go on with executing its code, as long as it doesn't need to invoke methods on the returned object. If the need arises, the calling thread is automatically blocked if the result of the method invocation is not yet available. Although a future object has structure similar to that of an active object, a future object is not active. It only has a Stub and a Proxy.

Code example

The code excerpt below highlights the notion of future objects. Suppose a user calls a method foo and a method bar from an active object a; the foo method returns void and the bar method returns an object of class V:

// a one way typed asynchronous communication towards the (remote) AO a// a request is sent to aa.foo(param);// a typed asynchronous communication with result.// v is first an awaited Future, to be transparently filled up after// service of the request, and replyVv=a.bar(param);...// use of the result of an asynchronous call.// if v is still an awaited future, it triggers an automatic// wait: Wait-by-necessityv.gee(param);

When foo is called on an active object a, it returns immediately (as the current thread cannot execute methods in the other subsystem). Similarly, when bar is called on a, it returns immediately but the result v can't be computed yet. A future object, which is a placeholder for the result of the method invocation, is returned. From the point of view of the caller subsystem, there is no difference between the future object and the object that would have been returned if the same call had been issued onto a passive object.

After both methods have returned, the calling thread continues executing its code as if the call had been effectively performed. The role of the future mechanism is to block the caller thread when the gee method is called on v and the result has not yet been set : this inter-object synchronization policy is known as wait-by-necessity.

See also

Related Research Articles

In distributed computing, a remote procedure call (RPC) is when a computer program causes a procedure (subroutine) to execute in a different address space, which is written as if it were a normal (local) procedure call, without the programmer explicitly writing the details for the remote interaction. That is, the programmer writes essentially the same code whether the subroutine is local to the executing program, or remote. This is a form of client–server interaction, typically implemented via a request–response message-passing system. In the object-oriented programming paradigm, RPCs are represented by remote method invocation (RMI). The RPC model implies a level of location transparency, namely that calling procedures are largely the same whether they are local or remote, but usually, they are not identical, so local calls can be distinguished from remote calls. Remote calls are usually orders of magnitude slower and less reliable than local calls, so distinguishing them is important.

Jakarta Enterprise Beans is one of several Java APIs for modular construction of enterprise software. EJB is a server-side software component that encapsulates business logic of an application. An EJB web container provides a runtime environment for web related software components, including computer security, Java servlet lifecycle management, transaction processing, and other web services. The EJB specification is a subset of the Java EE specification.

In computer science, message passing is a technique for invoking behavior on a computer. The invoking program sends a message to a process and relies on that process and its supporting infrastructure to then select and run some appropriate code. Message passing differs from conventional programming where a process, subroutine, or function is directly invoked by name. Message passing is key to some models of concurrency and object-oriented programming.

In computer science, future, promise, delay, and deferred refer to constructs used for synchronizing program execution in some concurrent programming languages. They describe an object that acts as a proxy for a result that is initially unknown, usually because the computation of its value is not yet complete.

.NET Remoting is a Microsoft application programming interface (API) for interprocess communication released in 2002 with the 1.0 version of .NET Framework. It is one in a series of Microsoft technologies that began in 1990 with the first version of Object Linking and Embedding (OLE) for 16-bit Windows. Intermediate steps in the development of these technologies were Component Object Model (COM) released in 1993 and updated in 1995 as COM-95, Distributed Component Object Model (DCOM), released in 1997, and COM+ with its Microsoft Transaction Server (MTS), released in 2000. It is now superseded by Windows Communication Foundation (WCF), which is part of the .NET Framework 3.0.

The active object design pattern decouples method execution from method invocation for objects that each reside in their own thread of control. The goal is to introduce concurrency, by using asynchronous method invocation and a scheduler for handling requests.

In computer programming, flow-based programming (FBP) is a programming paradigm that defines applications as networks of black box processes, which exchange data across predefined connections by message passing, where the connections are specified externally to the processes. These black box processes can be reconnected endlessly to form different applications without having to be changed internally. FBP is thus naturally component-oriented.

Fractal is a modular and extensible component model that can be used with various programming languages to design, implement, deploy and reconfigure various systems and applications, from operating systems to middleware platforms and to graphical user interfaces. The goal of Fractal is to reduce the development, deployment and maintenance costs of software systems in general, and of OW2 projects in particular. The Fractal model already uses some well known design patterns, such as separation of interface and implementation and, more generally, separation of concerns, in order to achieve this goal. There is also ongoing research work to get even closer to this goal. Fractal is hosted and developed by the OW2 consortium. It is distributed under the LGPL open-source license.

A workflow management system provides an infrastructure for the set-up, performance, and monitoring of a defined sequence of tasks arranged as a workflow application.

<span class="mw-page-title-main">Parallel Extensions</span>

Parallel Extensions was the development name for a managed concurrency library developed by a collaboration between Microsoft Research and the CLR team at Microsoft. The library was released in version 4.0 of the .NET Framework. It is composed of two parts: Parallel LINQ (PLINQ) and Task Parallel Library (TPL). It also consists of a set of coordination data structures (CDS) – sets of data structures used to synchronize and co-ordinate the execution of concurrent tasks.

Component Object Model (COM) is a binary-interface standard for software components introduced by Microsoft in 1993. It is used to enable inter-process communication (IPC) object creation in a large range of programming languages. COM is the basis for several other Microsoft technologies and frameworks, including OLE, OLE Automation, Browser Helper Object, ActiveX, COM+, DCOM, the Windows shell, DirectX, UMDF and Windows Runtime.

<span class="mw-page-title-main">Grand Central Dispatch</span> Technology developed by Apple Inc

Grand Central Dispatch, is a technology developed by Apple Inc. to optimize application support for systems with multi-core processors and other symmetric multiprocessing systems. It is an implementation of task parallelism based on the thread pool pattern. The fundamental idea is to move the management of the thread pool out of the hands of the developer, and closer to the operating system. The developer injects "work packages" into the pool oblivious of the pool's architecture. This model improves simplicity, portability and performance.

In multithreaded computer programming, asynchronous method invocation (AMI), also known as asynchronous method calls or the asynchronous pattern is a design pattern in which the call site is not blocked while waiting for the called code to finish. Instead, the calling thread is notified when the reply arrives. Polling for a reply is an undesired option.

A scientific workflow system is a specialized form of a workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or workflow, in a scientific application.

GridRPC in distributed computing, is Remote Procedure Call over a grid. This paradigm has been proposed by the GridRPC working group of the Open Grid Forum (OGF), and an API has been defined in order for clients to access remote servers as simply as a function call. It is used among numerous Grid middleware for its simplicity of implementation, and has been standardized by the OGF in 2007. For interoperability reasons between the different existing middleware, the API has been followed by a document describing good use and behavior of the different GridRPC API implementations. Works have then been conducted on the GridRPC Data Management, which has been standardized in 2011.

Join-patterns provides a way to write concurrent, parallel and distributed computer programs by message passing. Compared to the use of threads and locks, this is a high level programming model using communication constructs model to abstract the complexity of concurrent environment and to allow scalability. Its focus is on the execution of a chord between messages atomically consumed from a group of channels.

In computer programming, the async/await pattern is a syntactic feature of many programming languages that allows an asynchronous, non-blocking function to be structured in a way similar to an ordinary synchronous function. It is semantically related to the concept of a coroutine and is often implemented using similar techniques, and is primarily intended to provide opportunities for the program to execute other code while waiting for a long-running, asynchronous task to complete, usually represented by promises or similar data structures. The feature is found in C#, C++, Python, F#, Hack, Julia, Dart, Kotlin, Rust, Nim, JavaScript, Swift and Zig.

Asynchrony, in computer programming, refers to the occurrence of events independent of the main program flow and ways to deal with such events. These may be "outside" events such as the arrival of signals, or actions instigated by a program that take place concurrently with program execution, without the program hanging to wait for results. Asynchronous input/output is an example of the latter case of asynchrony, and lets programs issue commands to storage or network devices that service these requests while the processor continues executing the program. Doing so provides a degree of parallelism.

An asynchronous procedure call (APC) is a unit of work in a computer.

Pattern-Oriented Software Architecture is a series of software engineering books describing software design patterns.

References

  1. Caromel, Denis (September 1993). "Towards a Method of Object-Oriented Concurrent Programming". Communications of the ACM. 36 (9): 90–102. doi: 10.1145/162685.162711 . S2CID   8310500.
  2. Baduel, Laurent; Baude, Françoise; Caromel, Denis; Contes, Arnaud; Huet, Fabrice; Morel, Matthieu; Quilici, Romain (January 2006). Cunha, José C.; Rana, Omer F. (eds.). Programming, Composing, Deploying for the Grid (PDF) (PDF). Sprinter-Verlag. pp. 205–229. CiteSeerX   10.1.1.58.7806 . doi:10.1007/1-84628-339-6_9. ISBN   978-1-85233-998-2. CiteSeerX: 10.1.1.58.7806 .{{cite book}}: |journal= ignored (help)
  3. Caromel, Denis; Henrio, Ludovic (2005). A Theory of Distributed Objects: asynchrony, mobility, groups, components. Berlin: Springer. ISBN   978-3-540-20866-2. LCCN   2005923024.

Further reading