GridRPC

In distributed computing, GridRPC is remote procedure call (RPC) over a grid. The paradigm was proposed by the GridRPC working group [1] of the Open Grid Forum (OGF), which defined an API [2] that lets clients access remote servers as simply as making a function call. Many grid middleware systems use it because it is simple to implement, and the OGF standardized it in 2007. To support interoperability between existing middleware, the API was followed by a document [3] describing the proper use and behavior of the different GridRPC API implementations. Later work addressed GridRPC data management [4], which was standardized in 2011.

Scope

The scope of this standard is to offer recommendations for the implementation of middleware. It deals with the following topics:

Context

Among existing middleware and application programming approaches, one simple, powerful, and flexible approach is to use servers available in different administrative domains through the classical client-server or Remote Procedure Call (RPC) paradigm. Network Enabled Servers (NES) implement this model, which is also called GridRPC. Clients submit computation requests to a resource broker whose goal is to find a server available on the Grid. Scheduling is frequently applied to balance the work among the servers, and a list of available servers is sent back to the client; the client can then send the data and the request to one of the suggested servers to solve its problem. Thanks to the growth of network bandwidth and the reduction of network latency, even small computation requests can now be sent to servers available on the Grid. To make effective use of today's scalable resource platforms, it is important to ensure scalability in the middleware layers as well. This service-oriented approach is not new.

Several research projects have targeted this paradigm in the past. The main middleware implementing the API are DIET, NetSolve/GridSolve, and Ninf. Other environments also use the API, such as the SAGA interface from the OGF, while some, such as OmniRPC and XtremWeb, follow the model without the standardized API calls. The RPC model over the Internet has also been used for several applications. Large optimization problems can be solved transparently through the Internet using different approaches: simply filling in a web page for remote image-processing computations, using mathematical libraries, or studying heuristics and resolution methods for sparse linear algebra, as in GridTLSE. [5] This approach of providing computation services through the Internet is also very close to the service-oriented architecture (SOA) paradigm and is at the core of cloud computing.

Standardization and GridRPC API presentation

One simple yet effective means of executing jobs on a computing grid is to use GridRPC middleware, which relies on the GridRPC paradigm. For each request, the middleware handles the submission, the input and output data, the execution of the job on the remote resource, and so on. To make a service available, a programmer must implement two programs: a client, which defines the data and is run by the user when requesting the service, and a server, which contains the implementation of the service and is executed on the remote resource.

One step toward easing the development of such programs was the definition of a GridRPC API, which was proposed as a draft in November 2002 [6] and has been an Open Grid Forum (OGF) standard since September 2007. As a result, GridRPC source code that does not involve middleware-specific data can be compiled and executed with any GridRPC-compliant middleware.

Because implementations of the GridRPC API differ in their design choices, a document describing interoperability between GridRPC middleware has also been written. Its main goals are to describe the differences in behaviour between GridRPC middleware and to propose a common test that all GridRPC middleware must pass.

Discussions were then undertaken on data management within GridRPC middleware. A draft API was proposed at OGF'21 in October 2007. The motivation for this document is to provide explicit functions for manipulating the data exchanged between a GridRPC platform and a client, since (1) the data used in grid applications may be large, so unnecessary data transfers must be avoided; and (2) data are not always stored on the client side but may be available either on a storage resource or within the GridRPC platform. A side effect is that fully GridRPC-compliant code can be written and compiled with any GridRPC middleware implementing the GridRPC Data Management API.

GridRPC Paradigm

[Figure: GridRPC paradigm]

The figure illustrates the GridRPC model. Communications are handled as follows: (1) servers register their services with a registry; (2) when a client needs a service to be executed, it contacts the registry, and (3) the registry returns a handle to the client; (4) the client then uses the handle to invoke the service on the server and (5) eventually receives the results.
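The five steps can be sketched with a small in-process model. This is only an illustration of the message flow, not GridRPC middleware: the standard API is a C interface, and the names `Registry`, `Server`, `Handle`, and `invoke` below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


class Server:
    """Toy server holding the services it offers (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.services: Dict[str, Callable[..., object]] = {}

    def run(self, service: str, *args: object) -> object:
        return self.services[service](*args)


@dataclass
class Handle:
    """Toy handle: a service name bound to one server (hypothetical)."""
    server: Server
    service: str


class Registry:
    """Toy registry mapping service names to the server providing them."""

    def __init__(self) -> None:
        self._providers: Dict[str, Server] = {}

    def register(self, server: Server, service: str,
                 func: Callable[..., object]) -> None:
        # Step 1: a server registers one of its services.
        server.services[service] = func
        self._providers[service] = server

    def lookup(self, service: str) -> Handle:
        # Steps 2 and 3: the client asks for a service and gets a handle.
        return Handle(self._providers[service], service)


def invoke(handle: Handle, *args: object) -> object:
    # Steps 4 and 5: the client invokes the service through the handle
    # and receives the result.
    return handle.server.run(handle.service, *args)


registry = Registry()
registry.register(Server("node-a"), "dot",
                  lambda x, y: sum(a * b for a, b in zip(x, y)))
handle = registry.lookup("dot")
result = invoke(handle, [1, 2, 3], [4, 5, 6])  # result == 32
```

In real middleware each step is a network exchange and the registry may apply scheduling; here every step is a plain function call so the ordering is easy to follow.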

GridRPC API

The API must provide mechanisms for making synchronous and/or asynchronous calls to a service. For asynchronous calls, clients must also be able to wait, in a blocking or non-blocking manner, for the completion of a given service. This naturally involves some data structures and leads to a rigorous definition of the functions of the API.

GridRPC Data Types

Three main data types are needed to implement the API: (1) grpc_function_handle_t is the type of variables representing a remote function bound to a given server. Once allocated by the client, such a variable can be used to launch the service as many times as desired. It is explicitly invalidated by the user when no longer needed; (2) grpc_session_t is the type of variables used to identify a specific non-blocking GridRPC call. Such a variable is required to obtain information on the status of a job, so that a client can wait for a call, cancel it, or retrieve its error status; (3) grpc_error_t groups all kinds of error and return status codes involved in the GridRPC API.
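To make the roles of the three types concrete, here is a minimal toy model. It is a sketch under stated assumptions, not the C definitions from the standard: `FunctionHandle`, `Status`, `start_call`, `destruct`, and `session_status` are hypothetical names, and a session is represented by a plain integer.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import count
from typing import Dict, Optional, Tuple


class Status(Enum):
    """Stands in for the role of grpc_error_t (hypothetical values)."""
    NO_ERROR = 0
    INVALID_HANDLE = 1
    INVALID_SESSION = 2


@dataclass
class FunctionHandle:
    """Stands in for the role of grpc_function_handle_t."""
    server: str
    function: str
    valid: bool = True


_session_ids = count(1)                      # source of fresh session IDs
_sessions: Dict[int, FunctionHandle] = {}    # session ID -> handle used


def start_call(handle: FunctionHandle) -> Tuple[Status, Optional[int]]:
    """Start a toy non-blocking call and return (status, session ID)."""
    if not handle.valid:
        return Status.INVALID_HANDLE, None
    session_id = next(_session_ids)
    _sessions[session_id] = handle
    return Status.NO_ERROR, session_id


def destruct(handle: FunctionHandle) -> Status:
    """Explicitly invalidate a handle that is no longer needed."""
    handle.valid = False
    return Status.NO_ERROR


def session_status(session_id: int) -> Status:
    """Report whether a session ID is known."""
    return (Status.NO_ERROR if session_id in _sessions
            else Status.INVALID_SESSION)


matmul = FunctionHandle("node-a", "matmul")
first = start_call(matmul)    # the same handle can be reused...
second = start_call(matmul)   # ...and each call gets its own session ID
destruct(matmul)
third = start_call(matmul)    # rejected: the handle was invalidated
```

The point of the split is that a handle identifies where a function runs, a session identifies one particular call in flight, and every operation reports its outcome through a single status type.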

GridRPC Functions

The grpc_initialize() and grpc_finalize() functions are similar to the MPI initialize and finalize calls. Every GridRPC call must be made between these two calls. grpc_initialize() reads the configuration files and makes the GridRPC environment ready; grpc_finalize() shuts it down.
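The bracketing rule can be modelled as a scoped environment. The sketch below is a toy, not the standard's C API: `grid_environment`, `remote_double`, and the file name `client.cfg` are hypothetical, and no configuration file is actually read.

```python
from contextlib import contextmanager
from typing import Iterator

_ready = False  # True only between "initialize" and "finalize"


@contextmanager
def grid_environment(config_path: str) -> Iterator[None]:
    """Toy bracket: set up on entry, tear down on exit."""
    global _ready
    # Real middleware would read config_path here; this toy ignores it.
    _ready = True
    try:
        yield
    finally:
        _ready = False


def remote_double(x: int) -> int:
    """Toy remote call that refuses to run outside the bracket."""
    if not _ready:
        raise RuntimeError("environment not initialized")
    return 2 * x


with grid_environment("client.cfg"):
    answer = remote_double(21)  # allowed: inside the bracket
```

Calling `remote_double` after the `with` block has ended raises `RuntimeError`, which mirrors the requirement that calls happen only between initialization and finalization.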

To initialize and destroy a function handle, the grpc_function_handle_init() and grpc_function_handle_destruct() functions must be called. Because a function handle can be associated with a server dynamically, for example through resource discovery mechanisms, a call to grpc_function_handle_default() makes it possible to postpone server selection until the actual call is made on the handle.
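The difference between binding a handle to a server up front and postponing the choice can be shown with a toy model. Everything here is an assumption made for illustration: `SERVER_LOAD`, `handle_init`, `handle_default`, and `call` are hypothetical, and "least loaded server" is just one possible selection policy.

```python
from typing import Dict, Optional

# Hypothetical view of the platform: server name -> current load.
SERVER_LOAD: Dict[str, float] = {"node-a": 0.9, "node-b": 0.2}


class FunctionHandle:
    """Toy handle; server is None until a server has been selected."""

    def __init__(self, function: str, server: Optional[str] = None) -> None:
        self.function = function
        self.server = server


def handle_init(function: str, server: str) -> FunctionHandle:
    """Bind the handle to an explicitly named server right away."""
    return FunctionHandle(function, server)


def handle_default(function: str) -> FunctionHandle:
    """Create a handle without choosing a server yet."""
    return FunctionHandle(function)


def call(handle: FunctionHandle) -> str:
    """Run the toy call, selecting a server now if none is bound."""
    if handle.server is None:
        # Deferred selection: pick the least loaded server at call time.
        handle.server = min(SERVER_LOAD, key=SERVER_LOAD.get)
    return f"{handle.function} ran on {handle.server}"


eager = handle_init("solve", "node-a")
lazy = handle_default("solve")
eager_result = call(eager)  # "solve ran on node-a"
lazy_result = call(lazy)    # "solve ran on node-b"
```

Deferring the choice lets the selection reflect the state of the platform when the call is actually issued rather than when the handle was created.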

grpc_get_handle() lets the client retrieve the function handle corresponding to the session ID of a previously issued non-blocking call.

Depending on the type of call, blocking or non-blocking, the client uses the grpc_call() or grpc_call_async() function. In the latter case, the client obtains a session ID after the call, which can be used to probe or wait for completion, cancel the call, and check the error status of the non-blocking call.
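The contrast between a blocking call and a non-blocking call that returns a session ID can be sketched with a thread pool. This is a toy analogy built on Python's `concurrent.futures`, not the C API: `service`, `call`, `call_async`, and `wait_for` are hypothetical names, and the "remote" work runs in a local thread.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from itertools import count
from typing import Dict

_pool = ThreadPoolExecutor(max_workers=2)
_sessions: Dict[int, Future] = {}   # session ID -> pending result
_next_id = count(1)


def service(n: int) -> int:
    """Stand-in for a remote service: sum of squares below n."""
    return sum(i * i for i in range(n))


def call(n: int) -> int:
    """Blocking call: returns only when the result is available."""
    return service(n)


def call_async(n: int) -> int:
    """Non-blocking call: returns a session ID immediately."""
    session_id = next(_next_id)
    _sessions[session_id] = _pool.submit(service, n)
    return session_id


def wait_for(session_id: int) -> int:
    """Block until the call identified by session_id has completed."""
    return _sessions[session_id].result()


blocking_result = call(4)           # 0 + 1 + 4 + 9 == 14
sid = call_async(4)                 # returns at once with a session ID
async_result = wait_for(sid)        # same value, collected later
```

Between `call_async` and `wait_for` the client is free to do other work or to issue further calls, which is the main reason to use the non-blocking form.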

After issuing one or more non-blocking calls, a client can use: grpc_probe() to find out whether the execution of the service has completed; grpc_probe_or() to find out whether one of the previous non-blocking calls has completed; grpc_cancel() to cancel a call; grpc_wait() to block until the requested service has completed; grpc_wait_and() to block until all services corresponding to the session IDs passed as parameters have finished; grpc_wait_or() to block until any of the services corresponding to the session IDs passed as parameters has finished; grpc_wait_all() to block until all non-blocking calls have completed; and grpc_wait_any() to wait until any previously issued non-blocking request has completed.
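The semantics of the probe and wait families can be illustrated with a toy model in which each job finishes only when a gate is opened, so completion order is fully controlled. The functions below are hypothetical analogues built on `concurrent.futures`, not the C API, and cancellation is left out for brevity.

```python
import threading
from concurrent.futures import (ALL_COMPLETED, FIRST_COMPLETED, Future,
                                ThreadPoolExecutor, wait)
from typing import Dict, List, Optional

_pool = ThreadPoolExecutor(max_workers=4)
_sessions: Dict[int, Future] = {}   # session ID -> pending job


def call_async(session_id: int, gate: threading.Event, value: int) -> None:
    """Start a toy job that completes once its gate is set."""
    def job() -> int:
        gate.wait()
        return value
    _sessions[session_id] = _pool.submit(job)


def probe(session_id: int) -> bool:
    """Non-blocking: has this call completed?"""
    return _sessions[session_id].done()


def probe_or(ids: List[int]) -> Optional[int]:
    """Non-blocking: ID of one completed call among ids, or None."""
    return next((i for i in ids if _sessions[i].done()), None)


def wait_and(ids: List[int]) -> None:
    """Block until every call in ids has completed."""
    wait([_sessions[i] for i in ids], return_when=ALL_COMPLETED)


def wait_or(ids: List[int]) -> int:
    """Block until at least one call in ids has completed; return its ID."""
    done, _ = wait([_sessions[i] for i in ids], return_when=FIRST_COMPLETED)
    return next(i for i in ids if _sessions[i] in done)


def wait_all() -> None:
    """Block until all issued calls have completed."""
    wait_and(list(_sessions))


fast, slow = threading.Event(), threading.Event()
call_async(1, fast, 10)
call_async(2, slow, 20)
nothing_yet = probe_or([1, 2])   # None: both gates are still closed
fast.set()
first_done = wait_or([1, 2])     # 1: only the first job can finish
slow.set()
wait_all()
both_done = probe(1) and probe(2)  # True
```

The "or" variants return as soon as any one call is finished, while the "and" and "all" variants return only after every listed or outstanding call is finished; the probe functions never block.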

GridRPC Compliant Code

Talk about the lib (+link) against which a code must compile and give a basic example

GridRPC documents

GridRPC implementations

References

  1. "Open Grid Forum Areas and Groups". Archived from the original on 2011-08-11. Retrieved 2011-05-23.
  2. "Archived copy" (PDF). Archived from the original (PDF) on 2011-09-28. Retrieved 2011-05-23.
  3. http://www.ogf.org/documents/GFD.102.pdf
  4. http://www.ogf.org/documents/GFD.186.pdf
  5. "Archived copy". Archived from the original on 2011-07-13. Retrieved 2011-05-23.
  6. Seymour, Keith; Nakada, Hidemoto; Matsuoka, S.; Dongarra, Jack; Lee, Craig; Casanova, Henri (November 2002). "Overview of GridRPC: A Remote Procedure Call API for Grid Computing". Grid Computing — GRID 2002. Lecture Notes in Computer Science. Vol. 2536. pp. 274–278. doi:10.1007/3-540-36133-2_25. ISBN 978-3-540-00133-1.
  7. Caron, Eddy; Desprez, Frédéric (2006). "DIET: A Scalable Toolbox to Build Network Enabled Servers on the Grid". International Journal of High Performance Computing Applications. 20 (3): 335–352. CiteSeerX 10.1.1.126.236. doi:10.1177/1094342006067472. S2CID 1050715.
  8. Yarkhan, A.; K. Seymour; K. Sagi; Z. Shi; J. Dongarra (2006). "Recent Developments in Gridsolve". International Journal of High Performance Computing Applications. 20 (1): 131–141. CiteSeerX 10.1.1.62.3205. doi:10.1177/1094342006061893. S2CID 3019675.
  9. Nakada, Hidemoto; Sato, Mitsuhisa; Sekiguchi, S (1999). "Design and Implementations of Ninf: towards a Global Computing Infrastructure". Future Generation Computing Systems, Metacomputing Issue. 15 (5–6): 649–658. CiteSeerX 10.1.1.177.2195. doi:10.1016/s0167-739x(99)00016-3.
  10. Sato, M; Hirano, M; Tanaka, Y; Sekiguchi, S (2001). "OmniRPC: A Grid RPC Facility for Cluster and Global Computing in OpenMP". OpenMP Shared Memory Parallel Programming. Lecture Notes in Computer Science. Vol. 2104. pp. 130–136. CiteSeerX 10.1.1.28.7334. doi:10.1007/3-540-44587-0_12. ISBN 978-3-540-42346-1.