DIET

Developer(s): INRIA, École Normale Supérieure de Lyon, SysFera, CNRS, Claude Bernard University Lyon 1
Stable release: 2.8 / November 14, 2011
Written in: C++, CORBA
Operating system: Cross-platform
Type: Grid and Cloud computing
License: CeCILL
Website: graal.ens-lyon.fr/DIET

DIET is middleware for grid computing. As middleware, DIET sits between the operating system (which handles the details of the hardware) and the application software (which deals with the specific computational task at hand). DIET was created in 2000 [1] and designed for high-performance computing. It is currently developed by INRIA, École Normale Supérieure de Lyon, CNRS, Claude Bernard University Lyon 1, and SysFera. It is open-source software released under the CeCILL license.

Like NetSolve/GridSolve and Ninf, DIET is compliant with the GridRPC standard from the Open Grid Forum. [2]
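To illustrate what GridRPC compliance looks like from the client side, here is a minimal sketch of a GridRPC-style client in C++ using the generic function names defined by the OGF GridRPC API. The service name "solve", its argument list, the configuration file, and the header name are assumptions made for the example; the exact bindings shipped with DIET may differ.

```cpp
// Minimal sketch of a GridRPC-style client, using the generic function names
// from the OGF GridRPC API. The header name, the "solve" service and its
// argument profile are assumptions for the example; DIET ships its own
// GridRPC bindings, which may name things differently.
#include <cstdio>
extern "C" {
#include <grpc.h>   // GridRPC API header as named in the specification
}

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <client-config-file>\n", argv[0]);
        return 1;
    }

    // Read the middleware configuration (e.g., how to reach the Master Agent).
    if (grpc_initialize(argv[1]) != GRPC_NO_ERROR) return 1;

    // Bind a handle to a remote service; the middleware chooses the server.
    grpc_function_handle_t handle;
    grpc_function_handle_default(&handle, (char*)"solve");

    // Synchronous remote call; arguments mirror the service's declared profile.
    double x = 4.0, y = 0.0;
    if (grpc_call(&handle, x, &y) == GRPC_NO_ERROR)
        std::printf("result = %g\n", y);

    grpc_function_handle_destruct(&handle);
    grpc_finalize();
    return 0;
}
```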

The aim of the DIET project is to develop a set of tools to build computational servers. The distributed resources are managed in a transparent way through the middleware. It can work with workstations, clusters, Grids and clouds.

DIET is used to manage the Décrypthon Grid installed by IBM in six French universities (Bordeaux 1, Lille 1, Paris 6, ENS Lyon, Crihan in Rouen, Orsay).

Architecture

Usually, GridRPC environments have five different components: clients that submit problems to servers, servers that solve the problems sent by clients, a database that contains information about software and hardware resources, a scheduler that chooses an appropriate server depending on the problem sent and the information contained in the database, and monitors that get information about the status of the computational resources.

DIET's architecture follows a different design. It is composed of:

  1. a client - the application that uses DIET to solve problems. Clients can connect to DIET from a web page or through an API or compiled program.
  2. a Master Agent (MA) that receives computation requests from clients. The MA then collects computation abilities from the servers and chooses one based on scheduling criteria. The reference of the chosen server is returned to the client. A client can locate an MA either through a specific name server or through a web page that stores the various MA locations.
  3. a Local Agent (LA) that aims at transmitting requests and information between MAs and servers. The information stored on an LA is the list of requests and, for each of its subtrees, the number of servers that can solve a given problem and information about the data distributed in this subtree. Depending on the underlying network topology, a hierarchy of LAs may be deployed between an MA and the servers.
  4. a Server Daemon (SeD) that is the point of entry of a computational server. It manages a processor or a cluster. The information stored on a SeD is the list of the data available on a server (possibly with their distribution and the way to access them), the list of the problems that can be solved on it, and all the information concerning its load (e.g., CPU capacity, available memory).
[Figure: DIET architecture]
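The following toy model (not DIET code) sketches how a request travels through this hierarchy: the MA forwards the client's request through LAs down to the SeDs, each SeD answers with a performance estimate (here simply its load), and each agent passes only the best candidate of its subtree back up. All class names and server names are invented for the illustration.

```cpp
// Toy model of the MA -> LA -> SeD request flow: agents forward a request to
// their subtrees, SeDs answer with an estimate, agents keep the best one.
#include <iostream>
#include <limits>
#include <memory>
#include <string>
#include <vector>

struct Estimate { std::string server; double predicted_time; };

struct Node {                                   // common base for MA, LA and SeD
    virtual Estimate handle(const std::string& service) = 0;
    virtual ~Node() = default;
};

struct SeD : Node {                             // leaf: one computational server
    std::string name; double load;
    SeD(std::string n, double l) : name(std::move(n)), load(l) {}
    Estimate handle(const std::string&) override {
        return {name, load};                    // the estimate here is just the load
    }
};

struct Agent : Node {                           // MA or LA: forwards and aggregates
    std::vector<std::unique_ptr<Node>> children;
    Estimate handle(const std::string& service) override {
        Estimate best{"", std::numeric_limits<double>::max()};
        for (auto& c : children) {              // query each subtree
            Estimate e = c->handle(service);
            if (e.predicted_time < best.predicted_time) best = e;
        }
        return best;                            // only the best candidate goes up
    }
};

int main() {
    Agent ma;                                   // Master Agent
    auto la = std::make_unique<Agent>();        // one Local Agent
    la->children.push_back(std::make_unique<SeD>("sed-1", 0.8));
    la->children.push_back(std::make_unique<SeD>("sed-2", 0.3));
    ma.children.push_back(std::move(la));
    ma.children.push_back(std::make_unique<SeD>("sed-3", 0.5));

    Estimate chosen = ma.handle("solve");       // the client asks the MA for "solve"
    std::cout << "chosen server: " << chosen.server << "\n";   // prints sed-2
}
```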

Multi-hierarchy

Two approaches were developed to connect several DIET hierarchies into a multi-hierarchy.

Workflow management

For workflow management, DIET uses an additional entity called MA DAG. This entity can work in two modes: one in which it defines a complete scheduling of the workflow (ordering and mapping), and one in which it defines only an ordering for the workflow execution. Mapping is then done in the next step by the client, using the Master Agent to find the server where the workflow services should be run.

[Figure: DIET workflow architecture]
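The sketch below is an illustrative model, not the MA DAG implementation: a small four-task workflow is ordered according to its dependencies in both modes, but a server is assigned on the spot only in the complete-scheduling mode, while the ordering-only mode leaves the mapping empty for the client to resolve later through the Master Agent. Task and server names are hypothetical.

```cpp
// Toy contrast of the two workflow modes: ordering plus mapping ("complete
// scheduling") versus ordering only, with the mapping deferred to the client.
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Task { std::string name; std::vector<std::string> deps; std::string mapped_server; };

// Very small dependency-respecting ordering (assumes the DAG has no cycles).
std::vector<Task*> order(std::vector<Task>& tasks) {
    std::vector<Task*> out;
    std::map<std::string, bool> done;
    while (out.size() < tasks.size())
        for (auto& t : tasks) {
            if (done[t.name]) continue;
            bool ready = true;
            for (auto& d : t.deps) if (!done[d]) ready = false;
            if (ready) { done[t.name] = true; out.push_back(&t); }
        }
    return out;
}

int main() {
    std::vector<Task> wf = { {"A", {}, ""}, {"B", {"A"}, ""}, {"C", {"A"}, ""}, {"D", {"B", "C"}, ""} };
    bool complete_scheduling = true;            // switch between the two modes

    for (Task* t : order(wf)) {
        if (complete_scheduling)
            t->mapped_server = "sed-for-" + t->name;   // hypothetical mapping choice
        std::cout << t->name << " -> "
                  << (t->mapped_server.empty()
                          ? std::string("<mapped later by the client via the MA>")
                          : t->mapped_server)
                  << "\n";
    }
}
```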

Scheduling

DIET provides a degree of control over the scheduling subsystem via plug-in schedulers. [3] When a service request from an application arrives at a SeD, the SeD creates a performance-estimation vector, a collection of performance-estimation values that are pertinent to the scheduling process for that application. The values to be stored in this structure can be either values provided by CoRI (Collectors of Resource Information) or custom values generated by the SeD itself. The design of the estimation vector's subsystem is modular.

CoRI generates a basic set of performance-estimation values, which are stored in the estimation vector and identified by system-defined tags. Static information such as the number of cores, the total memory, the number of bogomips, and the hard-drive speed, as well as dynamic information such as the predicted time to solve a problem on the given resource and the average CPU load, is thus transferred from the Server Daemon to the scheduler agent to support better scheduling decisions. As mentioned above, these values are used together with DIET's application-driven scheduling: the Server Daemon, which has a better understanding of the application's needs, can request a specific scheduling policy relying on the information stored in this vector.
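As a rough sketch of how these pieces fit together (the types, tag names, and ranking policy below are assumptions made for illustration, not DIET's actual API), an estimation vector can be modelled as a set of system-defined, CoRI-style tags plus custom values set by the SeD, and a plug-in scheduler as a comparison function that ranks candidate servers from those tags:

```cpp
// Sketch of the estimation-vector idea and a hypothetical plug-in comparator.
// System-defined tags model CoRI-provided values; the custom map models values
// generated by the SeD itself. None of these names are DIET's real identifiers.
#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <vector>

enum class EstTag { NbCores, TotalMemMB, AvgCpuLoad, PredictedTimeSec };

struct EstimationVector {
    std::map<EstTag, double> values;              // system-defined (CoRI-style) tags
    std::map<std::string, double> custom;         // application-specific values from the SeD
    double get(EstTag tag, double fallback) const {
        auto it = values.find(tag);
        return it == values.end() ? fallback : it->second;
    }
};

struct Candidate { std::string server; EstimationVector est; };

// Hypothetical plug-in policy: prefer the lowest predicted solve time and use
// the average CPU load as a tie-breaker.
bool better(const Candidate& a, const Candidate& b) {
    double ta = a.est.get(EstTag::PredictedTimeSec, 1e9);
    double tb = b.est.get(EstTag::PredictedTimeSec, 1e9);
    if (ta != tb) return ta < tb;
    return a.est.get(EstTag::AvgCpuLoad, 1.0) < b.est.get(EstTag::AvgCpuLoad, 1.0);
}

int main() {
    Candidate a{"sed-1", {}}, b{"sed-2", {}};
    a.est.values[EstTag::PredictedTimeSec] = 12.0;
    a.est.values[EstTag::AvgCpuLoad]       = 0.7;
    b.est.values[EstTag::PredictedTimeSec] = 12.0;
    b.est.values[EstTag::AvgCpuLoad]       = 0.2;

    std::vector<Candidate> candidates{a, b};
    std::sort(candidates.begin(), candidates.end(), better);   // best candidate first
    std::cout << "best server: " << candidates[0].server << "\n";   // prints sed-2
}
```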

DIET data management

Three different data managers have been integrated into DIET:

  1. DTM from the University of Franche-Comté (not maintained);
  2. JuxMEM from the IRISA (not maintained); [4]
  3. DAGDA from École Normale Supérieure de Lyon.

DIET LRMS management

Parallel resources are generally accessible through an LRMS (Local Resource Management System), also called a batch system. DIET provides an interface to several existing LRMSs for executing jobs: LoadLeveler (on IBM resources), OpenPBS (a fork of the well-known PBS system), and OAR (the batch scheduler used by the Grid'5000 research grid, developed by IMAG in Grenoble). Most of the submitted jobs are parallel jobs, written against the MPI standard using an implementation such as MPICH or LAM.
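As a reminder of what such a parallel job looks like, here is a minimal MPI program of the kind a SeD would hand to the batch system; it only reports each process's rank, and the compilation and launch details (for example mpirun under OAR or LoadLeveler) depend on the site.

```cpp
// Minimal MPI program: each process joins the parallel job, learns its rank
// and the total number of processes, and reports itself.
#include <mpi.h>
#include <cstdio>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);                     // join the parallel job

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);       // this process's index
    MPI_Comm_size(MPI_COMM_WORLD, &size);       // total number of processes

    // Each process would work on its own share; here it just reports itself.
    std::printf("process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```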

Cloud-resource management

A Cloud extension for DIET was created in 2009. [5] DIET is thus able to access Cloud resources through two existing Cloud providers:

  1. Eucalyptus, which is open-source software developed by the University of California, Santa Barbara.
  2. Amazon Elastic Compute Cloud, a commercial offering that is part of Amazon.com's cloud-computing services.

References

  1. Caron, Eddy; Desprez, Frédéric (2006). "DIET: A Scalable Toolbox to Build Network Enabled Servers on the Grid". International Journal of High Performance Computing Applications. 20 (3): 335–352. CiteSeerX 10.1.1.126.236. doi:10.1177/1094342006067472. S2CID 1050715.
  2. Caniou, Yves; Caron, Eddy; Desprez, Frédéric; Nakada, Hidemoto; Seymour, Keith; Tanaka, Yoshio (2009). "High performance GridRPC middleware". In Grid Technology and Applications: Recent Developments. Nova Science Publishers. ISBN 978-1-60692-768-7.
  3. Caron, Eddy; Chis, Andréea; Desprez, Frédéric; Su, Alan (January 2008). "Design of plug-in schedulers for a GridRPC environment". Future Generation Computer Systems. 24 (1): 46–57. doi:10.1016/j.future.2007.02.005.
  4. Antoniu, Gabriel; Bougé, Luc; Jan, Mathieu (November 2005). "JuxMem: An Adaptive Supportive Platform for Data Sharing on the Grid". Scalable Computing: Practice and Experience. 6 (3): 45–55.
  5. Caron, Eddy; Desprez, Frédéric; Loureiro, David; Muresan, Adrian (September 2009). "Cloud Computing Resource Management through a Grid Middleware: A Case Study with DIET and Eucalyptus" (PDF). 2009 IEEE International Conference on Cloud Computing. pp. 151–154. doi:10.1109/CLOUD.2009.70. ISBN 978-1-4244-5199-9. S2CID 18853964.