Kerrighed

Kerrighed is an open-source single-system image (SSI) cluster software project. The project started in October 1998 at the Paris research group of the French National Institute for Research in Computer Science and Control. From 2006 to 2011, the project was mainly developed by Kerlabs. In January 2012, the Linux clustering mission of Kerlabs was adopted by a new company, We Cluster, Inc., headquartered in Pacific Grove, California. On January 18, 2012, Kerrighed 3.0 was ported to Ubuntu 12.04 with Linux kernel v3.2. [1]

Background

Kerrighed is implemented as an extension to the Linux operating system. It lets scientific applications such as numerical simulations use the combined computing power of a cluster's nodes. Such applications may use OpenMP, the Message Passing Interface (MPI), or a POSIX multithreaded programming model. [2]

Kerrighed implements a set of global resource management services that aim to make resource distribution transparent to applications, to manage resource sharing within and between applications, and to take advantage of the whole cluster's resources for demanding applications. [3] Kerrighed also provides a development framework that makes it easy to implement dynamic scheduling policies without modifying the kernel. [4]
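As a rough illustration of what a dynamic scheduling policy decides, the sketch below picks a process migration from the most loaded node to the least loaded one when the load gap is large enough. It is not Kerrighed's framework API: the `Node` class, the `cpu_load` metric, the `pick_migration` function, and the threshold value are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of one cluster node (not a Kerrighed data structure)."""
    name: str
    cpu_load: float  # assumed metric: runnable processes per core


def pick_migration(nodes, threshold=0.5):
    """Return (source, target) node names if the load gap exceeds
    the threshold, otherwise None (the cluster is considered balanced)."""
    busiest = max(nodes, key=lambda n: n.cpu_load)
    idlest = min(nodes, key=lambda n: n.cpu_load)
    if busiest.cpu_load - idlest.cpu_load > threshold:
        return busiest.name, idlest.name
    return None


cluster = [Node("node1", 2.4), Node("node2", 0.3), Node("node3", 1.1)]
print(pick_migration(cluster))  # ('node1', 'node2')
```

A real policy would typically also weigh memory use and migration cost; the point here is only that the decision logic can live outside the kernel.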

Kerrighed provides several features, such as distributed shared memory with a sequential consistency model, process migration from one cluster node to another, and, to a limited extent, checkpointing.

Kerrighed uses a "container" concept: this entity is an abstraction of both files and memory. Each Kerrighed node starts a root container in which only local resources are available. To access resources globally, one must start the Kerrighed container on one node and then add the other nodes to it. [5]

See also

Related Research Articles

Beowulf cluster: Type of computing cluster

A Beowulf cluster is a computer cluster of what are normally identical, commodity-grade computers networked into a small local area network with libraries and programs installed which allow processing to be shared among them. The result is a high-performance parallel computing cluster from inexpensive personal computer hardware.

Checkpointing is a technique that provides fault tolerance for computing systems. It basically consists of saving a snapshot of the application's state, so that applications can restart from that point in case of failure. This is particularly important for long running applications that are executed in failure-prone computing systems.
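The idea can be sketched at application level as follows. This is a minimal illustration, assuming the whole state fits in a small dictionary that is pickled to a file; it is not how Kerrighed or any particular system implements checkpointing, and `run`, `demo`, and the `fail_at` parameter are hypothetical names used only to simulate a crash.

```python
import os
import pickle
import tempfile


def run(total_steps, checkpoint_path, fail_at=None):
    """Sum the integers 0..total_steps-1, saving a snapshot after each step."""
    # Restart from the last snapshot if one exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "acc": 0}

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["acc"] += state["step"]
        state["step"] += 1
        # Write to a temporary file first, then rename, so that a crash
        # during the write never leaves a truncated snapshot behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state["acc"]


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "state.ckpt")
        try:
            run(10, path, fail_at=6)  # crashes before executing step 6
        except RuntimeError:
            pass
        return run(10, path)  # resumes at step 6 instead of step 0


print(demo())  # 45
```

The second call to `run` finishes with the same result as an uninterrupted run, while redoing only the steps that came after the last snapshot.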

MOSIX is a proprietary distributed operating system. Although early versions were based on older UNIX systems, it has focused on Linux clusters and grids since 1999. In a MOSIX cluster or grid there is no need to modify applications or link them with any library, to copy files or log in to remote nodes, or even to assign processes to different nodes; it is all done automatically, as in an SMP.

openMosix: Distributed operating system

openMosix was a free cluster management system that provided single-system image (SSI) capabilities, e.g. automatic work distribution among nodes. It allowed program processes to migrate to machines in the node's network that would be able to run that process faster. It was particularly useful for running parallel applications having low to moderate input/output (I/O). It was released as a Linux kernel patch, but was also available on specialized Live CDs. openMosix development has been halted by its developers, but the LinuxPMI project is continuing development of the former openMosix code.

OpenSSI is an open-source single-system image clustering system. It allows a collection of computers to be treated as one large system, allowing applications running on any one machine access to the resources of all the machines in the cluster.

In distributed computing, a single system image (SSI) cluster is a cluster of machines that appears to be one single system. The concept is often considered synonymous with that of a distributed operating system, but a single image may be presented for more limited purposes, just job scheduling for instance, which may be achieved by means of an additional layer of software over conventional operating system images running on each node. The interest in SSI clusters is based on the perception that they may be simpler to use and administer than more specialized clusters.

OS-level virtualization is an operating system (OS) virtualization paradigm in which the kernel allows the existence of multiple isolated user space instances, called containers, zones, virtual private servers (OpenVZ), partitions, virtual environments (VEs), virtual kernels, or jails. Such instances may look like real computers from the point of view of programs running in them. A computer program running on an ordinary operating system can see all resources of that computer. However, programs running inside of a container can only see the container's contents and devices assigned to the container.

Distributed Replicated Block Device: Distributed replicated storage system for Linux

Distributed Replicated Block Device (DRBD) is a distributed replicated storage system for the Linux platform. It is implemented as a kernel driver, several userspace management applications, and some shell scripts. DRBD is traditionally used in high availability (HA) computer clusters, but beginning with DRBD version 9, it can also be used to create larger software defined storage pools with a focus on cloud integration.

Operating systems use lock managers to organise and serialise the access to resources. A distributed lock manager (DLM) runs in every machine in a cluster, with an identical copy of a cluster-wide lock database. In this way a DLM provides software applications which are distributed across a cluster on multiple machines with a means to synchronize their accesses to shared resources.
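A minimal sketch of the lock database such a manager keeps is shown below. It assumes exclusive, non-reentrant locks with first-come, first-served queuing, and it models replication simply by applying the same ordered sequence of operations to every copy; real lock managers support several lock modes and need a protocol to agree on that order. The `LockTable` class and its methods are hypothetical names, not the API of any existing DLM.

```python
from collections import deque


class LockTable:
    """Hypothetical lock database: one exclusive holder per resource."""

    def __init__(self):
        self.holder = {}   # resource name -> current owner
        self.waiters = {}  # resource name -> FIFO queue of waiting owners

    def acquire(self, resource, owner):
        """Grant the lock if it is free; otherwise queue the owner."""
        if resource not in self.holder:
            self.holder[resource] = owner
            return True
        self.waiters.setdefault(resource, deque()).append(owner)
        return False

    def release(self, resource, owner):
        """Release the lock and hand it to the next waiter, if any.
        Returns the new holder, or None if the resource is now free."""
        if self.holder.get(resource) != owner:
            raise ValueError(f"{owner} does not hold {resource}")
        queue = self.waiters.get(resource)
        if queue:
            self.holder[resource] = queue.popleft()
            return self.holder[resource]
        del self.holder[resource]
        return None


# Three nodes each keep a copy and apply the same operations in the same order.
replicas = [LockTable() for _ in range(3)]
operations = [
    ("acquire", "fileA", "node1"),
    ("acquire", "fileA", "node2"),
    ("release", "fileA", "node1"),
]
for table in replicas:
    for action, resource, owner in operations:
        getattr(table, action)(resource, owner)

print([table.holder for table in replicas])
```

After the three operations every copy records `node2` as the holder of `fileA`, which is the property the cluster-wide database relies on.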

OpenVZ: Operating-system level virtualization technology

OpenVZ is an operating-system-level virtualization technology for Linux. It allows a physical server to run multiple isolated operating system instances, called containers, virtual private servers (VPSs), or virtual environments (VEs). OpenVZ is similar to Solaris Containers and LXC.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

Computer cluster: Set of computers configured in a distributed computing system

A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software. The newest manifestation of cluster computing is cloud computing.

A distributed operating system is system software over a collection of independent, networked, communicating, and physically separate computational nodes. They handle jobs which are serviced by multiple CPUs. Each individual node holds a specific software subset of the global aggregate operating system. Each subset is a composite of two distinct service provisioners. The first is a ubiquitous minimal kernel, or microkernel, that directly controls that node's hardware. The second is a higher-level collection of system management components that coordinate the node's individual and collaborative activities. These components abstract microkernel functions and support user applications.

Slurm Workload Manager: Free and open-source job scheduler for Linux and similar computers

The Slurm Workload Manager, formerly known as Simple Linux Utility for Resource Management (SLURM), or simply Slurm, is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters.

BeeGFS is a parallel file system, developed and optimized for high-performance computing. BeeGFS includes a distributed metadata architecture for scalability and flexibility reasons. Its most used and widely known aspect is data throughput.

PUPS P3

PUPS/P3 is an implementation of an organic computing environment for Linux which provides support for the implementation of low level persistent software agents.

OpenComRTOS: Real-time operating system

OpenComRTOS is a commercial network-centric, formally developed real-time operating system (RTOS), aimed mainly at the embedded system market.

Supercomputer operating system: Use of operating system by type of extremely powerful computer

A supercomputer operating system is an operating system intended for supercomputers. Since the end of the 20th century, supercomputer operating systems have undergone major transformations, as fundamental changes have occurred in supercomputer architecture. While early operating systems were custom tailored to each supercomputer to gain speed, the trend has been moving away from in-house operating systems and toward some form of Linux, which ran all the supercomputers on the TOP500 list in November 2017. In 2021, the top 10 computers ran, for instance, Red Hat Enterprise Linux (RHEL) or a variant of it, or another Linux distribution such as Ubuntu.

Docker is a set of platform as a service (PaaS) products that use OS-level virtualization to deliver software in packages called containers. The service has both free and premium tiers. The software that hosts the containers is called Docker Engine. It was first released in 2013 and is developed by Docker, Inc.

Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management. Originally designed by Google, the project is now maintained by a worldwide community of contributors, and the trademark is held by the Cloud Native Computing Foundation.

References

  1. "Kerlabs.com · Home". www.kerlabs.com. Retrieved 2018-07-15.
  2. Morin, Christine. "Kerrighed Linux-based SSI for clusters". Archived 2006-07-23 at the Wayback Machine.
  3. Morin, Christine; Lottiaux, Renaud; Vallée, Geoffroy; Gallard, Pascal; Utard, Gaël; Badrinath, R.; Rilling, Louis (2003). "Kerrighed: A Single System Image Cluster Operating System for High Performance Computing". Euro-Par 2003 Parallel Processing. Springer Berlin Heidelberg. pp. 1291–1294. doi:10.1007/978-3-540-45209-6_175. ISBN 9783540407881.
  4. Vallee, G.; Morin, C.; Berthou, J.-Y.; Rilling, L. (2003). "A new approach to configurable dynamic scheduling in clusters based on single system image technologies". Proceedings International Parallel and Distributed Processing Symposium. IEEE Comput. Soc. p. 8. CiteSeerX 10.1.1.1.9711. doi:10.1109/ipdps.2003.1213198. ISBN 978-0769519265. S2CID 18521922.
  5. "Kerrighed - Installation notes". Retrieved 2018-07-14.