Single system image

In distributed computing, a single system image (SSI) cluster is a cluster of machines that appears to be one single system. [1] [2] [3] The concept is often considered synonymous with that of a distributed operating system, [4] [5] but a single image may be presented for more limited purposes, such as job scheduling alone, which may be achieved by an additional layer of software over the conventional operating system images running on each node. [6] Interest in SSI clusters rests on the perception that they may be simpler to use and administer than more specialized clusters.

Different SSI systems may provide a more or less complete illusion of a single system.

Features of SSI clustering systems

Different SSI systems may, depending on their intended usage, provide some subset of these features.

Process migration

Many SSI systems provide process migration. [7] Processes may start on one node and be moved to another node, possibly for resource balancing or administrative reasons. [note 1] As processes are moved from one node to another, other associated resources (for example IPC resources) may be moved with them.

Process checkpointing

Some SSI systems allow checkpointing of running processes, so that their current state can be saved and reloaded later. [note 2] Checkpointing is closely related to migration: migrating a process from one node to another can be implemented by first checkpointing the process, then restarting it on another node. Alternatively, checkpointing can be considered as migration to disk.
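The checkpoint-then-restart idea can be illustrated with a minimal, application-level sketch. This is only an analogy: real SSI systems checkpoint at the operating system level and capture far more (memory, registers, open files). The function names, the state dictionary and the file name `counter.ckpt` are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def run_counter(state, steps):
    """Advance a toy computation; `state` holds everything needed to resume."""
    for _ in range(steps):
        state["total"] += state["next"]
        state["next"] += 1
    return state


def checkpoint(state, path):
    """Save the current state to disk ("migration to disk")."""
    path.write_bytes(pickle.dumps(state))


def restart(path):
    """Reload a saved state; another node could do this given shared storage."""
    return pickle.loads(path.read_bytes())


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = Path(tmp) / "counter.ckpt"  # hypothetical checkpoint file
        # Work done on "node A", then saved.
        state = run_counter({"next": 1, "total": 0}, steps=5)
        checkpoint(state, ckpt)
        # Work resumed elsewhere ("node B") from the saved state.
        resumed = restart(ckpt)
        return run_counter(resumed, steps=5)
```

Calling `demo()` sums the integers 1 to 10 in two halves, with the second half starting from the reloaded state rather than from scratch.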

Single process space

Some SSI systems provide the illusion that all processes are running on the same machine: the process management tools (e.g. "ps" and "kill" on Unix-like systems) operate on all processes in the cluster.
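As a rough sketch of what a single process space implies, the following toy model merges per-node process tables into one listing and lets a "kill" find the owning node by itself. It assumes process IDs are unique across the cluster; the function names and the data layout are hypothetical and do not correspond to any particular SSI implementation.

```python
def cluster_ps(tables):
    """Return (pid, node, command) rows for every process on every node."""
    rows = [
        (pid, node, command)
        for node, processes in tables.items()
        for pid, command in processes.items()
    ]
    return sorted(rows)


def cluster_kill(tables, pid):
    """Remove `pid` from whichever node owns it; return that node or None."""
    for node, processes in tables.items():
        if pid in processes:
            del processes[pid]
            return node
    return None


# Hypothetical cluster state: node name -> {cluster-unique pid: command}
example_tables = {
    "node1": {101: "sshd", 102: "make"},
    "node2": {201: "python"},
}
```

The caller of `cluster_kill` never names a node, which is the point of the illusion: the tool behaves as if every process were local.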

Single root

Most SSI systems provide a single view of the file system. This may be achieved by a simple NFS server, shared disk devices or even file replication.

The advantage of a single root view is that processes may be run on any available node and access needed files with no special precautions. If the cluster implements process migration, a single root view enables direct access to the files from the node where the process is currently running.

Some SSI systems provide a way of "breaking the illusion" by allowing some node-specific files even in a single root. HP TruCluster provides a "context dependent symbolic link" (CDSL), which points to different files depending on the node that accesses it. HP VMScluster provides a search list logical name, with node-specific files occluding cluster-shared files where necessary. This capability may be necessary to deal with heterogeneous clusters, where not all nodes have the same configuration. In more complex configurations, such as multiple nodes of multiple architectures over multiple sites, several local disks may combine to form the logical single root.
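The occlusion behaviour of a search list can be sketched as an ordered lookup in which the first directory containing the file wins. This is an illustration of the general idea only, not the OpenVMS logical name mechanism or the TruCluster CDSL implementation; the directory layout and the file name `hosts.conf` are hypothetical.

```python
import tempfile
from pathlib import Path


def resolve(relative_path, search_list):
    """Return the first existing match, trying directories in order."""
    for directory in search_list:
        candidate = Path(directory) / relative_path
        if candidate.exists():
            return candidate
    return None


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        shared = root / "shared"
        node_a = root / "nodes" / "a"
        node_b = root / "nodes" / "b"
        for directory in (shared, node_a, node_b):
            directory.mkdir(parents=True)
        (shared / "hosts.conf").write_text("cluster default")
        (node_a / "hosts.conf").write_text("override for node a")
        # Each node searches its own directory first, then the shared one.
        seen_by_a = resolve("hosts.conf", [node_a, shared]).read_text()
        seen_by_b = resolve("hosts.conf", [node_b, shared]).read_text()
        return seen_by_a, seen_by_b
```

Node a sees its own override, while node b, which has no node-specific copy, falls through to the cluster-shared file.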

Single I/O space

Some SSI systems allow all nodes to access the I/O devices (e.g. tapes, disks, serial lines and so on) of other nodes. There may be some restrictions on the kinds of access allowed (for example, OpenSSI cannot mount disk devices from one node on another node).

Single IPC space

Some SSI systems allow processes on different nodes to communicate using inter-process communication mechanisms as if they were running on the same machine. On some SSI systems this can even include shared memory, which can be emulated in software with distributed shared memory.

In most cases inter-node IPC will be slower than IPC on the same machine, possibly drastically slower for shared memory. Some SSI clusters include special hardware to reduce this slowdown.

Cluster IP address

Some SSI systems provide a "cluster IP address", a single address visible from outside the cluster that can be used to contact the cluster as if it were one machine. This can be used for load balancing inbound calls to the cluster, directing them to lightly loaded nodes, or for redundancy, moving the cluster address from one machine to another as nodes join or leave the cluster. [note 3]
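The two uses of a cluster address described above, load balancing and redundancy, can be sketched with a toy dispatcher. This is a simplified model under stated assumptions: the class name, its methods and the least-connections policy are illustrative choices, not the behaviour of any specific SSI product, and `192.0.2.10` is a documentation-only address.

```python
class ClusterAddress:
    """Toy dispatcher standing behind one externally visible address."""

    def __init__(self, address):
        self.address = address
        self.load = {}  # node name -> number of connections assigned so far

    def join(self, node):
        """Add a node to the pool; an existing node keeps its load count."""
        self.load.setdefault(node, 0)

    def leave(self, node):
        """Drop a departed (or crashed) node so it receives no new connections."""
        self.load.pop(node, None)

    def dispatch(self):
        """Assign a new inbound connection to the least loaded node."""
        if not self.load:
            raise RuntimeError("no nodes available behind " + self.address)
        # Sorting first makes ties resolve by node name, deterministically.
        node = min(sorted(self.load), key=self.load.get)
        self.load[node] += 1
        return node


cluster = ClusterAddress("192.0.2.10")
cluster.join("n1")
cluster.join("n2")
```

Outside clients only ever see the one address; which node answers, and whether a node has left since the last request, stays hidden behind it.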

Examples

The examples here range from commercial platforms with scaling capabilities, through packages and frameworks for creating distributed systems, to systems that actually implement a single system image.

SSI Properties of different clustering systems
| Name | Process migration | Process checkpoint | Single process space | Single root | Single I/O space | Single IPC space | Cluster IP address [t 1] | Source model | Latest release date [t 2] | Supported OS |
|---|---|---|---|---|---|---|---|---|---|---|
| Amoeba [t 3] | Yes | Yes | Yes | Yes | Unknown | Yes | Unknown | Open | July 30, 1996 | Native |
| AIX TCF | Unknown | Unknown | Unknown | Yes | Unknown | Unknown | Unknown | Closed | March 30, 1990 [8] | AIX PS/2 1.2 |
| NonStop Guardian [t 4] | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Closed | 2018 | NonStop OS |
| Inferno | No | No | No | Yes | Yes | Yes | Unknown | Open | March 4, 2015 | Native, Windows, Irix, Linux, OS X, FreeBSD, Solaris, Plan 9 |
| Kerrighed | Yes | Yes | Yes | Yes | Unknown | Yes | Unknown | Open | June 14, 2010 | Linux 2.6.30 |
| LinuxPMI [t 5] | Yes | Yes | No | Yes | No | No | Unknown | Open | June 18, 2006 | Linux 2.6.17 |
| LOCUS [t 6] | Yes | Unknown | Yes | Yes | Yes | Yes [t 7] | Unknown | Closed | 1988 | Native |
| MOSIX | Yes | Yes | No | Yes | No | No | Unknown | Closed | October 24, 2017 | Linux |
| openMosix [t 8] | Yes | Yes | No | Yes | No | No | Unknown | Open | December 10, 2004 | Linux 2.4.26 |
| Open-Sharedroot [t 9] | No | No | No | Yes | No | No | Yes | Open | September 1, 2011 [9] | Linux |
| OpenSSI | Yes | No | Yes | Yes | Yes | Yes | Yes | Open | February 18, 2010 | Linux 2.6.10 (Debian, Fedora) |
| Plan 9 | No [10] | No | No | Yes | Yes | Yes | Yes | Open | January 9, 2015 | Native |
| Sprite | Yes | Unknown | No | Yes | Yes | No | Unknown | Open | 1992 | Native |
| TidalScale | Yes | No | Yes | Yes | Yes | Yes | Yes | Closed | August 17, 2020 | Linux, FreeBSD |
| TruCluster | No | Unknown | No | Yes | No | No | Yes | Closed | October 1, 2010 | Tru64 |
| VMScluster | No | No | Yes | Yes | Yes | Yes | Yes | Closed | January 25, 2024 | OpenVMS |
| z/VM | Yes | No | Yes | No | No | Yes | Unknown | Closed | September 16, 2022 | Native |
| UnixWare NonStop Clusters [t 10] | Yes | No | Yes | Yes | Yes | Yes | Yes | Closed | June 2000 | UnixWare |
  1. Many of the Linux based SSI clusters can use the Linux Virtual Server to implement a single cluster IP address
  2. Green means software is actively developed
  3. Amoeba development is carried forward by Dr. Stefan Bosse at BSS Lab
  4. Guardian90 TR90.8 Based on R&D by Tandem Computers c/o Andrea Borr at
  5. LinuxPMI is a successor to openMosix
  6. LOCUS was used to create IBM AIX TCF
  7. LOCUS used named pipes for IPC
  8. openMosix was a fork of MOSIX
  9. Open-Sharedroot is a shared root Cluster from ATIX
  10. UnixWare NonStop Clusters was a base for OpenSSI

Notes

  1. For example, it may be necessary to move long-running processes off a node that is to be shut down for maintenance.
  2. Checkpointing is particularly useful in clusters used for high-performance computing, avoiding lost work in case of a cluster or node restart.
  3. "Leaving a cluster" is often a euphemism for crashing.

References

  1. Pfister, Gregory F. (1998), In search of clusters, Upper Saddle River, NJ: Prentice Hall PTR, ISBN 978-0-13-899709-0, OCLC 38300954
  2. Buyya, Rajkumar; Cortes, Toni; Jin, Hai (2001), "Single System Image" (PDF), International Journal of High Performance Computing Applications, 15 (2): 124, doi:10.1177/109434200101500205, S2CID 38921084
  3. Healy, Philip; Lynn, Theo; Barrett, Enda; Morrison, John P. (2016), "Single system image: A survey" (PDF), Journal of Parallel and Distributed Computing, 90–91: 35–51, doi:10.1016/j.jpdc.2016.01.004, hdl:10468/4932
  4. Coulouris, George F; Dollimore, Jean; Kindberg, Tim (2005), Distributed systems: concepts and design, Addison Wesley, p. 223, ISBN 978-0-321-26354-4
  5. Bolosky, William J.; Draves, Richard P.; Fitzgerald, Robert P.; Fraser, Christopher W.; Jones, Michael B.; Knoblock, Todd B.; Rashid, Rick (1997-05-05), "Operating System Directions for the Next Millennium", 6th Workshop on Hot Topics in Operating Systems (HotOS-VI), Cape Cod, MA, pp. 106–110, CiteSeerX 10.1.1.50.9538, doi:10.1109/HOTOS.1997.595191, ISBN 978-0-8186-7834-9, S2CID 15380352
  6. Prabhu, C.S.R. (2009), Grid And Cluster Computing, Phi Learning, p. 256, ISBN 978-81-203-3428-1
  7. Smith, Jonathan M. (1988), "A survey of process migration mechanisms" (PDF), ACM SIGOPS Operating Systems Review, 22 (3): 28–40, CiteSeerX 10.1.1.127.8095, doi:10.1145/47671.47673, S2CID 6611633
  8. "AIX PS/2 OS".
  9. "Open-Sharedroot GitHub repository". GitHub.
  10. Pike, Rob; Presotto, Dave; Thompson, Ken; Trickey, Howard (1990), "Plan 9 from Bell Labs", In Proceedings of the Summer 1990 UKUUG Conference, p. 8. Quote: "Process migration is also deliberately absent from Plan 9."