Beowulf cluster

The Borg, a 52-node Beowulf cluster used by the McGill University pulsar group to search for pulsations from binary pulsars
The original Beowulf cluster, built in 1994 by Thomas Sterling and Donald Becker at NASA. The cluster comprised 16 white-box desktops, each with an i486DX4 processor clocked at 100 MHz, a 500 MB hard disk drive, and 16 MB of RAM, giving the cluster roughly 8 GB of disk storage and 256 MB of RAM in total and a performance benchmark of 500 MFLOPS.

A Beowulf cluster is a computer cluster of what are normally identical, commodity-grade computers networked into a small local area network with libraries and programs installed which allow processing to be shared among them. The result is a high-performance parallel computing cluster from inexpensive personal computer hardware.


The name Beowulf originally referred to a specific computer built in 1994 by Thomas Sterling and Donald Becker at NASA.[1] The name "Beowulf" comes from the Old English epic poem of the same name.[2]

No particular piece of software defines a cluster as a Beowulf. Typically only free and open source software is used, both to save cost and to allow customization. Most Beowulf clusters run a Unix-like operating system, such as BSD, Linux, or Solaris. Commonly used parallel processing libraries include the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM). Both of these permit the programmer to divide a task among a group of networked computers and collect the results of the processing. Examples of MPI implementations include Open MPI and MPICH; additional implementations are also available.
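
For illustration, a minimal MPI program along these lines (a sketch only, not drawn from any particular Beowulf installation) divides the iterations of a loop among the processes running on the cluster's nodes and combines the partial results on one of them:

    /*
     * Minimal MPI sketch: each process sums its own share of a loop and
     * the partial results are combined on rank 0.
     * Compile with an MPI wrapper compiler (e.g. mpicc) and launch with
     * mpirun/mpiexec across the cluster's nodes.
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id         */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

        /* Each process handles every size-th iteration, starting at rank. */
        long long local = 0;
        for (long long i = rank; i < 1000000; i += size)
            local += i;

        /* Combine the partial sums on process 0. */
        long long total = 0;
        MPI_Reduce(&local, &total, 1, MPI_LONG_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("total = %lld, computed by %d processes\n", total, size);

        MPI_Finalize();
        return 0;
    }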

Beowulf systems operate worldwide, chiefly in support of scientific computing. Since 2017, every system on the Top500 list of the world's fastest supercomputers has used Beowulf software methods and a Linux operating system. At this level, however, most are by no means just assemblages of commodity hardware; custom design work is often required for the nodes (often blade servers), the networking and the cooling systems.

Development

Detail of the first Beowulf cluster at Barcelona Supercomputing Center

A description of the Beowulf cluster, from the original "how-to", published by Jacek Radajewski and Douglas Eadline under the Linux Documentation Project in 1998:[3]

Beowulf is a multi-computer architecture which can be used for parallel computations. It is a system which usually consists of one server node, and one or more client nodes connected via Ethernet or some other network. It is a system built using commodity hardware components, like any PC capable of running a Unix-like operating system, with standard Ethernet adapters, and switches. It does not contain any custom hardware components and is trivially reproducible. Beowulf also uses commodity software like the FreeBSD, Linux or Solaris operating system, Parallel Virtual Machine (PVM) and Message Passing Interface (MPI). The server node controls the whole cluster and serves files to the client nodes. It is also the cluster's console and gateway to the outside world. Large Beowulf machines might have more than one server node, and possibly other nodes dedicated to particular tasks, for example consoles or monitoring stations. In most cases, client nodes in a Beowulf system are dumb, the dumber the better. Nodes are configured and controlled by the server node, and do only what they are told to do. In a disk-less client configuration, a client node doesn't even know its IP address or name until the server tells it.

One of the main differences between Beowulf and a Cluster of Workstations (COW) is that Beowulf behaves more like a single machine rather than many workstations. In most cases client nodes do not have keyboards or monitors, and are accessed only via remote login or possibly serial terminal. Beowulf nodes can be thought of as a CPU + memory package which can be plugged into the cluster, just like a CPU or memory module can be plugged into a motherboard.

Beowulf is not a special software package, new network topology, or the latest kernel hack. Beowulf is a technology of clustering computers to form a parallel, virtual supercomputer. Although there are many software packages such as kernel modifications, PVM and MPI libraries, and configuration tools which make the Beowulf architecture faster, easier to configure, and much more usable, one can build a Beowulf class machine using a standard Linux distribution without any additional software. If you have two networked computers which share at least the /home file system via NFS, and trust each other to execute remote shells (rsh), then it could be argued that you have a simple, two node Beowulf machine.
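
As a concrete illustration of that minimal two-node case, the following sketch shows one way the server could export /home over NFS while each node trusts the other for remote shells. The host names node1 (server) and node2 (client) are placeholders, and the exact export and mount options vary between distributions:

    # /etc/exports on node1: export /home to the client node over NFS
    /home    node2(rw,sync)

    # /etc/fstab on node2: mount the server's /home at boot
    node1:/home    /home    nfs    defaults    0 0

    # ~/.rhosts on each node: trust the other node to run remote shells
    node1
    node2

    # Verify from node1 that a command can be executed remotely
    rsh node2 uname -n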

Operating systems

A home-built Beowulf cluster composed of white-box PCs

As of 2014, a number of Linux distributions, and at least one BSD, are designed for building Beowulf clusters; some of these are no longer maintained.

A cluster can be set up by using Knoppix bootable CDs in combination with openMosix. The computers will automatically link together, without the need for complex configuration, to form a Beowulf cluster using all CPUs and RAM in the cluster. A Beowulf cluster is scalable to a nearly unlimited number of computers, limited only by the overhead of the network.

Provisioning of operating systems and other software for a Beowulf cluster can be automated using software such as Open Source Cluster Application Resources (OSCAR). OSCAR installs on top of a standard installation of a supported Linux distribution on a cluster's head node.

See also

Related Research Articles

Thin client

In computer networking, a thin client is a simple (low-performance) computer that has been optimized for establishing a remote connection with a server-based computing environment. They are sometimes known as network computers, or in their simplest form as zero clients. The server does most of the work, which can include launching software programs, performing calculations, and storing data. This contrasts with a rich client or a conventional personal computer; the former is also intended for working in a client–server model but has significant local processing power, while the latter aims to perform its function mostly locally.

In computing, a virtual machine (VM) is the virtualization or emulation of a computer system. Virtual machines are based on computer architectures and provide the functionality of a physical computer. Their implementations may involve specialized hardware, software, or a combination of the two. Virtual machines differ and are organized by their function.

Server (computing)

In computing, a server is a piece of computer hardware or software that provides functionality for other programs or devices, called "clients". This architecture is called the client–server model. Servers can provide various functionalities, often called "services", such as sharing data or resources among multiple clients or performing computations for a client. A single server can serve multiple clients, and a single client can use multiple servers. A client process may run on the same device or may connect over a network to a server on a different device. Typical servers are database servers, file servers, mail servers, print servers, web servers, game servers, and application servers.

Parallel Virtual Machine (PVM) is a software tool for parallel networking of computers. It is designed to allow a network of heterogeneous Unix and/or Windows machines to be used as a single distributed parallel processor. Thus large computational problems can be solved more cost effectively by using the aggregate power and memory of many computers. The software is very portable; the source code, available free through netlib, has been compiled on everything from laptops to Crays.
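
As a sketch of how a PVM program is structured (the worker executable name pvm_worker is illustrative and must be installed where PVM's spawn path can find it), a master task spawns a worker somewhere in the virtual machine and receives a message back from it:

    /* Sketch of a PVM master/worker pair: two separate programs linked
     * against libpvm3. The worker name "pvm_worker" is illustrative. */

    /* master.c */
    #include <stdio.h>
    #include <pvm3.h>

    int main(void)
    {
        int tid, result;
        pvm_mytid();                          /* enroll this task in PVM     */
        if (pvm_spawn("pvm_worker", NULL, PvmTaskDefault, "", 1, &tid) == 1) {
            pvm_recv(tid, 1);                 /* block until msg with tag 1  */
            pvm_upkint(&result, 1, 1);        /* unpack one int              */
            printf("worker t%x returned %d\n", tid, result);
        }
        pvm_exit();
        return 0;
    }

    /* pvm_worker.c */
    #include <pvm3.h>

    int main(void)
    {
        int answer = 42;
        int ptid = pvm_parent();              /* tid of the task that spawned us */
        pvm_initsend(PvmDataDefault);         /* start a new send buffer         */
        pvm_pkint(&answer, 1, 1);             /* pack one int                    */
        pvm_send(ptid, 1);                    /* send with message tag 1         */
        pvm_exit();
        return 0;
    }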

Quadrics (company)

Quadrics was a supercomputer company formed in 1996 as a joint venture between Alenia Spazio and the technical team from Meiko Scientific. They produced hardware and software for clustering commodity computer systems into massively parallel systems. Their highpoint was in June 2003 when six out of the ten fastest supercomputers in the world were based on Quadrics' interconnect. They officially closed on June 29, 2009.

Live CD

A live CD is a complete bootable computer installation including operating system which runs directly from a CD-ROM or similar storage device into a computer's memory, rather than loading from a hard disk drive. A live CD allows users to run an operating system for any purpose without installing it or making any changes to the computer's configuration. Live CDs can run on a computer without secondary storage, such as a hard disk drive, or with a corrupted hard disk drive or file system, allowing data recovery.

MOSIX is a proprietary distributed operating system. Although early versions were based on older UNIX systems, since 1999 it has focused on Linux clusters and grids. In a MOSIX cluster/grid there is no need to modify or to link applications with any library, to copy files or log in to remote nodes, or even to assign processes to different nodes – it is all done automatically, as in an SMP.

openMosix

openMosix was a free cluster management system that provided single-system image (SSI) capabilities, e.g. automatic work distribution among nodes. It allowed program processes to migrate to machines in the node's network that would be able to run that process faster. It was particularly useful for running parallel applications having low to moderate input/output (I/O). It was released as a Linux kernel patch, but was also available on specialized Live CDs. openMosix development has been halted by its developers, but the LinuxPMI project is continuing development of the former openMosix code.

The Stone Soupercomputer was a Beowulf-style computer cluster built at the US Oak Ridge National Laboratory in the late 1990s.

Diskless node

A diskless node is a workstation or personal computer without disk drives, which employs network booting to load its operating system from a server.

The Parallel Virtual File System (PVFS) is an open-source parallel file system. A parallel file system is a type of distributed file system that distributes file data across multiple servers and provides for concurrent access by multiple tasks of a parallel application. PVFS was designed for use in large-scale cluster computing and focuses on high-performance access to large data sets. It consists of a server process and a client library, both of which are written entirely in user-level code. A Linux kernel module and pvfs-client process allow the file system to be mounted and used with standard utilities. The client library provides for high-performance access via the Message Passing Interface (MPI). PVFS is jointly developed by the Parallel Architecture Research Laboratory at Clemson University, the Mathematics and Computer Science Division at Argonne National Laboratory, and the Ohio Supercomputer Center. PVFS development has been funded by NASA Goddard Space Flight Center, the DOE Office of Science Advanced Scientific Computing Research program, NSF PACI and HECURA programs, and other government and private agencies. PVFS is now known as OrangeFS in its newest development branch.
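
As a sketch of such MPI-based parallel access (the mount point /mnt/pvfs and the file name are placeholders), each MPI process can write its own region of a shared file concurrently:

    /* Sketch of concurrent file access with MPI-IO: every process writes
     * its own, non-overlapping block of one shared file. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, buf[256];
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int i = 0; i < 256; i++)
            buf[i] = rank;                    /* fill the block with this rank */

        /* Open (or create) the shared file collectively. */
        MPI_File_open(MPI_COMM_WORLD, "/mnt/pvfs/data.bin",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Each rank writes at its own offset, so no two ranks overlap. */
        MPI_Offset offset = (MPI_Offset)rank * sizeof(buf);
        MPI_File_write_at(fh, offset, buf, 256, MPI_INT, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }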

Quantian OS was a remastering of Knoppix/Debian for the computational sciences. The environment was a self-configuring, directly bootable CD/DVD that turned any PC or laptop into a Linux workstation. Quantian also incorporated clusterKnoppix and added support for openMosix, including remote booting of light clients in an openMosix terminal-server context, permitting rapid setup of an SMP cluster computer.

A clustered file system (CFS) is a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system. Clustered file systems can provide features like location-independent addressing and redundancy which improve reliability or reduce the complexity of the other parts of the cluster. Parallel file systems are a type of clustered file system that spread data across multiple storage nodes, usually for redundancy or performance.

Computer cluster

A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software.

History of computer clusters

The history of computer clusters is best captured by a footnote in Greg Pfister's In Search of Clusters: "Virtually every press release from DEC mentioning clusters says ‘DEC, who invented clusters...’. IBM did not invent them either. Customers invented clusters, as soon as they could not fit all their work on one computer, or needed a backup. The date of the first is unknown, but it would be surprising if it was not in the 1960s, or even late 1950s."

Supercomputer operating system

A supercomputer operating system is an operating system intended for supercomputers. Since the end of the 20th century, supercomputer operating systems have undergone major transformations, as fundamental changes have occurred in supercomputer architecture. While early operating systems were custom tailored to each supercomputer to gain speed, the trend has been moving away from in-house operating systems and toward some form of Linux, which ran all of the supercomputers on the TOP500 list as of November 2017. In 2021, the top 10 computers ran, for instance, Red Hat Enterprise Linux (RHEL) or a variant of it, or another Linux distribution such as Ubuntu.

Message passing in computer clusters

Message passing is an inherent element of all computer clusters. All computer clusters, ranging from homemade Beowulfs to some of the fastest supercomputers in the world, rely on message passing to coordinate the activities of the many nodes they encompass. Message passing in computer clusters built with commodity servers and switches is used by virtually every internet service.

OrangeFS is an open-source parallel file system, the next generation of Parallel Virtual File System (PVFS). A parallel file system is a type of distributed file system that distributes file data across multiple servers and provides for concurrent access by multiple tasks of a parallel application. OrangeFS was designed for use in large-scale cluster computing and is used by companies, universities, national laboratories and similar sites worldwide.

References

  1. Becker, Donald J; Sterling, Thomas; Savarese, Daniel; Dorband, John E; Ranawak, Udaya A; Packer, Charles V (1995). "BEOWULF: A parallel workstation for scientific computation". Proceedings, International Conference on Parallel Processing. 95.
  2. See Francis Barton Gummere's 1909 translation, reprinted (for example) in Beowulf. Translated by Francis B. Gummere. Hayes Barton Press. 1909. p. 20. ISBN 9781593773700. Retrieved 2014-01-16. [dead link]
  3. Radajewski, Jacek; Eadline, Douglas (22 November 1998). "Beowulf HOWTO". ibiblio.org. v1.1.1. Retrieved 8 June 2021.
