Warewulf

Warewulf is a computer cluster implementation toolkit that simplifies both the installation of a cluster and its long-term administration. [1]

Toolkit

Warewulf changes the administration paradigm so that the file systems of all slave nodes are managed from one point, and it automates the distribution of the node file system during node boot. This gives a central administration model for all slave nodes and includes the tools needed to build configuration files and to monitor and control the nodes. It is customizable and can be adapted to most types of cluster. From the software administration perspective, it makes little difference whether a cluster has 2 nodes or 500 nodes: the procedure is the same, which is why Warewulf scales well from the administrator's perspective. Because it uses a standard chroot-able file system for every node, the node environment is highly configurable and lends itself to custom environments.
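The central-administration idea can be illustrated with a minimal sketch. This is not Warewulf's actual tooling or command-line interface: the `NodeImage` class, the `boot_assignments` function, the node naming scheme, and the `/srv/images/compute` path are all hypothetical names chosen for illustration. The sketch only shows why the procedure is the same for 2 nodes or 500 when every node is served from one centrally managed file system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeImage:
    """One centrally maintained node file system (hypothetical model)."""
    name: str
    chroot_path: str


def boot_assignments(image: NodeImage, node_count: int,
                     prefix: str = "node") -> dict[str, str]:
    """Map every node name to the same central image path.

    The node count is the only input that changes between a small
    and a large cluster; the procedure itself is identical.
    """
    width = len(str(node_count))  # zero-pad names to a uniform width
    return {
        f"{prefix}{i:0{width}d}": image.chroot_path
        for i in range(1, node_count + 1)
    }


# Hypothetical image location; not a path defined by Warewulf.
image = NodeImage(name="compute", chroot_path="/srv/images/compute")

small = boot_assignments(image, 2)
large = boot_assignments(image, 500)

print(len(small), len(large))
print(set(large.values()))
```

In this model, changing the single image changes what every node receives at its next boot, which is the property the paragraph above describes.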

While Warewulf was designed for high-performance computing (HPC), it is not an HPC system in itself. Warewulf is more along the lines of a distributed Linux distribution, or more specifically a system for replicating and managing small, lightweight Linux systems from one master. Using Warewulf, HPC packages such as LAM/MPI, MPICH, Sun Grid Engine, and PVM can be easily deployed throughout the cluster.

Warewulf solves the problem of slave node management rather than being a strictly HPC-specific system (even though it was designed with HPC in mind). Because of this, it is as flexible as a home-grown cluster but scales very well administratively. As a result of this flexibility and ease of customization, Warewulf has been used not only in production HPC implementations, but also in development systems like KASY0 (the first system to break the one hundred dollar per GFLOPS barrier) and in non-HPC systems such as web server cluster farms, intrusion detection clusters, and high-availability clusters.

References

  1. Layton, Jeff. "Warewulf Cluster Manager – Howlingly Great". admin-magazine.com. Retrieved February 2, 2024.