Peloton (supercomputer)

The Peloton supercomputer purchase is a program at the Lawrence Livermore National Laboratory intended to provide teraFLOPS-scale computing capability using commodity Scalable Units (SUs). The Peloton RFP defines the system configurations.[1]

The Atlas cluster

Appro was awarded the contract for Peloton, which includes the following machines:

Machine   Nodes   TPP (TFLOPS)
atlas     1152    44.24
hopi      80      2.92
minos     864     33.18
rhea      576     22.12
yana      80      3.07
zeus      288     11.06
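
As a quick arithmetic check on the figures listed above, the following minimal Python sketch totals the node counts and TPP values and derives a per-node figure for each machine. The variable names are illustrative only, and the reading of TPP as a peak figure in TFLOPS is an assumption based on the column heading.

```python
# Node counts and TPP values (TFLOPS) exactly as listed for the Peloton machines.
# The dictionary layout and names below are illustrative, not from the source.
machines = {
    "atlas": (1152, 44.24),
    "hopi": (80, 2.92),
    "minos": (864, 33.18),
    "rhea": (576, 22.12),
    "yana": (80, 3.07),
    "zeus": (288, 11.06),
}

# Totals across all six machines.
total_nodes = sum(nodes for nodes, _ in machines.values())
total_tpp = sum(tpp for _, tpp in machines.values())

# Per-node TPP in GFLOPS (1 TFLOPS = 1000 GFLOPS).
per_node_gflops = {
    name: 1000 * tpp / nodes for name, (nodes, tpp) in machines.items()
}

print(f"total nodes: {total_nodes}")
print(f"total TPP:   {total_tpp:.2f} TFLOPS")
for name, gflops in per_node_gflops.items():
    print(f"{name:6s} {gflops:.1f} GFLOPS per node")
```

The listed values sum to 3040 nodes and 116.59 TFLOPS. The per-node figure comes out at roughly 38.4 GFLOPS for every machine except hopi, which works out to 36.5 GFLOPS.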

All of the machines run the CHAOS variant of Red Hat Enterprise Linux and the Moab resource management system. Under the project management of John Lee, the team at Synnex, Voltaire, Supermicro and other suppliers, together with the scientists, dramatically reduced the time between starting the cluster build and having hardware in production at Livermore. In particular, four SUs were on the floor on a Thursday, two more SUs were brought in for the final cluster, and by Saturday all of them were wired up, burned in, and running Linpack.

Related Research Articles

Lawrence Livermore National Laboratory Federal research institute in Livermore, California, United States

Lawrence Livermore National Laboratory (LLNL) is a federal research facility in Livermore, California, United States, founded by the University of California, Berkeley in 1952. Originally a branch of the Lawrence Berkeley National Laboratory, the Lawrence Livermore laboratory became autonomous in 1971 and was designated a national laboratory in 1981.

Beowulf cluster

A Beowulf cluster is a computer cluster of what are normally identical, commodity-grade computers networked into a small local area network with libraries and programs installed which allow processing to be shared among them. The result is a high-performance parallel computing cluster from inexpensive personal computer hardware.

Quadrics (company)

Quadrics was a supercomputer company formed in 1996 as a joint venture between Alenia Spazio and the technical team from Meiko Scientific. They produced hardware and software for clustering commodity computer systems into massively parallel systems. Their highpoint was in June 2003 when six out of the ten fastest supercomputers in the world were based on Quadrics' interconnect. They officially closed on June 29, 2009.

openMosix computer operating system

openMosix was a free cluster management system that provided single-system image (SSI) capabilities, e.g. automatic work distribution among nodes. It allowed program processes to migrate to machines in the node's network that would be able to run that process faster. It was particularly useful for running parallel applications having low to moderate input/output (I/O). It was released as a Linux kernel patch, but was also available on specialized Live CDs. openMosix development has been halted by its developers, but the LinuxPMI project is continuing development of the former openMosix code.

OpenSSI is an open-source single-system image clustering system. It allows a collection of computers to be treated as one large system, allowing applications running on any one machine access to the resources of all the machines in the cluster.

Linux Terminal Server Project (LTSP) is a free and open source terminal server for Linux that allows many people to simultaneously use the same computer. Applications run on the server with a terminal known as a thin client handling input and output. Generally, terminals are low-powered, lack a hard disk and are quieter and more reliable than desktop computers because they do not have any moving parts.

oneSIS is an open-source software tool developed at Sandia National Laboratories aimed at easing systems administration in large-scale, Linux cluster environments.

Rocks Cluster Distribution is a Linux distribution intended for high-performance computing (HPC) clusters. It was started by the National Partnership for Advanced Computational Infrastructure and the San Diego Supercomputer Center (SDSC) in 2000. It was initially funded in part by an NSF grant (2000–07), and was then funded by a follow-up NSF grant through 2011.

OpenSAF is an open-source service-orchestration system for automating computer application deployment, scaling, and management. OpenSAF is consistent with, and expands upon, Service Availability Forum (SAF) and SCOPE Alliance standards.

Ceph is an open-source software storage platform that implements object storage on a single distributed computer cluster and provides 3-in-1 interfaces for object-, block- and file-level storage. Ceph aims primarily for completely distributed operation without a single point of failure, scalability to the exabyte level, and free availability.

High Performance Storage System (HPSS) is a flexible, scalable, policy-based Hierarchical Storage Management product developed by the HPSS Collaboration. It provides scalable hierarchical storage management (HSM), archive, and file system services using cluster, LAN and SAN technologies to aggregate the capacity and performance of many computers, disks, disk systems, tape drives and tape libraries.

oVirt

oVirt is a free, open-source virtualization management platform. It was founded by Red Hat as a community project on which Red Hat Enterprise Virtualization is based. It allows centralized management of virtual machines, compute, storage and networking resources from an easy-to-use web-based front-end with platform-independent access. KVM on the x86-64 and PowerPC64 architectures is the only hypervisor supported, but there is an ongoing effort to support the ARM architecture in future releases.

Computer cluster

A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software.

Slurm Workload Manager Free and open-source job scheduler for Linux and similar computers

The Slurm Workload Manager, formerly known as Simple Linux Utility for Resource Management (SLURM), or simply Slurm, is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters.

LXC

LXC is an operating-system-level virtualization method for running multiple isolated Linux systems (containers) on a control host using a single Linux kernel.

A supercomputer operating system is an operating system intended for supercomputers. Since the end of the 20th century, supercomputer operating systems have undergone major transformations, as fundamental changes have occurred in supercomputer architecture. While early operating systems were custom tailored to each supercomputer to gain speed, the trend has been moving away from in-house operating systems and toward some form of Linux, with it running all the supercomputers on the TOP500 list in November 2017.

Appro American technology company

Appro was a developer of supercomputing supporting High Performance Computing (HPC) markets focused on medium- to large-scale deployments. Appro was based in Milpitas, California with a computing center in Houston, Texas, and a manufacturing and support subsidiary in South Korea and Japan.

Linux kernel-based operating systems have been widely adopted in a very wide range of uses. All the advantages and benefits of free and open-source software apply to the Linux kernel, and to most of the rest of the system software.

Proxmox Virtual Environment Linux distribution for server virtualization

Proxmox Virtual Environment is an open-source server virtualization management platform. It is a Debian-based Linux distribution with a modified Ubuntu LTS kernel and allows deployment and management of virtual machines and containers. Proxmox VE includes a web console and command-line tools, and provides a REST API for third-party tools. Two types of virtualization are supported: container-based with LXC, and full virtualization with KVM. It comes with a bare-metal installer and includes a web-based management interface.

Container Linux was an open-source lightweight operating system based on the Linux kernel and designed for providing infrastructure to clustered deployments, while focusing on automation, ease of application deployment, security, reliability and scalability. As an operating system, Container Linux provided only the minimal functionality required for deploying applications inside software containers, together with built-in mechanisms for service discovery and configuration sharing.

References

  1. "Linux at Livermore". Archived from the original on 2020-12-15. Retrieved 2007-03-01.