SUNMOS

SUNMOS (Sandia/UNM Operating System) is an operating system jointly developed by Sandia National Laboratories and the Computer Science Department at the University of New Mexico. The goal of the project, started in 1991, was to develop a highly portable yet efficient operating system for massively parallel distributed-memory systems. [1]

SUNMOS uses a single-tasking kernel and does not provide demand paging. It takes control of all nodes in the distributed system. Once an application is loaded and running, it can manage all the available memory on a node and use the full resources provided by the hardware. Applications are started and controlled from a process called yod that runs on the host node. Yod runs on a Sun frontend for the nCUBE 2, and on a service node on the Intel Paragon.

SUNMOS was developed as a reaction to the heavyweight version of OSF/1 that ran as a single-system image on the Paragon and consumed 8 to 12 MB of the 16 MB available on each node, leaving little memory for compute applications. In comparison, SUNMOS used 250 KB of memory per node. Additionally, the overhead of OSF/1 limited the network bandwidth to 35 MB/s, while SUNMOS was able to use 170 MB/s of the peak 200 MB/s available. [2]
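The figures quoted above can be turned into per-node shares with a few lines of arithmetic. The following is a minimal sketch rather than anything taken from the cited sources: the variable names and the `percent` helper are illustrative, and it assumes 1 MB = 1024 KB when converting the SUNMOS footprint.

```python
# Figures quoted in the paragraph above.
NODE_MEMORY_MB = 16          # memory available on each Paragon node
PEAK_BANDWIDTH_MB_S = 200    # peak network bandwidth
KB_PER_MB = 1024             # assumption: binary megabytes


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# OSF/1 consumed between 8 and 12 MB of node memory.
osf1_memory_share = (percent(8, NODE_MEMORY_MB), percent(12, NODE_MEMORY_MB))

# SUNMOS used 250 KB of node memory.
sunmos_memory_share = percent(250 / KB_PER_MB, NODE_MEMORY_MB)

# Achieved bandwidth as a share of the peak.
osf1_bandwidth_share = percent(35, PEAK_BANDWIDTH_MB_S)
sunmos_bandwidth_share = percent(170, PEAK_BANDWIDTH_MB_S)

print(f"OSF/1 memory share:     {osf1_memory_share[0]:.1f}% to {osf1_memory_share[1]:.1f}%")
print(f"SUNMOS memory share:    {sunmos_memory_share:.2f}%")
print(f"OSF/1 bandwidth share:  {osf1_bandwidth_share:.1f}%")
print(f"SUNMOS bandwidth share: {sunmos_bandwidth_share:.1f}%")
```

Under these assumptions, OSF/1 occupied half to three quarters of each node's memory and reached under a fifth of the peak bandwidth, while SUNMOS occupied roughly 1.5% of memory and reached 85% of the peak.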

The ideas in SUNMOS inspired PUMA, a multitasking variant that ran only on the i860 Paragon. Among the extensions in PUMA was the Portals API, a scalable, high-performance message-passing API. Intel ported PUMA and Portals to the Pentium Pro-based ASCI Red system and named it Cougar. Cray ported Cougar to the Opteron-based Cray XT3 and renamed it Catamount. A version of Catamount was released to the public as OpenCatamount.

In 2009, the Catamount lightweight kernel was selected for an R&D 100 Award. [3] [4]

Related Research Articles

Supercomputer: Type of extremely powerful computer

A supercomputer is a computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of million instructions per second (MIPS). Since 2017, there have existed supercomputers which can perform over 10^17 FLOPS (a hundred quadrillion FLOPS, 100 petaFLOPS or 100 PFLOPS). For comparison, a desktop computer has performance in the range of hundreds of gigaFLOPS (10^11) to tens of teraFLOPS (10^13). Since November 2017, all of the world's fastest 500 supercomputers run on Linux-based operating systems. Additional research is being conducted in the United States, the European Union, Taiwan, Japan, and China to build faster, more powerful and technologically superior exascale supercomputers.

nCUBE was a series of parallel computing computers from the company of the same name. Early generations of the hardware used a custom microprocessor. With its final generations of servers, nCUBE no longer designed custom microprocessors for machines, but used server-class chips manufactured by a third party in massively parallel hardware deployments, primarily for the purposes of on-demand video.

In computer architecture, 64-bit integers, memory addresses, or other data units are those that are 64 bits wide. Also, 64-bit CPUs and ALUs are those that are based on processor registers, address buses, or data buses of that size. A computer that uses such a processor is a 64-bit computer.

OSF/1 is a variant of the Unix operating system developed by the Open Software Foundation during the late 1980s and early 1990s. OSF/1 is one of the first operating systems to have used the Mach kernel developed at Carnegie Mellon University, and is probably best known as the native Unix operating system for DEC Alpha architecture systems.

ASCI Red: Supercomputer

ASCI Red was the first computer built under the Accelerated Strategic Computing Initiative (ASCI), the supercomputing initiative of the United States government created to help the maintenance of the United States nuclear arsenal after the 1992 moratorium on nuclear testing.

Lustre is a type of parallel distributed file system, generally used for large-scale cluster computing. The name Lustre is a portmanteau word derived from Linux and cluster. Lustre file system software is available under the GNU General Public License and provides high performance file systems for computer clusters ranging in size from small workgroup clusters to large-scale, multi-site systems. Since June 2005, Lustre has consistently been used by at least half of the top ten, and more than 60 of the top 100 fastest supercomputers in the world, including the world's No. 1 ranked TOP500 supercomputer in November 2022, Frontier, as well as previous top supercomputers such as Fugaku, Titan and Sequoia.

Red Storm is a supercomputer architecture designed for the US Department of Energy’s National Nuclear Security Administration Advanced Simulation and Computing Program. Cray Inc. developed it based on the contracted architectural specifications provided by Sandia National Laboratories. The architecture was later commercially produced as the Cray XT3.

Cray XT3: Distributed memory massively parallel MIMD supercomputer

The Cray XT3 is a distributed memory massively parallel MIMD supercomputer designed by Cray Inc. with Sandia National Laboratories under the codename Red Storm. Cray turned the design into a commercial product in 2004. The XT3 derives much of its architecture from the previous Cray T3E system, and also from the Intel ASCI Red supercomputer.

Intel Paragon

The Intel Paragon is a discontinued series of massively parallel supercomputers that was produced by Intel in the 1990s. The Paragon XP/S is a productized version of the experimental Touchstone Delta system that was built at Caltech, launched in 1992. The Paragon superseded Intel's earlier iPSC/860 system, to which it is closely related.

Intel iPSC

The Intel Personal SuperComputer was a product line of parallel computers in the 1980s and 1990s. The iPSC/1 was superseded by the Intel iPSC/2, and then the Intel iPSC/860.

The Cray CX1 is a deskside high-performance workstation designed by Cray Inc., based on the x86-64 processor architecture. It was launched on September 16, 2008, and was discontinued in early 2012. It comprises a single chassis blade server design that supports a maximum of eight modular single-width blades, giving up to 96 processor cores. Computational load can be run independently on each blade and/or combined using clustering techniques.

Locus Computing Corporation was formed in 1982 by Gerald J. Popek, Charles S. Kline and Gregory I. Thiel to commercialize the technologies developed for the LOCUS distributed operating system at UCLA. Locus was notable for commercializing single-system image software and producing the Merge package which allowed the use of DOS and Windows 3.1 software on Unix systems.

The National Center for Computational Sciences (NCCS) is a United States Department of Energy (DOE) Leadership Computing Facility that houses the Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science User Facility charged with helping researchers solve challenging scientific problems of global interest with a combination of leading high-performance computing (HPC) resources and international expertise in scientific computing.

Intel oneAPI Math Kernel Library is a library of optimized math routines for science, engineering, and financial applications. Core math functions include BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier transforms, and vector math.

Portals is a low-level network API for high-performance networking on high-performance computing systems developed by Sandia National Laboratories and the University of New Mexico. Portals is currently the lowest-level network programming interface on the commercially successful XT line of supercomputers from Cray.

A lightweight kernel (LWK) operating system is one used in a large computer with many processor cores, termed a parallel computer.

Slurm Workload Manager: Free and open-source job scheduler for Linux and similar computers

The Slurm Workload Manager, formerly known as Simple Linux Utility for Resource Management (SLURM), or simply Slurm, is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters.

CNK operating system

Compute Node Kernel (CNK) is the node level operating system for the IBM Blue Gene series of supercomputers.

Catamount (operating system): Operating system for supercomputers

Catamount is an operating system for supercomputers.

Supercomputer operating system: Operating system used on supercomputers

A supercomputer operating system is an operating system intended for supercomputers. Since the end of the 20th century, supercomputer operating systems have undergone major transformations, as fundamental changes have occurred in supercomputer architecture. While early operating systems were custom tailored to each supercomputer to gain speed, the trend has been moving away from in-house operating systems and toward some form of Linux, which by November 2017 ran on every supercomputer on the TOP500 list. In 2021, the top 10 computers ran, for instance, Red Hat Enterprise Linux (RHEL), a variant of it, or another Linux distribution such as Ubuntu.

References

  1. Rolf Riesen, Lee Ann Fisk; et al. "SUNMOS?". Retrieved 2006-05-19. A paper that explains what SUNMOS is (CiteSeer cached copy).
  2. Rolf Riesen; et al. "Designing and implementing lightweight kernels for capability computing". Archived from the original on 2013-01-05. Retrieved 2009-10-12.
  3. "Operating system boosts high performance computing". R&D Magazine. 2009-07-30. Archived from the original on 2013-02-01. Retrieved 2009-11-10.
  4. "Sandia wins five R&D 100 awards, plays role in sixth" (Press release). Sandia National Laboratories. 2009-07-20. Retrieved 2009-11-10.