Supercomputer operating systems

Since the end of the 20th century, supercomputer operating systems have undergone major transformations, as fundamental changes have occurred in supercomputer architecture. [1] While early operating systems were custom tailored to each supercomputer to gain speed, the trend has been moving away from in-house operating systems and toward some form of Linux, [2] which as of November 2017 ran every supercomputer on the TOP500 list.

Given that modern massively parallel supercomputers typically separate computations from other services by using multiple types of nodes, they usually run different operating systems on different nodes, e.g., using a small and efficient lightweight kernel such as Compute Node Kernel (CNK) or Compute Node Linux (CNL) on compute nodes, but a larger system such as a Linux-derivative on server and input/output (I/O) nodes. [3] [4]

While in a traditional multi-user computer system job scheduling is in effect a tasking problem for processing and peripheral resources, in a massively parallel system, the job management system needs to manage the allocation of both computational and communication resources, as well as gracefully dealing with inevitable hardware failures when tens of thousands of processors are present. [5]

Although most modern supercomputers use the Linux operating system, [6] each manufacturer has made its own specific changes to the Linux-derivative they use, and no industry standard exists, partly because the differences in hardware architectures require changes to optimize the operating system to each hardware design. [1] [7]

Context and overview

In the early days of supercomputing, the basic architectural concepts were evolving rapidly, and system software had to follow hardware innovations that took rapid turns. [1] In the early systems, operating systems were custom tailored to each supercomputer to gain speed, yet in the rush to develop them serious software quality challenges surfaced, and in many cases the cost and complexity of system software development became as much of an issue as the hardware itself. [1]

The supercomputer center at NASA Ames

In the 1980s the cost of software development at Cray came to equal what the company spent on hardware, and that trend was partly responsible for a move away from in-house operating systems toward the adaptation of generic software. [2] The first wave of operating system changes came in the mid-1980s, as vendor-specific operating systems were abandoned in favor of Unix. Despite early skepticism, this transition proved successful. [1] [2]

By the early 1990s, major changes were occurring in supercomputing system software. [1] By this time, the growing use of Unix had begun to change the way system software was viewed. The use of a high-level language (C) to implement the operating system, and the reliance on standardized interfaces, stood in contrast to the assembly-language-oriented approaches of the past. [1] As hardware vendors adapted Unix to their systems, new and useful features were added to Unix, e.g., fast file systems and tunable process schedulers. [1] However, all the companies that adapted Unix made unique changes to it rather than collaborating on an industry standard to create a "Unix for supercomputers", partly because differences in their architectures required those changes to optimize Unix for each architecture. [1]

Thus, as general-purpose operating systems became stable, supercomputers began to borrow and adapt the critical system code from them, relying on the rich set of secondary functions that came with them rather than reinventing the wheel. [1] At the same time, however, the size of the code base for general-purpose operating systems was growing rapidly; by the time Unix-based code had reached 500,000 lines, its maintenance and use had become a challenge. [1] This resulted in the move to microkernels, which use a minimal set of operating system functions. Systems such as Mach at Carnegie Mellon University and ChorusOS at INRIA were examples of early microkernels. [1]

The separation of the operating system into separate components became necessary as supercomputers developed different types of nodes, e.g., compute nodes versus I/O nodes. Thus modern supercomputers usually run different operating systems on different nodes, e.g., using a small and efficient lightweight kernel such as CNK or CNL on compute nodes, but a larger system such as a Linux-derivative on server and I/O nodes. [3] [4]
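The node-role split described above can be pictured as a boot-time mapping from a node's role to the kernel image it receives. The sketch below is hypothetical: the image names, the ID-based role convention, and the function names are all invented for illustration, not taken from any actual supercomputer's boot infrastructure.

```python
# Hypothetical sketch: assign an operating system image by node role,
# mirroring the compute / I/O / service node split described above.
# All image names and the ID-based role rule are invented for illustration.

NODE_IMAGES = {
    "compute": "cnk-lightweight.img",  # minimal kernel: one app, one user, no local file I/O
    "io": "linux-io.img",              # multi-tasking Linux-derived kernel
    "service": "linux-service.img",    # login, compilation, job submission
}

def node_role(node_id, n_compute, n_io):
    """Classify a node by its position in the machine (a toy convention)."""
    if node_id < n_compute:
        return "compute"
    if node_id < n_compute + n_io:
        return "io"
    return "service"

def boot_image(node_id, n_compute=1024, n_io=16):
    """Pick the kernel image a node should boot."""
    return NODE_IMAGES[node_role(node_id, n_compute, n_io)]

print(boot_image(0), boot_image(1024), boot_image(1040))
# cnk-lightweight.img linux-io.img linux-service.img
```

The point of the sketch is only that the choice of kernel is a per-node decision made once, at boot, from the node's role in the machine.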

Early systems

The first Cray-1 (sample shown with internals) was delivered to the customer with no operating system.

The CDC 6600, generally considered the first supercomputer in the world, ran the Chippewa Operating System, which was then deployed on various other CDC 6000 series computers. [9] Chippewa was a rather simple job-control-oriented system derived from that of the earlier CDC 3000, but it influenced the later KRONOS and SCOPE systems. [9] [10]

The first Cray-1 was delivered to the Los Alamos Lab with no operating system, or any other software. [11] Los Alamos developed both the application software for it and the operating system. [11] The main timesharing system for the Cray-1, the Cray Time Sharing System (CTSS), was then developed at the Livermore Labs as a direct descendant of the Livermore Time Sharing System (LTSS), which had been developed for the CDC 6600 twenty years earlier. [11]

In developing supercomputers, rising software costs soon became dominant, as evidenced by the cost of software development at Cray growing in the 1980s to equal the cost of hardware. [2] That trend was partly responsible for the move away from the in-house Cray Operating System to the Unix-based UNICOS system. [2] In 1985, the Cray-2 was the first system to ship with the UNICOS operating system. [12]

Around the same time, the EOS operating system was developed by ETA Systems for use in its ETA10 supercomputers. [13] Written in Cybil, a Pascal-like language from Control Data Corporation, EOS highlighted the difficulty of developing a stable operating system for supercomputers, and eventually a Unix-like system was offered on the same machine. [13] [14] The lessons learned from developing ETA system software included the high level of risk associated with developing a new supercomputer operating system, and the advantages of using Unix, with its large extant base of system software libraries. [13]

By the middle 1990s, despite the extant investment in older operating systems, the trend was toward the use of Unix-based systems, which also facilitated the use of interactive graphical user interfaces (GUIs) for scientific computing across multiple platforms. [15] The move toward a commodity OS had opponents, who cited the fast pace and focus of Linux development as a major obstacle to adoption; as one author wrote, "Linux will likely catch up, but we have large-scale systems now". [16] Nevertheless, that trend continued to gain momentum, and by 2005 virtually all supercomputers used some Unix-like OS. [17] These variants of Unix included IBM AIX, the open source Linux system, and other adaptations such as UNICOS from Cray. [17] By the end of the 20th century, Linux was estimated to command the largest share of the supercomputing field. [1] [18]

Modern approaches

The Blue Gene/P supercomputer at Argonne National Lab

The IBM Blue Gene supercomputer uses the CNK operating system on the compute nodes, but uses a modified Linux-based kernel called I/O Node Kernel (INK) on the I/O nodes. [3] [19] CNK is a lightweight kernel that runs on each node and supports a single application running for a single user on that node. For the sake of efficient operation, the design of CNK was kept simple and minimal, with physical memory being statically mapped and the CNK neither needing nor providing scheduling or context switching. [3] CNK does not even implement file I/O on the compute node, but delegates that to dedicated I/O nodes. [19] However, given that on the Blue Gene multiple compute nodes share a single I/O node, the I/O node operating system does require multi-tasking, hence the selection of the Linux-based operating system. [3] [19]

While in traditional multi-user computer systems and early supercomputers job scheduling was in effect a task-scheduling problem for processing and peripheral resources, in a massively parallel system the job management system needs to manage the allocation of both computational and communication resources. [5] It is essential to tune both task scheduling and the operating system for the different configurations of a supercomputer. A typical parallel job scheduler has a master scheduler that instructs a number of slave schedulers to launch, monitor, and control parallel jobs, and periodically receives reports from them on the status of job progress. [5]
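The master/slave arrangement just described can be sketched minimally as follows. This is a toy model, not any production scheduler: the class names, the job names, and the fixed progress increments are all invented for illustration.

```python
# Toy sketch of a master/slave parallel job scheduler: the master hands
# queued jobs to slave schedulers, which launch them and report progress.
# All names and the fixed progress step are invented for illustration.

class SlaveScheduler:
    def __init__(self, node_id):
        self.node_id = node_id
        self.job = None  # at most one job per slave in this toy model

    def launch(self, job_name):
        self.job = {"name": job_name, "progress": 0}

    def step(self):
        # Stand-in for real work: advance the running job by a fixed amount.
        if self.job and self.job["progress"] < 100:
            self.job["progress"] += 50

    def report(self):
        return (self.node_id, self.job["name"], self.job["progress"])

class MasterScheduler:
    def __init__(self, slaves):
        self.slaves = slaves
        self.queue = []  # jobs waiting for a free slave

    def submit(self, job_name):
        self.queue.append(job_name)

    def dispatch(self):
        # Hand each queued job to the next idle slave scheduler.
        for slave in self.slaves:
            if self.queue and slave.job is None:
                slave.launch(self.queue.pop(0))

    def poll(self):
        # Periodically collect status reports from the slaves.
        return [s.report() for s in self.slaves if s.job]

slaves = [SlaveScheduler(i) for i in range(2)]
master = MasterScheduler(slaves)
master.submit("cfd_run")
master.submit("md_sim")
master.dispatch()
for s in slaves:
    s.step()
print(master.poll())  # [(0, 'cfd_run', 50), (1, 'md_sim', 50)]
```

A real job management system layers onto this skeleton the allocation of communication resources and recovery from failed nodes, which the toy model omits.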

Some, but not all, supercomputer schedulers attempt to maintain locality of job execution. The PBS Pro scheduler used on the Cray XT3 and Cray XT4 systems does not attempt to optimize locality on their three-dimensional torus interconnect, but simply uses the first available processor. [20] On the other hand, IBM's scheduler on the Blue Gene supercomputers aims to exploit locality and minimize network contention by assigning tasks from the same application to one or more midplanes of an 8×8×8 node group. [20] The Slurm Workload Manager scheduler uses a best-fit algorithm and performs Hilbert curve scheduling to optimize locality of task assignments. [20] Several modern supercomputers, such as the Tianhe-2, use Slurm, which arbitrates contention for resources across the system. Slurm is open source, Linux-based, and very scalable: it can manage thousands of nodes in a computer cluster with a sustained throughput of over 100,000 jobs per hour. [21] [22]
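The idea behind Hilbert curve scheduling can be illustrated on a small two-dimensional grid: nodes are ordered along a space-filling curve, so that nodes adjacent in allocation order are also physically close in the machine. The coordinate-to-curve mapping below is the standard bit-manipulation construction of the Hilbert curve; the allocation step is a deliberate simplification and not Slurm's actual code.

```python
# Order the nodes of a 2-D grid along a Hilbert curve, then give a job the
# first free nodes in curve order, so they are mutually close in the grid.
# A simplified illustration of Hilbert curve scheduling, not Slurm's code.

def hilbert_index(n, x, y):
    """Distance of grid point (x, y) along a Hilbert curve filling an
    n x n grid, where n is a power of two (standard construction)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:          # rotate/flip the quadrant as needed
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

n = 4  # a 4x4 grid of nodes
nodes = [(x, y) for x in range(n) for y in range(n)]
curve_order = sorted(nodes, key=lambda p: hilbert_index(n, *p))

# Allocating four nodes for a job: the first four along the curve form a
# compact 2x2 block rather than a scattered set of grid points.
job_nodes = curve_order[:4]
print(job_nodes)  # [(0, 0), (1, 0), (1, 1), (0, 1)]
```

Because the curve visits each quadrant of the grid completely before moving on, contiguous runs along it stay compact, which is what keeps communicating tasks near each other on the interconnect.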


References

  1. Encyclopedia of Parallel Computing by David Padua 2011, ISBN 0-387-09765-1, pages 426-429
  2. Knowing Machines: Essays on Technical Change by Donald MacKenzie 1998, ISBN 0-262-63188-1, pages 149-151
  3. Euro-Par 2004 Parallel Processing: 10th International Euro-Par Conference 2004, by Marco Danelutto, Marco Vanneschi and Domenico Laforenza, ISBN 3-540-22924-8, page 835
  4. An Evaluation of the Oak Ridge National Laboratory Cray XT3 by Sadaf R. Alam, et al., International Journal of High Performance Computing Applications, February 2008, vol. 22, no. 1, pages 52-80
  5. Open Job Management Architecture for the Blue Gene/L Supercomputer by Yariv Aridor et al., in Job Scheduling Strategies for Parallel Processing by Dror G. Feitelson 2005, ISBN 978-3-540-31024-2, pages 95-101
  6. Vaughan-Nichols, Steven J. (June 18, 2013). "Linux continues to rule supercomputers". ZDNet. Retrieved June 20, 2013.
  7. "Top500 OS chart". Top500.org. Archived from the original on 2012-03-05. Retrieved 2010-10-31.
  8. Targeting the Computer: Government Support and International Competition by Kenneth Flamm 1987, ISBN 0-8157-2851-4, page 82
  9. The Computer Revolution in Canada by John N. Vardalas 2001, ISBN 0-262-22064-4, page 258
  10. Design of a Computer: The Control Data 6600 by James E. Thornton, Scott, Foresman Press 1970, page 163
  11. Targeting the Computer: Government Support and International Competition by Kenneth Flamm 1987, ISBN 0-8157-2851-4, pages 81-83
  12. Lester T. Davis, "The balance of power, a brief history of Cray Research hardware architectures" in High Performance Computing: Technology, Methods, and Applications by J. J. Dongarra 1995, ISBN 0-444-82163-5, page 126
  13. Lloyd M. Thorndyke, "The Demise of the ETA Systems" in Frontiers of Supercomputing II by Karyn R. Ames, Alan Brenner 1994, ISBN 0-520-08401-2, pages 489-497
  14. Past, Present, Parallel: A Survey of Available Parallel Computer Systems by Arthur Trew 1991, ISBN 3-540-19664-1, page 326
  15. Frontiers of Supercomputing II by Karyn R. Ames, Alan Brenner 1994, ISBN 0-520-08401-2, page 356
  16. Brightwell, Ron; Riesen, Rolf; Maccabe, Arthur. "On the Appropriateness of Commodity Operating Systems for Large-Scale, Balanced Computing Systems" (PDF). Retrieved January 29, 2013.
  17. Getting Up to Speed: The Future of Supercomputing by Susan L. Graham, Marc Snir, Cynthia A. Patterson, National Research Council 2005, ISBN 0-309-09502-6, page 136
  18. Forbes magazine, 03.15.05: "Linux Rules Supercomputers"
  19. Euro-Par 2006 Parallel Processing: 12th International Euro-Par Conference, 2006, by Wolfgang E. Nagel, Wolfgang V. Walter and Wolfgang Lehner, ISBN 3-540-37783-2
  20. Job Scheduling Strategies for Parallel Processing by Eitan Frachtenberg and Uwe Schwiegelshohn 2010, ISBN 3-642-04632-0, pages 138-144
  21. SLURM at SchedMD
  22. Jette, M. and M. Grondona, "SLURM: Simple Linux Utility for Resource Management", in Proceedings of ClusterWorld Conference, San Jose, California, June 2003