IBM Spectrum LSF

LSF
Developer(s): IBM (current), Platform Computing (former)
Stable release: 10.1.0 (10.1.0.14 [1]) / June 2023
Operating system: AIX, HP-UX, Linux, Windows, macOS, Solaris
Type: Job scheduler
License: Proprietary
Website: IBM Spectrum LSF

IBM Spectrum LSF (LSF, originally Platform Load Sharing Facility) is a workload management platform and job scheduler for distributed high-performance computing (HPC), developed by IBM.

Details

LSF can be used to execute batch jobs on networked Unix and Windows systems across many different architectures. [2] [3] It is based on the Utopia research project at the University of Toronto. [4]
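
As a rough illustration of batch job submission, the sketch below assembles an argument list for LSF's `bsub` command. It relies only on the commonly documented `-J` (job name), `-q` (queue), `-n` (number of tasks) and `-o` (output file, where `%J` expands to the job ID) options. The helper name `build_bsub_command`, the queue name `normal` and the program `./simulate` are assumptions made for this example and do not come from the article. The function only builds the command; it does not contact a cluster.

```python
import shlex


def build_bsub_command(command, job_name=None, queue=None,
                       num_tasks=None, output_file=None):
    """Return an argument list for an LSF `bsub` submission.

    Hypothetical helper for illustration; it only assembles arguments
    and never runs them.
    """
    args = ["bsub"]
    if job_name is not None:
        args += ["-J", job_name]        # job name
    if queue is not None:
        args += ["-q", queue]           # target queue
    if num_tasks is not None:
        args += ["-n", str(num_tasks)]  # number of tasks (slots) requested
    if output_file is not None:
        args += ["-o", output_file]     # output file; %J expands to the job ID
    # Split the user's command line into separate arguments.
    args += shlex.split(command)
    return args


if __name__ == "__main__":
    cmd = build_bsub_command(
        "./simulate --steps 100",   # hypothetical program
        job_name="demo",
        queue="normal",             # assumed queue name; site-specific
        num_tasks=4,
        output_file="demo.%J.out",
    )
    # Print the command a user would run on a host with LSF installed.
    print(shlex.join(cmd))
```

On a host with LSF installed, the resulting list could be passed to `subprocess.run`; queue names and resource limits vary by site, so the values above are placeholders.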

In 2007, Platform released Platform Lava, a simplified version of LSF based on an older LSF release and licensed under the GNU General Public License v2. [5] The project was discontinued in 2011 and succeeded by OpenLava.

In January 2012, Platform Computing was acquired by IBM. [6] The product is now called IBM Spectrum LSF.

IBM Spectrum LSF Community Edition is a no-charge edition of the IBM Spectrum LSF workload management platform.

References

  1. "What's new in IBM Spectrum LSF Version 10.1 Fix Pack 14". Retrieved 2023-07-02.
  2. Mike Ault; Madhu Tumma (2004). Oracle 10g Grid & Real Application Clusters. Rampant TechPress. p. 24. ISBN 978-0-9744355-4-1.
  3. Goering, Richard (March 8, 1999). "Load sharing brings kudos". EE Times Online. Archived from the original on 2011-05-16. Retrieved 2007-11-12. "LSF ... enables load sharing by distributing jobs to available CPUs in heterogeneous networks ... but don't tell them that; they'll just want to raise their prices"
  4. Zhou, Songnian; Wang, Jingwen; Zheng, Xiaohu; Delisle, Pierre (1993). "Utopia: A Load Sharing Facility for Large, Heterogeneous Distributed Computer Systems". CiteSeerX 10.1.1.121.1434.
  5. "Platform Lava". Archived from the original on 2011-04-21. Retrieved 2011-03-25.
  6. "IBM Closes on Acquisition of Platform Computing".
