Platform LSF

Developer(s): IBM (current); Platform Computing (former)
Stable release: 10.2.0 (October 2017); Fix Pack 10.2.0.7 (January 16, 2018) [1]
Operating system: Unix, Linux, Windows
Type: Job scheduler
License: Proprietary
Website: IBM Platform Computing

Platform Load Sharing Facility (or simply LSF) is a workload management platform and job scheduler for distributed high-performance computing. It can execute batch jobs on networked Unix and Windows systems across many different architectures. [2] [3] LSF was based on the Utopia research project at the University of Toronto. [4]
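
Jobs are typically submitted to an LSF cluster with the bsub command. The following Python sketch wraps that command line for illustration; it assumes a working LSF installation, and the queue name "normal" is a site-specific placeholder.

    import re
    import subprocess

    def submit_lsf_job(command, queue="normal", slots=1, name="demo"):
        """Submit a batch job through LSF's bsub and return its job ID."""
        result = subprocess.run(
            ["bsub", "-q", queue, "-n", str(slots), "-J", name,
             "-o", f"{name}.%J.out", command],
            capture_output=True, text=True, check=True,
        )
        # bsub acknowledges with e.g. "Job <12345> is submitted to queue <normal>."
        match = re.search(r"Job <(\d+)>", result.stdout)
        return match.group(1) if match else None

    job_id = submit_lsf_job("sleep 60")
    print(f"Submitted job {job_id}; monitor it with: bjobs {job_id}")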

In 2007, Platform released Platform Lava, a simplified version of LSF based on an older LSF release and licensed under the GNU General Public License v2. [5] The project was discontinued in 2011 and succeeded by OpenLava.

In January 2012, Platform Computing was acquired by IBM. [6] The product is now called IBM Spectrum LSF.

Related Research Articles

Grid computing is the use of widely distributed computer resources to reach a common goal. A computing grid can be thought of as a distributed system with non-interactive workloads that involve many files. Grid computing is distinguished from conventional high-performance computing systems such as cluster computing in that grid computers have each node set to perform a different task/application. Grid computers also tend to be more heterogeneous and geographically dispersed than cluster computers. Although a single grid can be dedicated to a particular application, commonly a grid is used for a variety of purposes. Grids are often constructed with general-purpose grid middleware software libraries. Grid sizes can be quite large.

This article presents a timeline of events in the history of computer operating systems from 1951 to the current day. For a narrative explaining the overall developments, see the History of operating systems.

Project Athena

Project Athena was a joint project of MIT, Digital Equipment Corporation, and IBM to produce a campus-wide distributed computing environment for educational use. It was launched in 1983, and research and development ran until June 30, 1991. As of 2020, Athena is still in production use at MIT. It operates as software that turns a machine into a thin client, which downloads educational applications from MIT servers on demand.

In software development, distcc is a tool for speeding up compilation of source code by using distributed computing over a computer network. With the right configuration, distcc can dramatically reduce a project's compilation time.
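
As an illustration, a build can be routed through distcc by substituting it for the compiler and listing the volunteer machines. In the minimal Python wrapper below, the host names are placeholders, and each listed machine is assumed to be running the distccd daemon.

    import os
    import subprocess

    # Hypothetical worker hosts; compilation jobs are farmed out to them.
    env = dict(os.environ, DISTCC_HOSTS="localhost buildhost1 buildhost2")

    # Substitute distcc for the compiler and run enough parallel jobs
    # to keep all the listed hosts busy.
    subprocess.run(["make", "-j8", "CC=distcc"], env=env, check=True)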

MOSIX is a proprietary distributed operating system. Although early versions were based on older UNIX systems, since 1999 it has focused on Linux clusters and grids. In a MOSIX cluster or grid there is no need to modify or link applications with any library, to copy files or log in to remote nodes, or even to assign processes to different nodes; it is all done automatically, as in an SMP system.

United Devices

United Devices, Inc. was a privately held, commercial distributed computing company that focused on the use of grid computing to manage high-performance computing systems and enterprise cluster management. Its products and services allowed users to "allocate workloads to computers and devices throughout enterprises, aggregating computing power that would normally go unused." It operated under the name Univa UD for a time, after merging with Univa on September 17, 2007.

A job scheduler is a computer application for controlling unattended background execution of jobs. This is commonly called batch scheduling, since the execution of non-interactive jobs is often called batch processing, although the terms job and batch traditionally carry distinct meanings. Other synonyms include batch system, distributed resource management system (DRMS), distributed resource manager (DRM), and, commonly today, workload automation (WLA). The data structure holding the jobs to run is known as the job queue, as sketched below.
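
The following toy priority queue illustrates the job-queue concept; it is a conceptual sketch, not the implementation of any particular scheduler.

    import heapq
    import itertools

    class JobQueue:
        """Toy job queue: lower priority numbers run first,
        ties are broken by submission order (FIFO)."""

        def __init__(self):
            self._heap = []
            self._order = itertools.count()

        def submit(self, command, priority=10):
            heapq.heappush(self._heap, (priority, next(self._order), command))

        def dispatch(self):
            # Pop the highest-priority job, as a scheduler would
            # when an execution slot becomes free.
            priority, _, command = heapq.heappop(self._heap)
            return command

    q = JobQueue()
    q.submit("nightly-backup.sh", priority=20)
    q.submit("payroll-batch.sh", priority=1)
    print(q.dispatch())  # payroll-batch.sh runs first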

Platform Computing

Platform Computing was a privately held software company primarily known for its job scheduling product, Load Sharing Facility (LSF). It was founded in 1992 in Toronto, Ontario, Canada, and headquartered in Markham, Ontario, with 11 branch offices across the United States, Europe, and Asia.

GPFS, the General Parallel File System, is high-performance clustered file system software developed by IBM. It can be deployed in shared-disk or shared-nothing distributed parallel modes, or a combination of these. It is used by many of the world's largest commercial companies, as well as by some of the supercomputers on the Top500 list. For example, it is the filesystem of Summit at Oak Ridge National Laboratory, which was ranked the fastest supercomputer in the world in the November 2019 Top500 list. Summit is a 200-petaflops system composed of more than 9,000 IBM POWER9 microprocessors and 27,000 NVIDIA Volta GPUs. Its storage filesystem, called Alpine, provides 250 PB of storage using Spectrum Scale on IBM ESS storage hardware and is capable of approximately 2.5 TB/s of sequential I/O and 2.2 TB/s of random I/O.

Distributed Resource Management Application API (DRMAA) is a high-level Open Grid Forum (OGF) API specification for the submission and control of jobs to a distributed resource management (DRM) system, such as a cluster or grid computing infrastructure. The scope of the API covers all the high-level functionality required for applications to submit, control, and monitor jobs on execution resources in the DRM system.
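
For example, the Python DRMAA bindings (the drmaa package) expose this interface in a scheduler-independent way. The sketch below assumes the bindings are installed and that the DRMAA_LIBRARY_PATH environment variable points at the DRM system's DRMAA library, such as the one shipped with LSF or Slurm.

    import drmaa

    s = drmaa.Session()
    s.initialize()

    # Describe the job: run /bin/sleep 10 on some execution host.
    jt = s.createJobTemplate()
    jt.remoteCommand = "/bin/sleep"
    jt.args = ["10"]

    job_id = s.runJob(jt)
    print(f"Submitted job {job_id}")

    # Block until the job finishes, then report its exit status.
    info = s.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
    print(f"Job {info.jobId} exited with status {info.exitStatus}")

    s.deleteJobTemplate(jt)
    s.exit()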

IBM Spectrum Symphony, previously known as IBM Platform Symphony and Platform Symphony, is a high-performance computing (HPC) software system developed by Platform Computing, the company that developed Load Sharing Facility (LSF). Focused on financial services, Symphony is designed to deliver scalability and enhanced performance for computationally intensive risk and analytical applications. The product lets users run applications using distributed computing.

Computer cluster

A computer cluster is a set of loosely or tightly connected computers that work together so that, in many respects, they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software.

A reliable multicast protocol is a computer networking protocol that provides a reliable sequence of packets to multiple recipients simultaneously, making it suitable for applications such as multi-receiver file transfer.

Cloud computing

Cloud computing is the on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user. The term is generally used to describe data centers available to many users over the Internet. Large clouds, predominant today, often have functions distributed over multiple locations; if a location is relatively close to the user, it may be designated an edge server.

Slurm Workload Manager

The Slurm Workload Manager, formerly known as Simple Linux Utility for Resource Management (SLURM), or simply Slurm, is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters.

OpenNebula

OpenNebula is a cloud computing platform for managing heterogeneous distributed data center infrastructures. The OpenNebula platform manages a data center's virtual infrastructure to build private, public, and hybrid implementations of Infrastructure as a Service. The two primary uses of the OpenNebula platform are data center virtualization and cloud deployments based on the KVM hypervisor, LXD system containers, and AWS Firecracker microVMs. The platform can also provide the cloud infrastructure necessary to operate a cloud on top of existing VMware infrastructure.

In early June 2020, OpenNebula announced the release of a new Enterprise Edition for corporate users, along with a Community Edition. OpenNebula CE is free and open-source software, released under version 2 of the Apache License. CE comes with free access to maintenance releases, but upgrades to new minor or major versions are available only to users with non-commercial deployments or with significant contributions to the OpenNebula community. OpenNebula EE is distributed under a closed-source license and requires a commercial subscription.

In computing, a distributed file system (DFS) or network file system is any file system that allows files to be accessed from multiple hosts via a computer network. This makes it possible for multiple users on multiple machines to share files and storage resources.

Heterogeneous computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding more of the same type of processor, but by adding dissimilar coprocessors, usually incorporating specialized processing capabilities to handle particular tasks.

OpenLava is open-source job scheduling software for a cluster of computers. OpenLava was derived from an early version of Platform LSF, and its configuration file syntax, API, and CLI were kept unchanged; OpenLava is therefore largely compatible with Platform LSF.

Werner G. Krebs is an American data scientist. He is currently CEO of data science and artificial intelligence startup Acculation, Inc. and has previously held positions at what are now Virtu Financial, Bank of America, and the San Diego Supercomputer Center.

References

  1. "IBM Spectrum LSF Process Manager V10.2.0 Fix Pack 7 (509662) Readme" . Retrieved 2019-04-17.
  2. Ault, Michael R.; Ault, Mike; Tumma, Madhu; Mosic, Ranko (2004). Oracle 10g Grid & Real Application Clusters. Rampant TechPress. p. 24. ISBN 9780974435541.
  3. Goering, Richard (March 8, 1999). "Load sharing brings kudos". EE Times Online. Retrieved 2007-11-12. "LSF ... enables load sharing by distributing jobs to available CPUs in heterogeneous networks."
  4. "Utopia: A Load Sharing Facility for Large, Heterogeneous Distributed Computer Systems". John Wiley & Sons. CiteSeerX   10.1.1.121.1434 .Cite journal requires |journal= (help)
  5. "Platform Lava". Archived from the original on 2011-04-21. Retrieved 2011-03-25.
  6. "IBM Closes on Acquisition of Platform Computing".