Job scheduler

A job scheduler is a computer application for controlling unattended background program execution of jobs.[1] This is commonly called batch scheduling, as execution of non-interactive jobs is often called batch processing, though traditional job and batch are distinguished and contrasted; see batch processing for details. Other synonyms include batch system, distributed resource management system (DRMS), distributed resource manager (DRM), and, commonly today, workload automation (WLA). The data structure of jobs to run is known as the job queue.

Modern job schedulers typically provide a graphical user interface and a single point of control for definition and monitoring of background executions in a distributed network of computers. Increasingly, job schedulers are required to orchestrate the integration of real-time business activities with traditional background IT processing across different operating system platforms and business application environments.

Job scheduling should not be confused with process scheduling, which is the assignment of currently running processes to CPUs by the operating system.

Overview

A core set of basic features is expected of job scheduler software. If software from a completely different area includes all or some of those features, it can be considered to have job scheduling capabilities.

Most operating systems, such as Unix and Windows, provide basic job scheduling capabilities, notably through at and batch, cron, and the Windows Task Scheduler. Web hosting services provide job scheduling capabilities through a control panel or a webcron solution. Many programs, such as database management systems (DBMS), backup tools, enterprise resource planning (ERP) suites, and business process management (BPM) systems, also include relevant job-scheduling capabilities. Job scheduling supplied by the operating system ("OS") or by an individual program does not usually extend beyond a single OS instance or outside the remit of that specific program. Organizations needing to automate unrelated IT workload may also leverage more advanced features from a job scheduler.

These advanced capabilities can be written by in-house developers but are more often provided by suppliers who specialize in systems-management software.
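To make the contrast concrete, the following is a minimal sketch of the kind of single-OS-instance, time-based scheduling described above, written as a plain Python loop. The command path and interval are hypothetical placeholders; a real installation would normally register the job with cron, at, or the Windows Task Scheduler rather than keeping a process alive for this.

```python
import subprocess
import time

# Hypothetical single-OS-instance schedule: run one background job at a
# fixed interval. The command and interval are placeholders for this sketch.
JOB_COMMAND = ["/usr/local/bin/nightly_backup.sh"]  # assumed job to run
INTERVAL_SECONDS = 3600                             # run once per hour

def run_once() -> int:
    """Launch the job, wait for it to finish, and return its exit code."""
    completed = subprocess.run(JOB_COMMAND)
    return completed.returncode

if __name__ == "__main__":
    while True:
        exit_code = run_once()
        print(f"job finished with exit code {exit_code}")
        time.sleep(INTERVAL_SECONDS)
```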

Main concepts

Several concepts are central to almost every job scheduler implementation and are widely recognized with minimal variation: Jobs, Dependencies, Job Streams, and Users.
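As a rough illustration of how these concepts relate, the sketch below models jobs, their dependencies, an owning user, and a job stream in Python. The field names and the dependency check are assumptions made for this example, not the data model of any particular product.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Illustrative data model only; field names are assumptions for this sketch,
# not the schema of any specific job scheduler.
@dataclass
class Job:
    name: str
    command: str
    owner: str                                            # the "User" concept
    depends_on: List[str] = field(default_factory=list)   # "Dependencies"

@dataclass
class JobStream:
    """A job stream groups related jobs that run as one unit of work."""
    name: str
    jobs: List[Job] = field(default_factory=list)

    def runnable(self, completed: Set[str]) -> List[Job]:
        """Return jobs whose dependencies have all completed."""
        return [
            job for job in self.jobs
            if job.name not in completed
            and all(dep in completed for dep in job.depends_on)
        ]
```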

Beyond basic, single-OS-instance scheduling tools, two major architectures exist for job scheduling software.

Batch queuing for HPC clusters

An important niche for job schedulers is managing the job queue for a cluster of computers. Typically, the scheduler dispatches jobs from the queue as sufficient resources (cluster nodes) become idle. Widely used cluster batch systems include HTCondor, Slurm, TORQUE, and Univa Grid Engine, each described under Related Research Articles below.
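The sketch below is a toy model of that behaviour, assuming a simple first-come, first-served policy in which each job requests a number of nodes and is dispatched once enough nodes are idle; production systems such as Slurm or TORQUE apply far richer policies.

```python
from collections import deque

class BatchQueue:
    """Toy FIFO batch queue for a cluster; policy and names are illustrative."""

    def __init__(self, total_nodes: int):
        self.idle_nodes = total_nodes
        self.waiting = deque()             # entries of (job_name, nodes_needed)

    def submit(self, job_name: str, nodes_needed: int) -> None:
        self.waiting.append((job_name, nodes_needed))

    def dispatch(self) -> list:
        """Start every queued job, in order, that fits on the idle nodes."""
        started = []
        while self.waiting and self.waiting[0][1] <= self.idle_nodes:
            job_name, nodes = self.waiting.popleft()
            self.idle_nodes -= nodes
            started.append(job_name)
        return started

    def job_finished(self, nodes_released: int) -> None:
        """Return a finished job's nodes to the idle pool."""
        self.idle_nodes += nodes_released
```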

History

Job Scheduling has a long history. Job Schedulers have been one of the major components of IT infrastructure since the early mainframe systems. At first, stacks of punched cards were processed one after the other, hence the term "batch processing".

From a historical point of view, two main eras of job scheduling can be distinguished:

  1. The mainframe era
    • Job Control Language (JCL) on IBM mainframes. Initially based on JCL functionality to handle dependencies, this era is typified by the development of sophisticated scheduling solutions (such as Job Entry Subsystem 2/3) forming part of the systems management and automation toolset on the mainframe.
  2. The open systems era
    • Modern schedulers on a variety of architectures and operating systems. With standard scheduling tools limited to commands such as at and batch, the need for mainframe standard job schedulers has grown with the increased adoption of distributed computing environments.

In terms of the type of scheduling there are also distinct eras:

  1. Batch processing - the traditional date and time based execution of background tasks based on a defined period during which resources were available for batch processing (the batch window). In effect the original mainframe approach transposed onto the open systems environment.
  2. Event-driven process automation - where background processes cannot simply be run at a defined time, either because the nature of the business demands that workload is based on the occurrence of external events (such as the arrival of an order from a customer or a stock update from a store branch), or because there is no batch window, or the available batch window is insufficient (a minimal sketch of this event-driven style follows this list).
  3. Service Oriented job scheduling - recent developments in Service Oriented Architecture (SOA) have seen a move towards deploying job scheduling as a reusable IT infrastructure service that can play a role in the integration of existing business application workload with new Web Services based real-time applications.
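As a minimal sketch of the event-driven style in item 2, the snippet below polls a drop directory and launches a processing job whenever a new order file appears. The directory, file pattern, and command are hypothetical, and polling is used only for simplicity; workload automation products typically react to such events directly.

```python
import pathlib
import subprocess
import time

# Hypothetical event-driven trigger: process each new order file that
# arrives in a drop directory. All paths and names are assumptions.
DROP_DIR = pathlib.Path("/var/spool/orders")
PROCESS_CMD = "/usr/local/bin/process_order"

def watch(poll_seconds: int = 10) -> None:
    seen = set()                                  # files already handled
    while True:
        for order_file in DROP_DIR.glob("*.ord"):
            if order_file not in seen:
                seen.add(order_file)
                subprocess.run([PROCESS_CMD, str(order_file)])
        time.sleep(poll_seconds)

if __name__ == "__main__":
    watch()
```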

Scheduling

Various schemes are used to decide which particular job to run next; several parameters may be weighed in making that decision.
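As one minimal sketch of such a scheme, the snippet below picks the next job by a numeric priority, breaking ties by submission order; this two-key policy is an assumption chosen for illustration, not a description of any particular scheduler.

```python
import heapq
import itertools

class PriorityScheduler:
    """Toy dispatch policy: lower priority number runs first, ties by arrival."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()    # preserves submission order

    def submit(self, job_name: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._arrival), job_name))

    def next_job(self):
        """Return the best waiting job's name, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, job_name = heapq.heappop(self._heap)
        return job_name
```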

Related Research Articles

Client–server model: Distributed application structure in computing

Client–server model is a distributed application structure that partitions tasks or workloads between the providers of a resource or service, called servers, and service requesters, called clients. Often clients and servers communicate over a computer network on separate hardware, but both client and server may reside in the same system. A server host runs one or more server programs, which share their resources with clients. A client usually does not share any of its resources, but it requests content or service from a server. Clients, therefore, initiate communication sessions with servers, which await incoming requests. Examples of computer applications that use the client-server model are email, network printing, and the World Wide Web.
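As a minimal sketch of those roles, the snippet below runs a one-shot echo server in a thread and has a client initiate a session with it; the port number and message are arbitrary choices for this example.

```python
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 50007        # arbitrary local address for the example

def serve_once() -> None:
    """Server role: await one incoming request and answer it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _ = srv.accept()          # wait for a client to initiate
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"echo: " + data)

def request(message: bytes) -> bytes:
    """Client role: initiate the session and ask for service."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(message)
        return cli.recv(1024)

if __name__ == "__main__":
    server = threading.Thread(target=serve_once)
    server.start()
    time.sleep(0.2)                     # give the server a moment to listen
    print(request(b"hello"))            # prints b'echo: hello'
    server.join()
```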

Mainframe computer: Computers used primarily by large organizations for business-critical applications

A mainframe computer, informally called a mainframe or big iron, is a computer used primarily by large organizations for critical applications and bulk data processing. A mainframe computer is larger and has more processing power than some other classes of computers, such as minicomputers, servers, workstations, and personal computers. Most large-scale computer-system architectures were established in the 1960s, but they continue to evolve. Mainframe computers are often used as servers.

Computerized batch processing is the running of "jobs that can run without end user interaction, or can be scheduled to run as resources permit."

Computer operating systems (OSes) provide a set of functions needed and used by most application programs on a computer, and the links needed to control and synchronize computer hardware. On the first computers, with no operating system, every program needed the full hardware specification to run correctly and perform standard tasks, and its own drivers for peripheral devices like printers and punched paper card readers. The growing complexity of hardware and application programs eventually made operating systems a necessity for everyday use.

IBM Db2 Family: Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. They initially supported the relational model, but were extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB/2, then DB2 until 2017 and finally changed to its present form.

In computing, scheduling is the method by which work is assigned to resources that complete the work. The work may be virtual computation elements such as threads, processes or data flows, which are in turn scheduled onto hardware resources such as processors, network links or expansion cards.

Tuxedo is a middleware platform used to manage distributed transaction processing in distributed computing environments. Tuxedo is a transaction processing system or transaction-oriented middleware, or enterprise application server, for a variety of systems and programming languages. Developed by AT&T in the 1980s, it became a software product of Oracle Corporation in 2008 when Oracle acquired BEA Systems. Tuxedo is now part of Oracle Fusion Middleware.

The Job Entry Subsystem (JES) is a component of IBM's mainframe operating systems that is responsible for managing batch workloads. In modern times, there are two distinct implementations of the Job Entry Subsystem, called JES2 and JES3. They are designed to provide efficient execution of batch jobs.

HTCondor is an open-source high-throughput computing software framework for coarse-grained distributed parallelization of computationally intensive tasks. It can be used to manage workload on a dedicated cluster of computers, or to farm out work to idle desktop computers – so-called cycle scavenging. HTCondor runs on Linux, Unix, Mac OS X, FreeBSD, and Microsoft Windows operating systems. HTCondor can integrate both dedicated resources and non-dedicated desktop machines into one computing environment.

The System Display and Search Facility (SDSF) component of IBM's mainframe operating system, z/OS, is an interactive user interface that allows users and administrators to view and control various aspects of the mainframe's operation and system resources. Some of the information displayed in SDSF includes Batch job output, Unix processes, scheduling environments, and status of external devices such as printers and network lines. SDSF is primarily used to access the batch and system log files and dumps.

The Terascale Open-source Resource and QUEue Manager (TORQUE) is a distributed resource manager providing control over batch jobs and distributed compute nodes. TORQUE can integrate with the non-commercial Maui Cluster Scheduler or the commercial Moab Workload Manager to improve overall utilization, scheduling and administration on a cluster.

In system software, a job queue is a data structure maintained by job scheduler software containing jobs to run.

In computing, job control refers to the control of multiple tasks or jobs on a computer system. It involves ensuring that each job has access to adequate resources to perform correctly, preventing competition for limited resources from causing a deadlock in which two or more jobs are unable to complete, resolving such situations where they do occur, and terminating jobs that, for any reason, are not performing as expected.

Computer cluster

A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software.

gLite

gLite is a middleware computer software project for grid computing used by the CERN LHC experiments and other scientific domains. It was implemented by collaborative efforts of more than 80 people in 12 different academic and industrial research centers in Europe. gLite provides a framework for building applications tapping into distributed computing and storage resources across the Internet. The gLite services were adopted by more than 250 computing centres and used by more than 15,000 researchers in Europe and around the world.

OS 2200 is the operating system for the Unisys ClearPath Dorado family of mainframe systems. The operating system kernel of OS 2200 is a lineal descendant of Exec 8 for the UNIVAC 1108. Documentation and other information on current and past Unisys systems can be found on the Unisys public support website.

Advanced Systems Concepts, Inc. (ASCI) provides job scheduling, scripting and command language, and data replication and recovery software. Founded in 1981 in Hoboken, New Jersey, the company is now based in Morristown, New Jersey. Initially, the company focused on developing products for the former Digital Equipment Corporation's (DEC) OpenVMS operating system (OS); its products can now be used across different platforms and technologies, including Microsoft Windows, Linux, UNIX, and OpenVMS. Its products include ActiveBatch, XLNT, and RemoteSHADOW.

Slurm Workload Manager: Free and open-source job scheduler for Linux and similar computers

The Slurm Workload Manager, formerly known as Simple Linux Utility for Resource Management (SLURM), or simply Slurm, is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters.

Data-intensive computing is a class of parallel computing applications which use a data parallel approach to process large volumes of data typically terabytes or petabytes in size and typically referred to as big data. Computing applications which devote most of their execution time to computational requirements are deemed compute-intensive, whereas computing applications which require large volumes of data and devote most of their processing time to I/O and manipulation of data are deemed data-intensive.

Univa Grid Engine (UGE) is a batch-queuing system, forked from Sun Grid Engine (SGE). The software schedules resources in a data center applying user-configurable policies to help improve resource sharing and throughput by maximizing resource utilization. The product can be deployed to run on-premises, using IaaS cloud computing or in a hybrid cloud environment.

References

  1. Effect of Job Size Characteristics on Job Scheduling Performance