Job scheduler

A job scheduler is a computer application for controlling unattended background program execution of jobs.[1] This is commonly called batch scheduling, as execution of non-interactive jobs is often called batch processing, though traditional job and batch are distinguished and contrasted; see batch processing for details. Other synonyms include batch system, distributed resource management system (DRMS), distributed resource manager (DRM), and, commonly today, workload automation (WLA). The data structure of jobs to run is known as the job queue.

Modern job schedulers typically provide a graphical user interface and a single point of control for definition and monitoring of background executions in a distributed network of computers. Increasingly, job schedulers are required to orchestrate the integration of real-time business activities with traditional background IT processing across different operating system platforms and business application environments.

Job scheduling should not be confused with process scheduling, which is the assignment of currently running processes to CPUs by the operating system.

Overview

A core set of basic features is expected of job scheduler software. If software from a completely different area includes all or some of those features, it can be considered to have job scheduling capabilities.

Most operating systems, such as Unix and Windows, provide basic job scheduling capabilities, notably through at and batch, cron, and the Windows Task Scheduler. Web hosting services provide job scheduling capabilities through a control panel or a webcron solution. Many programs, such as database management systems (DBMS), backup tools, enterprise resource planning (ERP) suites, and business process management (BPM) systems, also include relevant job-scheduling capabilities. Job scheduling supplied by the operating system ("OS") or by an individual program does not usually extend beyond a single OS instance or outside the remit of that specific program. Organizations needing to automate unrelated IT workload may also leverage more advanced features from a job scheduler.

These advanced capabilities can be written by in-house developers but are more often provided by suppliers who specialize in systems-management software.
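To make the contrast concrete, the following is a minimal sketch of the kind of single-OS-instance, time-based scheduling described above, written as a plain Python loop. The command path and interval are hypothetical placeholders; a real installation would normally register the job with cron, at, or the Windows Task Scheduler rather than keeping a process alive for this.

```python
import subprocess
import time

# Hypothetical single-OS-instance schedule: run one background job at a
# fixed interval. The command and interval are placeholders for this sketch.
JOB_COMMAND = ["/usr/local/bin/nightly_backup.sh"]  # assumed job to run
INTERVAL_SECONDS = 3600                             # run once per hour

def run_once() -> int:
    """Launch the job, wait for it to finish, and return its exit code."""
    completed = subprocess.run(JOB_COMMAND)
    return completed.returncode

if __name__ == "__main__":
    while True:
        exit_code = run_once()
        print(f"job finished with exit code {exit_code}")
        time.sleep(INTERVAL_SECONDS)
```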

Main concepts

Several concepts are central to almost every job scheduler implementation and are widely recognized with minimal variation: Jobs, Dependencies, Job Streams, and Users.
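As a rough illustration of how these concepts relate, the sketch below models jobs, their dependencies, an owning user, and a job stream in Python. The field names and the dependency check are assumptions made for this example, not the data model of any particular product.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Illustrative data model only; field names are assumptions for this sketch,
# not the schema of any specific job scheduler.
@dataclass
class Job:
    name: str
    command: str
    owner: str                                            # the "User" concept
    depends_on: List[str] = field(default_factory=list)   # "Dependencies"

@dataclass
class JobStream:
    """A job stream groups related jobs that run as one unit of work."""
    name: str
    jobs: List[Job] = field(default_factory=list)

    def runnable(self, completed: Set[str]) -> List[Job]:
        """Return jobs whose dependencies have all completed."""
        return [
            job for job in self.jobs
            if job.name not in completed
            and all(dep in completed for dep in job.depends_on)
        ]
```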

Beyond basic, single-OS-instance scheduling tools, two major architectures exist for job scheduling software.

Batch queuing for HPC clusters

An important niche for job schedulers is managing the job queue for a cluster of computers. Typically, the scheduler dispatches jobs from the queue as sufficient resources (cluster nodes) become idle. Widely used cluster batch systems include HTCondor, Slurm, TORQUE, and Univa Grid Engine, each described under Related Research Articles below.
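The sketch below is a toy model of that behaviour, assuming a simple first-come, first-served policy in which each job requests a number of nodes and is dispatched once enough nodes are idle; production systems such as Slurm or TORQUE apply far richer policies.

```python
from collections import deque

class BatchQueue:
    """Toy FIFO batch queue for a cluster; policy and names are illustrative."""

    def __init__(self, total_nodes: int):
        self.idle_nodes = total_nodes
        self.waiting = deque()             # entries of (job_name, nodes_needed)

    def submit(self, job_name: str, nodes_needed: int) -> None:
        self.waiting.append((job_name, nodes_needed))

    def dispatch(self) -> list:
        """Start every queued job, in order, that fits on the idle nodes."""
        started = []
        while self.waiting and self.waiting[0][1] <= self.idle_nodes:
            job_name, nodes = self.waiting.popleft()
            self.idle_nodes -= nodes
            started.append(job_name)
        return started

    def job_finished(self, nodes_released: int) -> None:
        """Return a finished job's nodes to the idle pool."""
        self.idle_nodes += nodes_released
```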

History

Job Scheduling has a long history. Job Schedulers have been one of the major components of IT infrastructure since the early mainframe systems. At first, stacks of punched cards were processed one after the other, hence the term "batch processing".

From a historical point of view, two main eras of job scheduling can be distinguished:

  1. The mainframe era
    • Job Control Language (JCL) on IBM mainframes. Initially based on JCL functionality to handle dependencies, this era is typified by the development of sophisticated scheduling solutions (such as Job Entry Subsystem 2/3) forming part of the systems management and automation toolset on the mainframe.
  2. The open systems era
    • Modern schedulers on a variety of architectures and operating systems. With standard scheduling tools limited to commands such as at and batch, the need for mainframe standard job schedulers has grown with the increased adoption of distributed computing environments.

In terms of the type of scheduling there are also distinct eras:

  1. Batch processing - the traditional date and time based execution of background tasks based on a defined period during which resources were available for batch processing (the batch window). In effect the original mainframe approach transposed onto the open systems environment.
  2. Event-driven process automation - where background processes cannot simply be run at a defined time, either because the nature of the business demands that workload is based on the occurrence of external events (such as the arrival of an order from a customer or a stock update from a store branch), or because there is no batch window, or the available batch window is insufficient (a minimal sketch of this event-driven style follows this list).
  3. Service Oriented job scheduling - recent developments in Service Oriented Architecture (SOA) have seen a move towards deploying job scheduling as a reusable IT infrastructure service that can play a role in the integration of existing business application workload with new Web Services based real-time applications.
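As a minimal sketch of the event-driven style in item 2, the snippet below polls a drop directory and launches a processing job whenever a new order file appears. The directory, file pattern, and command are hypothetical, and polling is used only for simplicity; workload automation products typically react to such events directly.

```python
import pathlib
import subprocess
import time

# Hypothetical event-driven trigger: process each new order file that
# arrives in a drop directory. All paths and names are assumptions.
DROP_DIR = pathlib.Path("/var/spool/orders")
PROCESS_CMD = "/usr/local/bin/process_order"

def watch(poll_seconds: int = 10) -> None:
    seen = set()                                  # files already handled
    while True:
        for order_file in DROP_DIR.glob("*.ord"):
            if order_file not in seen:
                seen.add(order_file)
                subprocess.run([PROCESS_CMD, str(order_file)])
        time.sleep(poll_seconds)

if __name__ == "__main__":
    watch()
```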

Scheduling

Various schemes are used to decide which particular job to run next; several parameters may be weighed in making that decision.
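As one minimal sketch of such a scheme, the snippet below picks the next job by a numeric priority, breaking ties by submission order; this two-key policy is an assumption chosen for illustration, not a description of any particular scheduler.

```python
import heapq
import itertools

class PriorityScheduler:
    """Toy dispatch policy: lower priority number runs first, ties by arrival."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()    # preserves submission order

    def submit(self, job_name: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._arrival), job_name))

    def next_job(self):
        """Return the best waiting job's name, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, job_name = heapq.heappop(self._heap)
        return job_name
```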

Related Research Articles

Client–server model: Distributed application structure in computing

Client–server model is a distributed application structure that partitions tasks or workloads between the providers of a resource or service, called servers, and service requesters, called clients. Often clients and servers communicate over a computer network on separate hardware, but both client and server may reside in the same system. A server host runs one or more server programs, which share their resources with clients. A client usually does not share any of its resources, but it requests content or service from a server. Clients, therefore, initiate communication sessions with servers, which await incoming requests. Examples of computer applications that use the client-server model are email, network printing, and the World Wide Web.
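As a minimal sketch of those roles, the snippet below runs a one-shot echo server in a thread and has a client initiate a session with it; the port number and message are arbitrary choices for this example.

```python
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 50007        # arbitrary local address for the example

def serve_once() -> None:
    """Server role: await one incoming request and answer it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _ = srv.accept()          # wait for a client to initiate
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"echo: " + data)

def request(message: bytes) -> bytes:
    """Client role: initiate the session and ask for service."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(message)
        return cli.recv(1024)

if __name__ == "__main__":
    server = threading.Thread(target=serve_once)
    server.start()
    time.sleep(0.2)                     # give the server a moment to listen
    print(request(b"hello"))            # prints b'echo: hello'
    server.join()
```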

Mainframe computer: Computers used primarily by large organizations for business-critical applications

A mainframe computer, informally called a mainframe or big iron, is a computer used primarily by large organizations for critical applications and bulk data processing. A mainframe computer is larger and has more processing power than some other classes of computers, such as minicomputers, servers, workstations, and personal computers. Most large-scale computer-system architectures were established in the 1960s, but they continue to evolve. Mainframe computers are often used as servers.

Computerized batch processing is the running of "jobs that can run without end user interaction, or can be scheduled to run as resources permit."

Computer operating systems (OSes) provide a set of functions needed and used by most application programs on a computer, and the links needed to control and synchronize computer hardware. On the first computers, with no operating system, every program needed the full hardware specification to run correctly and perform standard tasks, and its own drivers for peripheral devices like printers and punched paper card readers. The growing complexity of hardware and application programs eventually made operating systems a necessity for everyday use.

IBM Db2 Family: Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. They initially supported the relational model, but were extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB/2, then DB2 until 2017 and finally changed to its present form.

In computing, scheduling is the method by which work is assigned to resources that complete the work. The work may be virtual computation elements such as threads, processes or data flows, which are in turn scheduled onto hardware resources such as processors, network links or expansion cards.

Tuxedo is a middleware platform used to manage distributed transaction processing in distributed computing environments. Tuxedo is a transaction processing system or transaction-oriented middleware, or enterprise application server, for a variety of systems and programming languages. Developed by AT&T in the 1980s, it became a software product of Oracle Corporation in 2008 when Oracle acquired BEA Systems. Tuxedo is now part of Oracle Fusion Middleware.

The Job Entry Subsystem (JES) is a component of IBM's mainframe operating systems that is responsible for managing batch workloads. In modern times, there are two distinct implementations of the Job Entry Subsystem, called JES2 and JES3. They are designed to provide efficient execution of batch jobs.

HTCondor is an open-source high-throughput computing software framework for coarse-grained distributed parallelization of computationally intensive tasks. It can be used to manage workload on a dedicated cluster of computers, or to farm out work to idle desktop computers – so-called cycle scavenging. HTCondor runs on Linux, Unix, Mac OS X, FreeBSD, and Microsoft Windows operating systems. HTCondor can integrate both dedicated resources and non-dedicated desktop machines into one computing environment.

The System Display and Search Facility (SDSF) component of IBM's mainframe operating system, z/OS, is an interactive user interface that allows users and administrators to view and control various aspects of the mainframe's operation and system resources. Some of the information displayed in SDSF includes Batch job output, Unix processes, scheduling environments, and status of external devices such as printers and network lines. SDSF is primarily used to access the batch and system log files and dumps.

The Terascale Open-source Resource and QUEue Manager (TORQUE) is a distributed resource manager providing control over batch jobs and distributed compute nodes. TORQUE can integrate with the non-commercial Maui Cluster Scheduler or the commercial Moab Workload Manager to improve overall utilization, scheduling and administration on a cluster.

In system software, a job queue is a data structure maintained by job scheduler software containing jobs to run.

In computing, job control refers to the control of multiple tasks or jobs on a computer system. It involves ensuring that each job has access to adequate resources to perform correctly, preventing competition for limited resources from causing a deadlock in which two or more jobs are unable to complete, resolving such situations where they do occur, and terminating jobs that, for any reason, are not performing as expected.

Computer cluster

A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software.

gLite

gLite is a middleware computer software project for grid computing used by the CERN LHC experiments and other scientific domains. It was implemented by collaborative efforts of more than 80 people in 12 different academic and industrial research centers in Europe. gLite provides a framework for building applications tapping into distributed computing and storage resources across the Internet. The gLite services were adopted by more than 250 computing centres and used by more than 15,000 researchers in Europe and around the world.

OS 2200 is the operating system for the Unisys ClearPath Dorado family of mainframe systems. The operating system kernel of OS 2200 is a lineal descendant of Exec 8 for the UNIVAC 1108. Documentation and other information on current and past Unisys systems can be found on the Unisys public support website.

Advanced Systems Concepts, Inc. (ASCI) provides job scheduling, scripting and command language, and data replication and recovery software. Founded in 1981 in Hoboken, New Jersey, the company is now based in Morristown, New Jersey. Initially, the company focused on developing products for the former Digital Equipment Corporation's (DEC) OpenVMS operating system (OS); its products can now be used across different platforms and technologies, including Microsoft Windows, Linux, UNIX, and OpenVMS. Its products include ActiveBatch, XLNT, and RemoteSHADOW.

Slurm Workload Manager: Free and open-source job scheduler for Linux and similar computers

The Slurm Workload Manager, formerly known as Simple Linux Utility for Resource Management (SLURM), or simply Slurm, is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters.

Data-intensive computing is a class of parallel computing applications which use a data parallel approach to process large volumes of data typically terabytes or petabytes in size and typically referred to as big data. Computing applications which devote most of their execution time to computational requirements are deemed compute-intensive, whereas computing applications which require large volumes of data and devote most of their processing time to I/O and manipulation of data are deemed data-intensive.

Univa Grid Engine (UGE) is a batch-queuing system, forked from Sun Grid Engine (SGE). The software schedules resources in a data center applying user-configurable policies to help improve resource sharing and throughput by maximizing resource utilization. The product can be deployed to run on-premises, using IaaS cloud computing or in a hybrid cloud environment.

References

  1. Effect of Job Size Characteristics on Job Scheduling Performance