Computerized batch processing is a method of running software programs called jobs in batches automatically. While users are required to submit the jobs, no other interaction by the user is required to process the batch. Batches may automatically be run at scheduled times as well as being run contingent on the availability of computer resources.
The term "batch processing" originates in the traditional classification of methods of production as job production (one-off production), batch production (production of a "batch" of multiple items at once, one stage at a time), and flow production (mass production, all stages in process at once).
Early computers were capable of running only one program at a time. Each user had sole control of the machine for a scheduled period of time. They would arrive at the computer with program and data, often on punched paper cards and magnetic or paper tape, and would load their program, run and debug it, and carry off their output when done.
As computers became faster the setup and takedown time became a larger percentage of available computer time. Programs called monitors, the forerunners of operating systems, were developed which could process a series, or "batch", of programs, often from magnetic tape prepared offline. The monitor would be loaded into the computer and run the first job of the batch. At the end of the job it would regain control and load and run the next until the batch was complete. Often the output of the batch would be written to magnetic tape and printed or punched offline. Examples of monitors were IBM's Fortran Monitor System, SOS (Share Operating System), and finally IBSYS for IBM's 709x systems in 1960. [1] [2]
Third-generation computers [ clarification needed ] [3] capable of multiprogramming began to appear in the 1960s. Instead of running one batch job at a time, these systems can have multiple batch programs running at the same time in order to keep the system as busy as possible. One or more programs might be awaiting input, one actively running on the CPU, and others generating output. Instead of offline input and output, programs called spoolers read jobs from cards, disk, or remote terminals and place them in a job queue to be run. In order to prevent deadlocks the job scheduler needs to know each job's resource requirements—memory, magnetic tapes, mountable disks, etc., so various scripting languages were developed to supply this information in a structured way. Probably the most well-known is IBM's Job Control Language (JCL). Job schedulers select jobs to run according to a variety of criteria, including priority, memory size, etc. Remote batch is a procedure for submitting batch jobs from remote terminals, often equipped with a punch card reader and a line printer. [4] Sometimes asymmetric multiprocessing is used to spool batch input and output for one or more large computers using an attached smaller and less-expensive system, as in the IBM System/360 Attached Support Processor. [lower-alpha 1]
The first general purpose time sharing system, Compatible Time-Sharing System (CTSS), was compatible with batch processing. This facilitated transitioning from batch processing to interactive computing. [5]
From the late 1960s onwards, interactive computing such as via text-based computer terminal interfaces (as in Unix shells or read-eval-print loops), and later graphical user interfaces became common. Non-interactive computation, both one-off jobs such as compilation, and processing of multiple items in batches, became retrospectively referred to as batch processing, and the term batch job (in early use often "batch of jobs") became common. Early use is particularly found at the University of Michigan, around the Michigan Terminal System (MTS). [6]
Although timesharing did exist, its use was not robust enough for corporate data processing; none of this was related to the earlier unit record equipment, which was human-operated.
Non-interactive computation remains pervasive in computing, both for general data processing and for system "housekeeping" tasks (using system software). A high-level program (executing multiple programs, with some additional "glue" logic) is today most often called a script, and written in scripting languages, particularly shell scripts for system tasks; in IBM PC DOS and MS-DOS this is instead known as a batch file. That includes UNIX-based computers, Microsoft Windows, macOS (whose foundation is the BSD Unix kernel), and even smartphones. A running script, particularly one executed from an interactive login session, is often known as a job, but that term is used very ambiguously.
"There is no direct counterpart to z/OS batch processing in PC or UNIX systems. Batch jobs are typically executed at a scheduled time or on an as-needed basis. Perhaps the closest comparison is with processes run by an at or cron command in UNIX, although the differences are significant." [7]
Batch applications are still critical in most organizations in large part because many common business processes are amenable to batch processing. While online systems can also function when manual intervention is not desired, they are not typically optimized to perform high-volume, repetitive tasks. Therefore, even new systems usually contain one or more batch applications for updating information at the end of the day, generating reports, printing documents, and other non-interactive tasks that must complete reliably within certain business deadlines.
Some applications are amenable to flow processing, namely those that only need data from a single input at once (not totals, for instance): start the next step for each input as it completes the previous step. In this case flow processing lowers latency for individual inputs, allowing them to be completed without waiting for the entire batch to finish. However, many applications require data from all records, notably computations such as totals. In this case the entire batch must be completed before one has a usable result: partial results are not usable.
Modern batch applications make use of modern batch frameworks such as Jem The Bee, Spring Batch [8] or implementations of JSR 352 [9] written for Java, and other frameworks for other programming languages, to provide the fault tolerance and scalability required for high-volume processing. In order to ensure high-speed processing, batch applications are often integrated with grid computing solutions to partition a batch job over a large number of processors, although there are significant programming challenges in doing so. High volume batch processing places particularly heavy demands on system and application architectures as well. Architectures that feature strong input/output performance and vertical scalability, including modern mainframe computers, tend to provide better batch performance than alternatives.
Scripting languages became popular as they evolved along with batch processing. [10]
A batch window is "a period of less-intensive online activity", [11] when the computer system is able to run batch jobs without interference from, or with, interactive online systems.
A bank's end-of-day (EOD) jobs require the concept of cutover, where transaction and data are cut off for a particular day's batch activity ("deposits after 3 PM will be processed the next day").
As requirements for online systems uptime expanded to support globalization, the Internet, and other business needs, the batch window shrank [12] [13] and increasing emphasis was placed on techniques that would require online data to be available for a maximum amount of time.
The batch size refers to the number of work units to be processed within one batch operation. Some examples are:
The IBM mainframe z/OS operating system or platform has arguably the most highly refined and evolved set of batch processing facilities owing to its origins, long history, and continuing evolution. Today such systems commonly support hundreds or even thousands of concurrent online and batch tasks within a single operating system image. Technologies that aid concurrent batch and online processing include Job Control Language (JCL), scripting languages such as REXX, Job Entry Subsystem (JES2 and JES3), Workload Manager (WLM), Automatic Restart Manager (ARM), Resource Recovery Services (RRS), IBM Db2 data sharing, Parallel Sysplex, unique performance optimizations such as HiperDispatch, I/O channel architecture, and several others.
The Unix programs cron
, at
, and batch
(today batch
is a variant of at
) allow for complex scheduling of jobs. Windows has a job scheduler. Most high-performance computing clusters use batch processing to maximize cluster usage. [15]
In computing, multitasking is the concurrent execution of multiple tasks over a certain period of time. New tasks can interrupt already started ones before they finish, instead of waiting for them to end. As a result, a computer executes segments of multiple tasks in an interleaved manner, while the tasks share common processing resources such as central processing units (CPUs) and main memory. Multitasking automatically interrupts the running program, saving its state and loading the saved state of another program and transferring control to it. This "context switch" may be initiated at fixed time intervals, or the running program may be coded to signal to the supervisory software when it can be interrupted.
Multiple Virtual Storage, more commonly called MVS, is the most commonly used operating system on the System/370, System/390 and IBM Z IBM mainframe computers. IBM developed MVS, along with OS/VS1 and SVS, as a successor to OS/360. It is unrelated to IBM's other mainframe operating system lines, e.g., VSE, VM, TPF.
A mainframe computer, informally called a mainframe or big iron, is a computer used primarily by large organizations for critical applications like bulk data processing for tasks such as censuses, industry and consumer statistics, enterprise resource planning, and large-scale transaction processing. A mainframe computer is large but not as large as a supercomputer and has more processing power than some other classes of computers, such as minicomputers, servers, workstations, and personal computers. Most large-scale computer-system architectures were established in the 1960s, but they continue to evolve. Mainframe computers are often used as servers.
In computing, time-sharing is the concurrent sharing of a computing resource among many tasks or users by giving each task or user a small slice of processing time. This quick switch between tasks or users gives the illusion of simultaneous execution. It enables multi-tasking by a single user or enables multiple-user sessions.
Computer operating systems (OSes) provide a set of functions needed and used by most application programs on a computer, and the links needed to control and synchronize computer hardware. On the first computers, with no operating system, every program needed the full hardware specification to run correctly and perform standard tasks, and its own drivers for peripheral devices like printers and punched paper card readers. The growing complexity of hardware and application programs eventually made operating systems a necessity for everyday use.
The Conversational Monitor System is a simple interactive single-user operating system. CMS was originally developed as part of IBM's CP/CMS operating system, which went into production use in 1967. CMS is part of IBM's VM family, which runs on IBM mainframe computers. VM was first announced in 1972, and is still in use today as z/VM.
In computing, Interactive System Productivity Facility (ISPF) is a software product for many historic IBM mainframe operating systems and currently the z/OS and z/VM operating systems that run on IBM mainframes. It includes a screen editor, the user interface of which was emulated by some microcomputer editors sold commercially starting in the late 1980s, including SPF/PC.
Time Sharing Option (TSO) is an interactive time-sharing environment for IBM mainframe operating systems, including OS/360 MVT, OS/VS2 (SVS), MVS, OS/390, and z/OS.
The Compatible Time-Sharing System (CTSS) was the first general purpose time-sharing operating system. Compatible Time Sharing referred to time sharing which was compatible with batch processing; it could offer both time sharing and batch processing concurrently.
Job Control Language (JCL) is a scripting language used on IBM mainframe operating systems to instruct the system on how to run a batch job or start a subsystem. The purpose of JCL is to say which programs to run, using which files or devices for input or output, and at times to also indicate under what conditions to skip a step. Parameters in the JCL can also provide accounting information for tracking the resources used by a job as well as which machine the job should run on.
In computing, spooling is a specialized form of multi-programming for the purpose of copying data between different devices. In contemporary systems, it is usually used for mediating between a computer application and a slow peripheral, such as a printer. Spooling allows programs to "hand off" work to be done by the peripheral and then proceed to other tasks, or to not begin until input has been transcribed. A dedicated program, the spooler, maintains an orderly sequence of jobs for the peripheral and feeds it data at its own rate. Conversely, for slow input peripherals, such as a card reader, a spooler can maintain a sequence of computational jobs waiting for data, starting each job when all of the relevant input is available; see batch processing. The spool itself refers to the sequence of jobs, or the storage area where they are held. In many cases, the spooler is able to drive devices at their full rated speed with minimal impact on other processing.
The Symbolic Stream Generator is a software productivity aid by Unisys for their mainframe computers of the former UNIVAC 1100/2200 series.
In computing, a shell is a computer program that exposes an operating system's services to a human user or other programs. In general, operating system shells use either a command-line interface (CLI) or graphical user interface (GUI), depending on a computer's role and particular operation. It is named a shell because it is the outermost layer around the operating system.
A job scheduler is a computer application for controlling unattended background program execution of jobs. This is commonly called batch scheduling, as execution of non-interactive jobs is often called batch processing, though traditional job and batch are distinguished and contrasted; see that page for details. Other synonyms include batch system, distributed resource management system (DRMS), distributed resource manager (DRM), and, commonly today, workload automation (WLA). The data structure of jobs to run is known as the job queue.
The System Display and Search Facility (SDSF) is a component of IBM's mainframe operating system, z/OS, which allows users and administrators to view and control various aspects of the mainframe's operation and system resources using an interactive user interface. Some of the information displayed in SDSF includes Batch job output, Unix processes, scheduling environments, and the status of external devices such as printers and network lines, batch and system log files and dumps.
In a non-interactive computer system, particularly IBM mainframes, a job stream, jobstream, or simply job is the sequence of job control language statements (JCL) and data that comprise a single "unit of work for an operating system". The term job traditionally means a one-off piece of work, and is contrasted with a batch, but non-interactive computation has come to be called "batch processing", and thus a unit of batch processing is often called a job, or by the oxymoronic term batch job; see job for details. Performing a job consists of executing one or more programs. Each program execution, called a job step, jobstep, or step, is usually related in some way to the others in the job. Steps in a job are executed sequentially, possibly depending on the results of previous steps, particularly in batch processing.
In computing, job control refers to the control of multiple tasks or jobs on a computer system, ensuring that they each have access to adequate resources to perform correctly, that competition for limited resources does not cause a deadlock where two or more jobs are unable to complete, resolving such situations where they do occur, and terminating jobs that, for any reason, are not performing as expected.
In computing, a script is a relatively short and simple set of instructions that typically automate an otherwise manual process. The act of writing a script is called scripting. Scripting language or script language describes a programming language that is used for scripting.
A command-line interface (CLI) is a means of interacting with a computer program by inputting lines of text called command-lines. Command-line interfaces emerged in the mid-1960s, on computer terminals, as an interactive and more user-friendly alternative to the non-interactive interface available with punched cards.
A supercomputer operating system is an operating system intended for supercomputers. Since the end of the 20th century, supercomputer operating systems have undergone major transformations, as fundamental changes have occurred in supercomputer architecture. While early operating systems were custom tailored to each supercomputer to gain speed, the trend has been moving away from in-house operating systems and toward some form of Linux, with it running all the supercomputers on the TOP500 list in November 2017. In 2021, top 10 computers run for instance Red Hat Enterprise Linux (RHEL), or some variant of it or other Linux distribution e.g. Ubuntu.
IBSYS was an operating system for the 7090 that evolved from SOS (SHARE Operating System)
CTSS was called "compatible" in the sense that FMS could be run in B-core as a "back-ground" user, nearly as efficiently as on a bare machine, and also because programs compiled for FMS batch could be loaded and executed in the "foreground" time-sharing environment (with some limitations). ... This feature allowed the Computation Center to make the transition from batch to timesharing gradually
JSR 352, the open standard specification for Java batch processing. ... The programming languages used evolved over time based on what was available
a multi-user, shared and smart batch processing system improves the scale ..... Most HPC clusters are in Linux