Workload Manager

Last updated November 15, 2024

In IBM mainframes, Workload Manager (WLM) is a base component of the MVS/ESA mainframe operating system and its successors, up to and including z/OS. It controls access to system resources for the work executing on z/OS, based on administrator-defined goals. Workload Manager components also exist for other operating systems. For example, IBM Workload Manager is also a software product for the AIX operating system.


On a mainframe computer many different applications execute at the same time. The work they run is expected to have consistent execution times and predictable access to databases. On z/OS the Workload Manager (WLM) component fulfills these needs by controlling the work's access to system resources, based on external specifications from the system administrator.

The system administrator classifies work into service classes. Classification uses work attributes such as transaction names, user identifications or program names that specific applications are known to use. In addition, the system administrator defines goals and importance levels for the service classes that represent the application work. The goals define performance expectations for the work. Goals can be expressed as response times, as a relative speed (termed velocity), or as discretionary if no specific requirement exists. The response time is the time from when a work request enters the system until the application signals to WLM that execution has completed. WLM then tries to ensure either that the average response time of a set of work requests stays within the expected time, or that a given percentage of work requests meets the expectations of the end user.
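The classification step described above can be pictured as an ordered list of rules that map work attributes to a service class. The following is only a minimal sketch under stated assumptions: the `Rule` structure, the first-match-wins policy, the attribute keys and all service class names are hypothetical illustrations, not the actual WLM classification rule syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    attribute: str      # hypothetical key, e.g. "transaction", "user", "program"
    value: str          # exact value to match (no wildcards in this sketch)
    service_class: str  # hypothetical service class name


def classify(work: dict, rules: list, default: str = "SC_DEFAULT") -> str:
    """Return the service class of the first rule that matches the work attributes.

    Assumption: the first matching rule wins and unmatched work falls
    into a default service class.
    """
    for rule in rules:
        if work.get(rule.attribute) == rule.value:
            return rule.service_class
    return default


# Hypothetical rule set defined by the administrator.
rules = [
    Rule("transaction", "PAY1", "SC_ONLINE_HIGH"),
    Rule("user", "BATCHUSR", "SC_BATCH_LOW"),
]

print(classify({"transaction": "PAY1", "user": "ALICE"}, rules))
print(classify({"program": "REPORT"}, rules))
```

The first request matches the transaction rule, while the second matches no rule and falls into the assumed default class.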

Defining a response time goal also requires that the application communicates with WLM. If this is not possible, a relative speed measure, named execution velocity, is used to describe the end-user expectation to the system.

Definition of Execution Velocity
${\text{Execution Velocity}}=100\cdot {\frac {\text{Total Using Samples}}{{\text{Total Using Samples}}+{\text{Total Delay Samples}}}}$

This measurement is based on system states that are collected continuously. The system states record when a work request is using a system resource and when it must wait for a resource because other work is using it. The latter is called a delay state. The execution velocity is the number of using states divided by the number of all productive states (using plus delay states), multiplied by 100. This measurement does not require the application to communicate with WLM, but it is also more abstract than a response time goal.
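As a worked illustration of the formula, the sketch below computes an execution velocity from two sample counts. The function name and the decision to reject an empty sample set are assumptions made for this example.

```python
def execution_velocity(using_samples: int, delay_samples: int) -> float:
    """Execution velocity = 100 * using / (using + delay)."""
    if using_samples < 0 or delay_samples < 0:
        raise ValueError("sample counts must be non-negative")
    total = using_samples + delay_samples
    if total == 0:
        # Assumption: without any productive samples the value is undefined.
        raise ValueError("no using or delay samples collected")
    return 100.0 * using_samples / total


# Work that was found using a resource in 30 samples and delayed in 70 samples.
print(execution_velocity(30, 70))  # prints 30.0
```

Work that is never delayed reaches a velocity of 100, and work that is delayed in most samples approaches 0.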

Finally, the system administrator assigns an importance to each service class. This tells WLM which service classes should get preferred access to system resources when the system load is too high for all work to execute. The service classes and goal definitions are organized in service policies, together with other constructs for reporting and further control, and are saved as a service definition that WLM accesses. The active service definition is saved on a couple data set, which allows all z/OS systems of a Parallel Sysplex cluster to access it and to work towards the same performance goals.

WLM is a closed-loop control mechanism. It continuously collects data about the work and the system resources, compares the collected and aggregated measurements with the user definitions from the service definition, and adjusts the work's access to the system resources if the user expectations have not been achieved. This mechanism runs continuously at pre-defined time intervals. To compare the collected data with the goal definitions, a performance index is calculated.

Definition of Performance Index
${\text{for Response Time: }}PI={\frac {\text{Actual Achieved Response Time}}{\text{Response Time Goal}}}$
${\text{for Execution Velocity: }}PI={\frac {\text{Execution Velocity Goal}}{\text{Achieved Execution Velocity}}}$
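The two definitions can be evaluated as in the following sketch. The function names and the `goal_status` helper are assumptions made for illustration. The interpretation of the result follows directly from the formulas: both ratios equal 1 when the goal is met exactly, fall below 1 when the work does better than the goal, and rise above 1 when it does worse.

```python
def pi_response_time(achieved_seconds: float, goal_seconds: float) -> float:
    """PI for a response time goal: achieved response time / goal."""
    if goal_seconds <= 0:
        raise ValueError("response time goal must be positive")
    return achieved_seconds / goal_seconds


def pi_velocity(goal_velocity: float, achieved_velocity: float) -> float:
    """PI for an execution velocity goal: goal velocity / achieved velocity."""
    if achieved_velocity <= 0:
        raise ValueError("achieved velocity must be positive")
    return goal_velocity / achieved_velocity


def goal_status(pi: float) -> str:
    """Hypothetical helper that names the three possible outcomes."""
    if pi < 1.0:
        return "overachieved"
    if pi > 1.0:
        return "missed"
    return "met"


# Response time goal of 1.0 s, measured 1.5 s: the goal is missed.
print(goal_status(pi_response_time(1.5, 1.0)))
# Velocity goal of 40, achieved 50: the goal is overachieved.
print(goal_status(pi_velocity(40, 50)))
```

Note that the two ratios are inverted relative to each other, so that a larger index always means worse goal achievement regardless of the goal type.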

The performance index for a service class is a single number that tells whether the goal definition was met, overachieved or missed. From the definitions above, a value of 1 means the goal is met exactly, a value below 1 means it is overachieved, and a value above 1 means it is missed. WLM modifies the access of the service classes based on the achieved performance index and the importance. To do so, it uses the collected data to project whether a change is possible and what its result would be. The change is made if the forecast shows that it benefits the work, measured against the defined customer expectations. WLM bases its calculations on data covering the last 20 seconds to 20 minutes, so that it has a statistically relevant set of samples. In addition, in each decision interval a change is made for the benefit of only one service class, which keeps the system controlled and predictable.
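The idea of helping one service class per decision interval can be sketched as follows. This is a deliberately simplified assumption, not WLM's actual algorithm, which also projects the effect of a change before making it. The sketch assumes that importance 1 is the most important level, that only classes with a performance index above 1 are candidates, and that ties in importance go to the class with the worst index. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceClassState:
    name: str
    importance: int           # assumption: 1 = most important
    performance_index: float  # > 1.0 means the goal is missed


def pick_receiver(classes: list) -> Optional[ServiceClassState]:
    """Pick the single service class to help in this decision interval.

    Simplified rule (an assumption for this sketch): among classes that
    miss their goal, prefer the most important one, then the worst PI.
    """
    missing = [c for c in classes if c.performance_index > 1.0]
    if not missing:
        return None  # every goal is met, so nothing needs to change
    return min(missing, key=lambda c: (c.importance, -c.performance_index))


states = [
    ServiceClassState("ONLINE", importance=1, performance_index=0.9),
    ServiceClassState("WEB", importance=2, performance_index=1.4),
    ServiceClassState("QUERY", importance=2, performance_index=1.8),
    ServiceClassState("BATCH", importance=3, performance_index=3.0),
]

receiver = pick_receiver(states)
print(receiver.name if receiver else "no change needed")
```

In this example ONLINE already meets its goal, so the choice falls on QUERY: it shares the next importance level with WEB but misses its goal by more, and BATCH has to wait despite having the worst index.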

WLM controls the work's access to the system processors, the I/O units and the system storage, and it starts and stops processes for work execution. Access to the system processors, for example, is controlled by a dispatch priority, which defines a relative ranking between the units of work that want to execute. The same dispatch priority is assigned to all units of work that were classified to the same service class. The dispatch priority is neither fixed nor simply derived from the importance of the service class. It changes based on goal achievement, system utilization and the work's demand for the system processors. Similar mechanisms exist for controlling all other system resources. This way of controlling the access of work to system resources is called goal-oriented workload management. It contrasts with resource-entitlement-based workload management, which defines a much more static relationship for how work can access the system resources. Resource-entitlement-based workload management is found, for example, on larger UNIX operating systems.

A major difference from workload management components on other operating systems is the close cooperation between z/OS Workload Manager and the major applications, middleware and subsystems executing on z/OS. WLM offers interfaces that allow the subsystems to tell WLM when a unit of work starts and ends in the system, and to pass classification attributes that the system administrator can use to classify the work on the system. In addition, WLM offers interfaces that allow load balancing components to place work requests on the best-suited system in a Parallel Sysplex cluster. Additional instrumentation helps database and resource managers to signal contention situations to WLM, so that WLM can help the delayed work by promoting the holders of resource locks and latches.

Over time, z/OS Workload Manager became the central control component for all performance-related aspects of the z/OS operating system. In a Parallel Sysplex cluster the z/OS Workload Manager components work together to provide a single-image view for the applications executing on the cluster. On a System z with multiple virtual partitions, z/OS WLM can interoperate with the LPAR hypervisor to influence the weighting of the z/OS partitions and to control the amount of CPU capacity that the logical partitions can consume.


