Workload Manager

In IBM mainframes, Workload Manager (WLM) is a base component of the MVS/ESA mainframe operating system and its successors, up to and including z/OS. It controls access to system resources for the work executing on z/OS, based on administrator-defined goals. Workload Manager components also exist for other operating systems. For example, IBM Workload Manager is also a software product for the AIX operating system.

On a mainframe computer many different applications execute at the same time. The expectations for executing work are consistent execution times and predictable access to databases. On z/OS the Workload Manager (WLM) component fulfills these needs by controlling work's access to system resources based on external specifications by the system administrator.

The system administrator classifies work into service classes. The classification mechanism uses work attributes such as transaction names, user identifications, or program names that specific applications are known to use. In addition, the system administrator defines goals and importance levels for the service classes that represent the application work. The goals define performance expectations for the work. A goal can be expressed as a response time, as a relative speed (termed velocity), or as discretionary if no specific requirement exists. The response time is the time from the moment a work request enters the system until the application signals to WLM that execution has completed. WLM then tries to ensure either that the average response time of a set of work requests stays within the expected time, or that a given percentage of work requests meets the end user's expectation.
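The classification step described above can be sketched as a simple first-match rule lookup. This is only an illustration, not the WLM classification interface: the `Rule` type, the attribute keys, the trailing `*` prefix wildcard, and the service class names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical classification rule: attribute + pattern -> service class."""
    attribute: str       # e.g. "transaction_name", "user_id", "program_name"
    pattern: str         # exact value, or a prefix followed by "*"
    service_class: str


def matches(pattern: str, value: str) -> bool:
    # Assumed wildcard convention: a trailing "*" means "starts with".
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])
    return value == pattern


def classify(work: dict[str, str], rules: list[Rule], default: str) -> str:
    """Return the service class of the first rule that matches the work attributes."""
    for rule in rules:
        value = work.get(rule.attribute)
        if value is not None and matches(rule.pattern, value):
            return rule.service_class
    return default


rules = [
    Rule("transaction_name", "PAY*", "ONLINE_HI"),
    Rule("user_id", "BATCH01", "BATCH_LOW"),
]
print(classify({"transaction_name": "PAYROLL", "user_id": "U123"}, rules, "DEFAULT"))
```

Work that matches no rule falls into a default class here; how unmatched work is handled in a real service definition is not covered by this sketch.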

A response time goal also requires that the application communicates with WLM. If this is not possible, a relative speed measure, named execution velocity, is used to describe the end user's expectation to the system.

Definition of Execution Velocity

This measurement is based on system states, which are collected continuously. The system states describe when a work request uses a system resource and when it must wait for the resource because other work is using it. The latter is named a delay state. The execution velocity is the number of using states divided by the number of all productive states (using plus delay states), multiplied by 100. This measurement does not require the application to communicate with the WLM component, but it is also more abstract than a response time goal.
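The quotient described above can be written out directly. The sketch below assumes the using and delay states are available as simple sample counts; the function and parameter names are illustrative, not part of any WLM interface.

```python
def execution_velocity(using_samples: int, delay_samples: int) -> float:
    """Execution velocity = using / (using + delay) * 100."""
    total = using_samples + delay_samples
    if total == 0:
        # No productive states were sampled, so the quotient is undefined.
        raise ValueError("no using or delay samples collected")
    return 100.0 * using_samples / total


# 30 using samples and 70 delay samples: the work was found using a
# resource in 30% of its productive states.
print(execution_velocity(30, 70))  # 30.0
```

A velocity of 100 would mean the work was never found waiting for a resource; a low value means it spent most of its productive states delayed.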

Finally, the system administrator assigns an importance to each service class to tell WLM which service classes should get preferred access to system resources if the system load is too high to allow all work to execute. The service classes and goal definitions are organized in service policies, together with other constructs for reporting and further control, and are saved as a service definition that WLM can access. The active service definition is stored on a couple data set, which allows all z/OS systems of a Parallel Sysplex cluster to access it and to work towards the same performance goals.

WLM is a closed-loop control mechanism: it continuously collects data about the work and the system resources, compares the collected and aggregated measurements with the goals in the service definition, and adjusts the work's access to system resources if the user expectations have not been achieved. This mechanism runs continuously at pre-defined time intervals. To compare the collected data with the goal definitions, WLM calculates a performance index.

Definition of Performance Index


The performance index for a service class is a single number that tells whether the goal was met, overachieved, or missed. WLM modifies the resource access of the service classes based on the achieved performance index and the importance. To do so, it uses the collected data to project whether a change is possible and what its result would be. The change is executed if the projection shows that it benefits the work, based on the defined customer expectations. WLM uses a data history ranging from 20 seconds to 20 minutes so that its calculations rest on a statistically relevant set of samples. In each decision interval, a change is made for the benefit of a single service class, which keeps the system controlled and predictable.
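A minimal sketch of the performance index follows. It uses the commonly documented convention that the index is the achieved response time divided by the goal for response time goals, and the goal velocity divided by the achieved velocity for velocity goals, so that a value below 1 means the goal is overachieved, 1 means it is met, and above 1 means it is missed. The `select_receiver` helper is an assumption for illustration only: it picks the most important service class that misses its goal (importance 1 taken as the highest), and the real decision logic also involves the projections described above.

```python
from dataclasses import dataclass
from typing import Optional


def pi_response_time(achieved_seconds: float, goal_seconds: float) -> float:
    """Performance index for a response time goal: achieved / goal."""
    return achieved_seconds / goal_seconds


def pi_velocity(achieved_velocity: float, goal_velocity: float) -> float:
    """Performance index for a velocity goal: goal / achieved."""
    return goal_velocity / achieved_velocity


@dataclass
class ServiceClassState:
    name: str                  # hypothetical service class name
    importance: int            # 1 = most important (assumed convention)
    performance_index: float


def select_receiver(states: list[ServiceClassState]) -> Optional[ServiceClassState]:
    """Illustrative choice of the one service class to help in this interval."""
    missing = [s for s in states if s.performance_index > 1.0]
    if not missing:
        return None  # every goal is met or overachieved
    # Prefer higher importance (lower number), then the worst performance index.
    return min(missing, key=lambda s: (s.importance, -s.performance_index))


states = [
    ServiceClassState("ONLINE_HI", 1, pi_response_time(0.75, 0.5)),  # PI 1.5
    ServiceClassState("BATCH_LOW", 4, pi_velocity(10.0, 40.0)),      # PI 4.0
    ServiceClassState("TSO", 2, pi_response_time(0.5, 1.0)),         # PI 0.5
]
print(select_receiver(states).name)  # ONLINE_HI
```

Note that BATCH_LOW has the worse performance index but is not chosen, because in this sketch importance takes precedence over the size of the goal miss.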

WLM controls the work's access to the system processors, the I/O units, and the system storage, and it starts and stops processes for work execution. Access to the system processors, for example, is controlled by a dispatch priority, which defines a relative ranking between the units of work that want to execute. The same dispatch priority is assigned to all units of work that were classified to the same service class. The dispatch priority is not fixed and is not simply derived from the importance of the service class. It changes based on goal achievement, system utilization, and the work's demand for the system processors. Similar mechanisms exist for controlling all other system resources. This way of controlling the access of work to system resources is named goal-oriented workload management. It contrasts with resource-entitlement-based workload management, which defines a much more static relationship for how work can access the system resources. Resource-entitlement-based workload management is found on larger UNIX operating systems, for example.

A major difference from workload management components on other operating systems is the close cooperation between z/OS Workload Manager and the major applications, middleware, and subsystems executing on z/OS. WLM offers interfaces that allow the subsystems to tell WLM when a unit of work starts and ends in the system, and to pass classification attributes that the system administrator can use to classify the work. In addition, WLM offers interfaces that allow load balancing components to place work requests on the best-suited system in a Parallel Sysplex cluster. Additional instrumentation helps database and resource managers to signal contention situations to WLM, so that WLM can help the delayed work by promoting the holders of resource locks and latches.

Over time, z/OS Workload Manager became the central control component for all performance-related aspects of a z/OS operating system. In a Parallel Sysplex cluster, the z/OS Workload Manager components work together to provide a single-image view for the applications executing on the cluster. On a System z machine with multiple logical partitions, z/OS WLM interoperates with the LPAR hypervisor to influence the weighting of the z/OS partitions and to control the amount of CPU capacity that the logical partitions can consume.
