Workload Manager

On IBM mainframes, Workload Manager (WLM) is a base component of the MVS/ESA operating system and its successors, up to and including z/OS. It controls access to system resources for the work executing on z/OS, based on user-defined goals. Workload Manager components also exist for other operating systems. For example, IBM Workload Manager is also a software product for the AIX operating system.

z/OS Workload Manager

On a mainframe computer, many different applications execute at the same time. The work is expected to show consistent execution times and predictable access to databases. On z/OS, the Workload Manager (WLM) component meets these needs by controlling the work's access to system resources, based on specifications supplied by the system administrator.

The system administrator classifies work into service classes. The classification mechanism uses work attributes such as transaction names, user identifications, or program names, which are known to the applications. In addition, the system administrator defines goals and importance levels for the service classes that represent the application work. The goals define the user's expectations for the work and can be expressed as a response time, as a relative speed, or as discretionary if no specific requirement exists. The response time is the time from the moment a work request enters the system until the application signals to WLM that execution has completed. WLM then tries to ensure either that the average response time of a set of work requests stays within the expected time, or that a given percentage of work requests meets the end user's expectations.
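
The classification step can be illustrated with a minimal Python sketch. It is not WLM's actual rule syntax: the `Rule` type, the attribute keys, and the service class names are hypothetical, and the first-match-wins order is an assumption made for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hypothetical classification rule."""
    attribute: str      # e.g. "transaction_name", "user_id", "program_name"
    value: str          # attribute value the rule matches
    service_class: str  # service class assigned on a match


def classify(work, rules, default="DEFAULT"):
    """Return the service class of the first matching rule (assumed order)."""
    for rule in rules:
        if work.get(rule.attribute) == rule.value:
            return rule.service_class
    return default  # assumption: unmatched work falls into a default class


# Hypothetical rules written by the administrator.
rules = [
    Rule("transaction_name", "PAYROLL", "ONLINE_HIGH"),
    Rule("user_id", "BATCH01", "BATCH_LOW"),
]

print(classify({"transaction_name": "PAYROLL", "user_id": "ALICE"}, rules))
print(classify({"program_name": "REPORTS"}, rules))
```

The first request matches the transaction-name rule and is assigned `ONLINE_HIGH`; the second matches no rule and receives the default class.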

Defining a response time goal requires that the application communicates with WLM. If this is not possible, a relative speed measure, named execution velocity, is used to describe the end user's expectations of the system. This measurement is based on system states, which are sampled continuously. The system states describe when a work request uses a system resource and when it must wait for the resource because other work is using it. The latter is called a delay state. The execution velocity is the quotient of all using states to all productive states (using states plus delay states), multiplied by 100. This measurement does not require the application to communicate with WLM, but it is also more abstract than a response time goal.
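
As a worked example of this formula, the following Python sketch computes the execution velocity from sample counts. The function name and the handling of the zero-sample case are assumptions made for illustration.

```python
def execution_velocity(using_samples, delay_samples):
    """Using states divided by all productive states, multiplied by 100."""
    productive = using_samples + delay_samples
    if productive == 0:
        # Assumption: with no samples at all, report a velocity of 0.
        return 0.0
    return 100.0 * using_samples / productive


# Work that was seen using a resource in 30 samples and waiting in 70.
print(execution_velocity(30, 70))  # 30.0
```

Work that never waits has a velocity of 100; the more often it is found in a delay state, the closer the value moves to 0.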

Finally, the system administrator assigns an importance to each service class. The importance tells WLM which service classes should get preferred access to system resources when the system load is too high for all work to execute. The service classes and goal definitions are organized in service policies, together with other constructs for reporting and further control, and are saved as a service definition that WLM can access. The active service definition is stored on a couple data set, which allows all z/OS systems of a Parallel Sysplex cluster to access it and to work towards the same performance goals.

WLM is a closed-loop control mechanism. It continuously collects data about the work and the system resources, compares the collected and aggregated measurements with the goals in the service definition, and adjusts the work's access to system resources if the goals have not been achieved. This mechanism runs continuously at predefined time intervals. To compare the collected data with the goal definitions, WLM calculates a performance index. The performance index for a service class is a single number that tells whether the goal was met, overachieved, or missed. WLM modifies the access of the service classes based on their achieved performance index and their importance. To do so, it uses the collected data to project whether a change is possible and what its result would be. The change is made only if the projection shows that it benefits the work with respect to the defined user expectations. WLM bases its calculations on data covering the last 20 seconds to 20 minutes, so that it has a statistically relevant set of samples. In addition, each decision interval makes a change for the benefit of only one service class, which keeps the system controlled and predictable.
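
The comparison step can be sketched in Python as follows. The text above does not give the formula for the performance index, so the ratios below are an assumption: they follow the commonly documented convention in which a value of 1 means the goal is met, a value below 1 means it is overachieved, and a value above 1 means it is missed. The selection rule in `pick_receiver` and the convention that importance 1 is the most important are likewise simplifying assumptions, not WLM's actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class ServiceClass:
    name: str
    importance: int           # assumption: 1 is the most important
    performance_index: float  # assumption: > 1 missed, 1 met, < 1 overachieved


def response_time_pi(actual_seconds, goal_seconds):
    """Assumed index for response time goals: slower than the goal gives > 1."""
    return actual_seconds / goal_seconds


def velocity_pi(actual_velocity, goal_velocity):
    """Assumed index for velocity goals: lower velocity than the goal gives > 1."""
    return goal_velocity / actual_velocity


def pick_receiver(classes):
    """Pick the one class to help in this decision interval (simplified)."""
    missing = [c for c in classes if c.performance_index > 1.0]
    if not missing:
        return None  # every goal is met, so nothing needs to change
    # Most important class first; among equals, the one furthest from its goal.
    return min(missing, key=lambda c: (c.importance, -c.performance_index))


classes = [
    ServiceClass("ONLINE", 1, response_time_pi(0.6, 0.5)),
    ServiceClass("BATCH", 3, velocity_pi(20, 40)),
    ServiceClass("TEST", 5, 0.8),
]
print(pick_receiver(classes).name)  # ONLINE
```

Although `BATCH` misses its goal by a wider margin, `ONLINE` is chosen in this sketch because it is more important. A real decision would also project whether the change is feasible before applying it.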

WLM controls the work's access to the system processors, the I/O units, and the system storage, and it starts and stops processes for work execution. Access to the system processors, for example, is controlled by a dispatch priority, which defines a relative ranking among the units of work that want to execute. All units of work classified to the same service class are assigned the same dispatch priority. The dispatch priority is not fixed, and it is not simply derived from the importance of the service class. It changes based on goal achievement, system utilization, and the work's demand for the system processors. Similar mechanisms exist for controlling all other system resources. This approach is called goal-oriented workload management. It contrasts with resource-entitlement-based workload management, which defines a much more static relationship between the work and the system resources it can access. Resource-entitlement-based workload management is found, for example, on larger UNIX operating systems.
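
The role of the dispatch priority can be illustrated with a small Python sketch. The priority values, unit names, and service class names are hypothetical, and a real dispatcher is considerably more involved. The sketch only shows that units of work inherit the priority of their service class and that changing this one value reorders all of them.

```python
def dispatch_order(ready_units, unit_class, class_priority):
    """Order ready units of work by the dispatch priority of their service class."""
    # Assumption: a higher number runs first. Units of the same class keep
    # their arrival order because Python's sort is stable.
    return sorted(ready_units,
                  key=lambda unit: class_priority[unit_class[unit]],
                  reverse=True)


# Hypothetical priorities and classification results.
class_priority = {"ONLINE_HIGH": 250, "BATCH_LOW": 200}
unit_class = {"txn-1": "ONLINE_HIGH", "job-7": "BATCH_LOW", "txn-2": "ONLINE_HIGH"}
ready = ["job-7", "txn-1", "txn-2"]

print(dispatch_order(ready, unit_class, class_priority))

# The priority is not fixed: raising it for one service class, for example
# because that class misses its goal, changes the ranking of all its units.
class_priority["BATCH_LOW"] = 252
print(dispatch_order(ready, unit_class, class_priority))
```

In the first call both online transactions are ranked ahead of the batch job; after the adjustment the batch job is ranked first.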

Literature

Paola Bari et al: System Programmer's Guide to: Workload Management. IBM Redbook, SG24-6472

Weblinks

Official z/OS WLM Homepage
