Mainframe sort merge

Last updated

The Sort/Merge utility is a mainframe program to sort records in a file into a specified order, merge pre-sorted files into a sorted file, or copy selected records. Internally, these utilities use one or more of the standard sorting algorithms, often with proprietary fine-tuned code.

Contents

Mainframes were originally supplied with limited main memory by today's standards and the amount of data to be sorted was frequently very large. Because of this, unlike more recent sort programs, early Sort/Merge programs placed great emphasis on efficient techniques for sorting data on secondary storage, typically tape [lower-alpha 1] or disk. In 1968 the OS/360 Sort/Merge program provided five different "sequence distribution techniques" that could be used depending on the number and type of devices available. [1]

Historically, the "alias" SORT has been used to refer to an installation's preferred sort program, IBM's Sort/Merge, and third party Sort/Merge programs (i.e., SYNCSORT, CASORT). DFSORT is often referred to by its program name, ICEMAN (component ICE; the original OS/360 Sort/Merge program name was IERRCO00, component IER, also with "alias" SORT).

Virtual storage systems

Prior to the System/370, all IBM mainframe operating systems included sort/merge utilities. [lower-alpha 2] With the announcement of virtual storage operating systems, DOS/VS and OS/VS, IBM unbundled much of the software and offered chargeable sort/merge program products. For OS/VS IBM offered 5734-SM1, OS Sort/Merge, and later offered 5740-SM1, OS/VS Sort/Merge, subsequently renamed Data Facility Sort (DFSORT).

In 1990 IBM introduced a new merge algorithm called BLOCKSET in DFSORT the successor to OS/360 Sort/Merge. [2] Of historical note, the BLOCKSET algorithm was invented by an IBM Systems Engineer in 1963 and was discovered in IBM's archives and implemented in 1990. [3]

Usage

Sort/Merge is very frequently used; often the most commonly used application program in a mainframe shop generally consuming about twenty percent of the processing power of the shop.

Modern Sort/Merge programs also can copy files, select or omit certain records, summarize records, remove duplicates, reformat records, append new data and produce reports. Indeed, most Sort/Merge applications use the wide range of additional processing capabilities, rather than purely sorting or merging records: the Sort/Merge product is a very fast way of performing input to and output from these functions. Quite a number of "user exits" are supported, and these may be load modules (i.e., a member of a library), or object decks (i.e., the output of an assembler), with the Sort/Merge application loading (load modules) or linking (object decks; termed "dynamic link editing" in DFSORT) the exit, as specified and required. Working storage datasets (i.e., SORTWK01, ..., SORTWKnn) may be disk or tape, although the BLOCKSET algorithm is restricted to disk working storage; more working storage datasets generally improves performance.

Competition

Sort/merge is important enough that there are multiple companies each selling their own sort/merge package for IBM mainframes and their z/OS, z/VM and z/VSE operating systems. These programs are largely compatible with IBM's SORT programs, often with some extensions. The major Sort/Merge packages are:

(Some of these companies also sell versions for other platforms, such as Unix, Linux, or Windows.)

Migration

Sort/Merge is a critical component of many mainframe environments. When migrating from the mainframe to other platforms such as Unix, Linux or Windows, a Sort/Merge utility is needed; [4] MFSORT from Micro Focus and AHLSORT [5] emulate the functions of DFSORT outside of the Mainframe environment.

IBM OS/360 SORT

Prior to virtual storage operating systems, "The input data set [was] almost always too large to be brought into main storage and sorted all at once." SORT used a replacement selection technique to reduce storage usage. [1] The program placed emphasis on sequence distribution techniques, which could be defaulted depending on the number and type of devices available, or could be specified by the user, for making best use of secondary storage "sort work" (SORTWK) files. These techniques were methods of distributing partially sorted sequences of records most efficiently.

There were five distribution techniques available to the OS/360 SORT: [1]

IBM OS/VS SORT

The distribution techniques listed for tape sorts were retained by the OS/VS SORT program, now called "conventional techniques." The disk sort techniques were replaced by four new ones: [6]

See also

Notes

  1. In the 1950s and 1960s, tape was the most common medium for sort work files, but as the cost of direct access storage devices dropped, use of tape became rare except for extremely large files.
  2. CMS used the DOS/360 sort program

Related Research Articles

A disk operating system (DOS) is a computer operating system that resides on and can use a disk storage device, such as a floppy disk, hard disk drive, or optical disc. A disk operating system provides a file system for organizing, reading, and writing files on the storage disk, and a means for loading and running programs stored on that disk. Strictly, this definition does not include any other functionality, so it does not apply to more complex OSes, such as Microsoft Windows, and is more appropriately used only for older generations of operating systems.

<span class="mw-page-title-main">MVS</span> Operating system for IBM mainframes

Multiple Virtual Storage, more commonly called MVS, is the most commonly used operating system on the System/370, System/390 and IBM Z IBM mainframe computers. IBM developed MVS, along with OS/VS1 and SVS, as a successor to OS/360. It is unrelated to IBM's other mainframe operating system lines, e.g., VSE, VM, TPF.

<span class="mw-page-title-main">IBM System/360</span> IBM mainframe computer family (1964–1977)

The IBM System/360 (S/360) is a family of mainframe computer systems that was announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover both commercial and scientific applications and a complete range of applications from small to large. The design distinguished between architecture and implementation, allowing IBM to release a suite of compatible designs at different prices. All but the only partially compatible Model 44 and the most expensive systems use microcode to implement the instruction set, featuring 8-bit byte addressing and binary, decimal and hexadecimal floating-point calculations.

A direct-access storage device (DASD) is a secondary storage device in which "each physical record has a discrete location and a unique address". The term was coined by IBM to describe devices that allowed random access to data, the main examples being drum memory and hard disk drives. Later, optical disc drives and flash memory units are also classified as DASD.

Disk formatting is the process of preparing a data storage device such as a hard disk drive, solid-state drive, floppy disk, memory card or USB flash drive for initial use. In some cases, the formatting operation may also create one or more new file systems. The first part of the formatting process that performs basic medium preparation is often referred to as "low-level formatting". Partitioning is the common term for the second part of the process, dividing the device into several sub-devices and, in some cases, writing information to the device allowing an operating system to be booted from it. The third part of the process, usually termed "high-level formatting" most often refers to the process of generating a new file system. In some operating systems all or parts of these three processes can be combined or repeated at different levels and the term "format" is understood to mean an operation in which a new disk medium is fully prepared to store files. Some formatting utilities allow distinguishing between a quick format, which does not erase all existing data and a long option that does erase all existing data.

Virtual Storage Access Method (VSAM) is an IBM direct-access storage device (DASD) file storage access method, first used in the OS/VS1, OS/VS2 Release 1 (SVS) and Release 2 (MVS) operating systems, later used throughout the Multiple Virtual Storage (MVS) architecture and now in z/OS. Originally a record-oriented filesystem, VSAM comprises four data set organizations: key-sequenced (KSDS), relative record (RRDS), entry-sequenced (ESDS) and linear (LDS). The KSDS, RRDS and ESDS organizations contain records, while the LDS organization simply contains a sequence of pages with no intrinsic record structure, for use as a memory-mapped file.

Job Control Language (JCL) is a name for scripting languages used on IBM mainframe operating systems to instruct the system on how to run a batch job or start a subsystem. The purpose of JCL is to say which programs to run, using which files or devices for input or output, and at times to also indicate under what conditions to skip a step. Parameters in the JCL can also provide accounting information for tracking the resources used by a job as well as which machine the job should run on.

Disk Operating System/360, also DOS/360, or simply DOS, is the discontinued first member of a sequence of operating systems for IBM System/360, System/370 and later mainframes. It was announced by IBM on the last day of 1964, and it was first delivered in June 1966. In its time, DOS/360 was the most widely used operating system in the world.

In the context of IBM mainframe computers in the S/360 line, a data set or dataset is a computer file having a record organization. Use of this term began with, e.g., DOS/360, OS/360, and is still used by their successors, including the current z/OS. Documentation for these systems historically preferred this term rather than file.

This article discusses support programs included in or available for OS/360 and successors. IBM categorizes some of these programs as utilities and others as service aids; the boundaries are not always consistent or obvious. Many, but not all, of these programs match the types in utility software.

In computer science, a record-oriented filesystem is a file system where data is stored as collections of records. This is in contrast to a byte-oriented filesystem, where the data is treated as an unformatted stream of bytes. There are several different possible record formats; the details vary depending on the particular system. In general the formats can be fixed-length or variable length, with different physical organizations or padding mechanisms; metadata may be associated with the file records to define the record length, or the data may be part of the record. Different access methods for records may be provided, for example records may be retrieved in sequential order, by key, or by record number.

In computing, channel I/O is a high-performance input/output (I/O) architecture that is implemented in various forms on a number of computer architectures, especially on mainframe computers. In the past, channels were generally implemented with custom devices, variously named channel, I/O processor, I/O controller, I/O synchronizer, or DMA controller.

Count key data (CKD) is a direct-access storage device (DASD) data recording format introduced in 1964, by IBM with its IBM System/360 and still being emulated on IBM mainframes. It is a self-defining format with each data record represented by a Count Area that identifies the record and provides the number of bytes in an optional Key Area and an optional Data Area. This is in contrast to devices using fixed sector size or a separate format track.

In IBM mainframe operating systems OS/360 and its successors, a Unit Control Block (UCB) is a memory structure, or a control block, that describes any single input/output peripheral device (unit), or an exposure (alias), to the operating system. Certain data within the UCB also instructs the Input/Output Supervisor (IOS) to use certain closed subroutines in addition to normal IOS processing for additional physical device control.

An access method is a function of a mainframe operating system that enables access to data on disk, tape or other external devices. Access methods were present in several mainframe operating systems since the late 1950s, under a variety of names; the name access method was introduced in 1963 in the IBM OS/360 operating system. Access methods provide an application programming interface (API) for programmers to transfer data to or from device, and could be compared to device drivers in non-mainframe operating systems, but typically provide a greater level of functionality.

The history of IBM mainframe operating systems is significant within the history of mainframe operating systems, because of IBM's long-standing position as the world's largest hardware supplier of mainframe computers. IBM mainframes run operating systems supplied by IBM and by third parties.

<span class="mw-page-title-main">OS/360 and successors</span> Operating system for IBM S/360 and later mainframes

OS/360, officially known as IBM System/360 Operating System, is a discontinued batch processing operating system developed by IBM for their then-new System/360 mainframe computer, announced in 1964; it was influenced by the earlier IBSYS/IBJOB and Input/Output Control System (IOCS) packages for the IBM 7090/7094 and even more so by the PR155 Operating System for the IBM 1410/7010 processors. It was one of the earliest operating systems to require the computer hardware to include at least one direct access storage device.

Basic Direct Access Method, or BDAM is an access method for IBM's OS/360 and successors computer operating systems on System/360 and later mainframes. BDAM "consists of routines used in retrieving data from, and storing data onto, direct access devices." BDAM is available on OS/360, OS/VS2, MVS, z/OS, and related high-end operating systems.

<span class="mw-page-title-main">IBM System/360 Model 20</span> Low-end IBM computer model from 1960s

The IBM System/360 Model 20 is the smallest member of the IBM System/360 family announced in November 1964. The Model 20 supports only a subset of the System/360 instruction set, with binary numbers limited to 16 bits and no floating point. In later years it would have been classified as a 16-bit minicomputer rather than a mainframe, but the term "minicomputer" was not current, and in any case IBM wanted to emphasize the compatibility of the Model 20 rather than its differences from the rest of the System/360 line. It does, however, have the full System/360 decimal instruction set, that allows for addition, subtraction, product, and dividend of up to 31 decimal digits.

References

  1. 1 2 3 IBM Corporation (1968). IBM System/360 Operating System Sort/Merge (GC28-6435-5) (PDF). pp. 16–17.
  2. "z/OS DFSORT Tuning Guide". 28 September 2013. Retrieved October 2, 2014.
  3. "Key Tag Sort". IBM Technical Information Exchange. June 22, 1963.
  4. Long, Larry. "The Top Five Mainframe Migration Issues Facing IT Leaders". Forbes.
  5. "IFL – A Cost Efficient zSeries Platform?". September 3, 2014.
  6. IBM Corporation (Sep 1979). OS/VS Sort/Merge Logic (PDF). p. 2. Retrieved June 21, 2021.