Data set (IBM mainframe)

Last updated

In the context of IBM mainframe computers in the S/360 line, a data set (IBM preferred) or dataset is a computer file having a record organization. Use of this term began with, e.g., DOS/360, OS/360, and is still used by their successors, including the current z/OS. Documentation for these systems historically preferred this term rather than file .

Contents

A data set is typically stored on a direct access storage device (DASD) or magnetic tape, [1] however unit record devices, such as punch card readers, card punches, line printers and page printers can provide input/output (I/O) for a data set (file). [2]

Data sets are not unstructured streams of bytes, but rather are organized in various logical record [3] and block structures determined by the DSORG (data set organization), RECFM (record format), and other parameters. These parameters are specified at the time of the data set allocation (creation), for example with Job Control Language DD statements. Within a running program they are stored in the Data Control Block (DCB) or Access Control Block (ACB), which are data structures used to access data sets using access methods.

Records in a data set may be fixed, variable, or “undefined” length. [4]

Data set organization

For OS/360, the DCB's DSORG parameter specifies how the data set is organized. It may be [5]

CQ
Queued Telecommunications Access Method (QTAM) in Message Control Program (MCP)
CX
Communications line group
DA
Basic Direct Access Method (BDAM)
GS
Graphics device for Graphics Access Method(GAM)
IS
Indexed Sequential Access Method (ISAM)
MQ
QTAM message queue in application
PO
Partitioned Organization
PS
Physical Sequential

among others. Data sets on tape may only be DSORG=PS. The choice of organization depends on how the data is to be accessed, and in particular, how it is to be updated.

Programmers utilize various access methods (such as QSAM or VSAM) in programs for reading and writing data sets. Access method depends on the given data set organization.

Record format (RECFM)

Regardless of organization, the physical structure of each record is essentially the same, and is uniform throughout the data set. This is specified in the DCB RECFM parameter. RECFM=F means that the records are of fixed length, specified via the LRECL parameter. RECFM=V specifies a variable-length record. V records when stored on media are prefixed by a Record Descriptor Word (RDW) containing the integer length of the record in bytes and flag bits. With RECFM=FB and RECFM=VB, multiple logical records are grouped together into a single physical block on tape or DASD. FB and VB are fixed-blocked, and variable-blocked, respectively. RECFM=U (undefined) is also variable length, but the length of the record is determined by the length of the block rather than by a control field.

The BLKSIZE parameter specifies the maximum length of the block. RECFM=FBS [6] could be also specified, meaning fixed-blocked standard, meaning all the blocks except the last one were required to be in full BLKSIZE length. RECFM=VBS, or variable-blocked spanned, means a logical record could be spanned across two or more blocks, with flags in the RDW indicating whether a record segment is continued into the next block and/or was continued from the previous one.

This mechanism eliminates the need for using any "delimiter" byte value to separate records. Thus data can be of any type, including binary integers, floating-point, or characters, without introducing a false end-of-record condition. The data set is an abstraction of a collection of records, in contrast to files as unstructured streams of bytes.

Partitioned data set

A partitioned data set (PDS) [7] is a data set containing multiple members, each of which holds a separate sub-data set, similar to a directory in other types of file systems. This type of data set is often used to hold load modules (old format bound executable programs), source program libraries (especially Assembler macro definitions), ISPF screen definitions, and Job Control Language. A PDS may be compared to a Zip file or COM Structured Storage.

A Partitioned Data Set can only be allocated on a single volume and have a maximum size of 65,535 tracks.

Besides members, a PDS contains also a directory. Each member can be accessed indirectly via the directory structure. Once a member is located, the data stored in that member are handled in the same manner as a PS (sequential) data set.

Whenever a member is deleted, the space it occupied is unusable for storing other data. Likewise, if a member is re-written, it is stored in a new spot at the back of the PDS and leaves wasted “dead” space in the middle. The only way to recover “dead” space is to perform file compression. [8] Compression, which is done using the IEBCOPY utility, [9] moves all members to the front of the data space and leaves free usable space at the back. (Note that in modern parlance, this kind of operation might be called defragmentation or garbage collection; data compression nowadays refers to a different, more complicated concept.) PDS files can only reside on DASD, not on magnetic tape, in order to use the directory structure to access individual members. Partitioned data sets are most often used for storing multiple job control language files, utility control statements, and executable modules.

An improvement of this scheme is a Partitioned Data Set Extended (PDSE or PDS/E, sometimes just libraries) introduced with DFSMSdfp for MVS/XA and MVS/ESA systems. A PDS/E library can store program objects or other types of members, but not both. BPAM cannot process a PDS/E containing program objects.

PDS/E structure is similar to PDS and is used to store the same types of data. However, PDS/E files have a better directory structure which does not require pre-allocation of directory blocks when the PDS/E is defined (and therefore does not run out of directory blocks if not enough were specified). Also, PDS/E automatically stores members in such a way that compression operation is not needed to reclaim "dead" space. [8] PDS/E files can only reside on DASD in order to use the directory structure to access individual members.

Generation Data Group

A Generation Data Group [10] (GDG) [11] is a group of non-VSAM data sets [12] that are successive generations of historically-related data [13] stored on an IBM mainframe (running OS or DOS/VSE). [14]

A GDG is usually cataloged. [13]

An individual member of the GDG collection is called a "Generation Data Set." [13] [15] The latter may be identified by an absolute number, ACCTG.OURGDG(1234), or a relative number: (-1) for the previous generation, (0) for the current one, and (+1) the next generation. [16]

GDG JCL & features

Generation Data Groups are defined using either the BLDG statement [17] of the IEHPROGM utility or the DEFINE GENERATIONGROUP statement [18] of the newer IDCAMS utility, [19] which allows setting various parameters.

IDCAMS can also delete (and optionally uncatalog) a GDG. [20]

Related Research Articles

<span class="mw-page-title-main">MVS</span> Operating system for IBM mainframes

Multiple Virtual Storage, more commonly called MVS, is the most commonly used operating system on the System/370, System/390 and IBM Z IBM mainframe computers. IBM developed MVS, along with OS/VS1 and SVS, as a successor to OS/360. It is unrelated to IBM's other mainframe operating system lines, e.g., VSE, VM, TPF.

A direct-access storage device (DASD) is a secondary storage device in which "each physical record has a discrete location and a unique address". The term was coined by IBM to describe devices that allowed random access to data, the main examples being drum memory and hard disk drives. Later, optical disc drives and flash memory units are also classified as DASD.

ISAM, an acronym for Indexed Sequential Access Method, is a method for creating, maintaining, and manipulating computer files of data so that records can be retrieved sequentially or randomly by one or more keys. Indexes of key fields are maintained to achieve fast retrieval of required file records in indexed files. IBM originally developed ISAM for mainframe computers, but implementations are available for most computer systems.

Disk formatting is the process of preparing a data storage device such as a hard disk drive, solid-state drive, floppy disk, memory card or USB flash drive for initial use. In some cases, the formatting operation may also create one or more new file systems. The first part of the formatting process that performs basic medium preparation is often referred to as "low-level formatting". Partitioning is the common term for the second part of the process, dividing the device into several sub-devices and, in some cases, writing information to the device allowing an operating system to be booted from it. The third part of the process, usually termed "high-level formatting" most often refers to the process of generating a new file system. In some operating systems all or parts of these three processes can be combined or repeated at different levels and the term "format" is understood to mean an operation in which a new disk medium is fully prepared to store files. Some formatting utilities allow distinguishing between a quick format, which does not erase all existing data and a long option that does erase all existing data.

Virtual Storage Access Method (VSAM) is an IBM direct-access storage device (DASD) file storage access method, first used in the OS/VS1, OS/VS2 Release 1 (SVS) and Release 2 (MVS) operating systems, later used throughout the Multiple Virtual Storage (MVS) architecture and now in z/OS. Originally a record-oriented filesystem, VSAM comprises four data set organizations: key-sequenced (KSDS), relative record (RRDS), entry-sequenced (ESDS) and linear (LDS). The KSDS, RRDS and ESDS organizations contain records, while the LDS organization simply contains a sequence of pages with no intrinsic record structure, for use as a memory-mapped file.

Job Control Language (JCL) is a name for scripting languages used on IBM mainframe operating systems to instruct the system on how to run a batch job or start a subsystem. The purpose of JCL is to say which programs to run, using which files or devices for input or output, and at times to also indicate under what conditions to skip a step. Parameters in the JCL can also provide accounting information for tracking the resources used by a job as well as which machine the job should run on.

This article discusses support programs included in or available for OS/360 and successors. IBM categorizes some of these programs as utilities and others as service aids; the boundaries are not always consistent or obvious. Many, but not all, of these programs match the types in utility software.

In computer science, a record-oriented filesystem is a file system where data is stored as collections of records. This is in contrast to a byte-oriented filesystem, where the data is treated as an unformatted stream of bytes. There are several different possible record formats; the details vary depending on the particular system. In general the formats can be fixed-length or variable length, with different physical organizations or padding mechanisms; metadata may be associated with the file records to define the record length, or the data may be part of the record. Different access methods for records may be provided, for example records may be retrieved in sequential order, by key, or by record number.

In the IBM System/360 storage architecture, the Volume Table of Contents (VTOC), is a data structure that provides a way of locating the data sets that reside on a particular DASD volume. With the exception of the IBM Z compatible disk layout in Linux on Z, it is the functional equivalent of the MS/PC DOS File Allocation Table (FAT), the NTFS Master File Table (MFT), and an inode table in a file system for a Unix-like system. The VTOC is not used to contain any IPLTEXT and does not have any role in the IPL process, therefore does not have any data used by or functionally equivalent to the MBR. It lists the names of each data set on the volume as well as size, location, and permissions. Additionally, it contains an entry for every area of contiguous free space on the volume. The third record on the first track of the first cylinder of any DASD volume is known as the volume label and must contain a pointer to the location of the VTOC. The location of the VTOC may be specified when the volume is initialized. For performance reasons it may be located as close to the center of the volume as possible, since it is referenced frequently. A VTOC is added to a DASD volume when it is initialized using the Device Support Facilities program, ICKDSF, in current systems.

<span class="mw-page-title-main">Disk sector</span> Logical or physical division of storage media

In computer disk storage, a sector is a subdivision of a track on a magnetic disk or optical disc. For most disks, each sector stores a fixed amount of user-accessible data, traditionally 512 bytes for hard disk drives (HDDs) and 2048 bytes for CD-ROMs and DVD-ROMs. Newer HDDs and SSDs use 4096-byte (4 KiB) sectors, which are known as the Advanced Format (AF).

<span class="mw-page-title-main">Hierarchical file system</span> Computer files in nested directories

In computing, a hierarchical file system is a file system that uses directories to organize files into a tree structure.

An access method is a function of a mainframe operating system that enables access to data on disk, tape or other external devices. Access methods were present in several mainframe operating systems since the late 1950s, under a variety of names; the name access method was introduced in 1963 in the IBM OS/360 operating system. Access methods provide an application programming interface (API) for programmers to transfer data to or from device, and could be compared to device drivers in non-mainframe operating systems, but typically provide a greater level of functionality.

In IBM mainframe operating systems, basic partitioned access method (BPAM) is an access method for libraries, called partitioned datasets (PDSes) in IBM terminology. BPAM is used in OS/360, OS/VS2, MVS, z/OS, and others.

In IBM mainframe operating systems, such as OS/360, MVS, z/OS, a Data Control Block (DCB) is a description of a dataset in a program. A DCB is coded in Assembler programs using the DCB macro instruction. High level language programmers use library routines containing DCBs.

In IBM mainframe operating systems, Basic sequential access method (BSAM) is an access method to read and write datasets sequentially. BSAM is available on OS/360, OS/VS2, MVS, z/OS, and related operating systems.

In IBM mainframe operating systems, Execute Channel Program (EXCP) is a macro generating a system call, implemented as a Supervisor Call instruction, for low-level device access, where the programmer is responsible for providing a channel program—a list of device-specific commands (CCWs)—to be executed by I/O channels, control units and devices. EXCP for OS/360 and successors is more specifically described in the OS System Programmer's Guide.; EXCP for DOS/360 and successors is more specifically described in DOS Supervisor and I/O Macros. This article mostly reflects OS/360 through z/OS; some details are different for TOS/360 and DOS/360 through z/VSE.

<span class="mw-page-title-main">OS/360 and successors</span> Operating system for IBM S/360 and later mainframes

OS/360, officially known as IBM System/360 Operating System, is a discontinued batch processing operating system developed by IBM for their then-new System/360 mainframe computer, announced in 1964; it was influenced by the earlier IBSYS/IBJOB and Input/Output Control System (IOCS) packages for the IBM 7090/7094 and even more so by the PR155 Operating System for the IBM 1410/7010 processors. It was one of the earliest operating systems to require the computer hardware to include at least one direct access storage device.

Basic Direct Access Method, or BDAM is an access method for IBM's OS/360 and successors computer operating systems on System/360 and later mainframes. BDAM "consists of routines used in retrieving data from, and storing data onto, direct access devices." BDAM is available on OS/360, OS/VS2, MVS, z/OS, and related high-end operating systems.

TERSE is an IBM archive file format that supports lossless compression. A TERSE file may contain a sequential data set, a partitioned data set (PDS), partitioned data set extended (PDSE), or a large format dataset (DSNTYPE=LARGE). Any record format (RECFM) is allowed as long as the record length is less than 32 K. Records may contain printer control characters.

Data Facility Storage Management Subsystem (DFSMS) is a central component of IBM's flagship operating system z/OS. It includes access methods, utilities and program management functions. Data Facility Storage Management Subsystem is also a collective name for a collection of several products, all but two of which are included in the DFSMS/MVS product.

References

  1. "What is a catalog?". IBM . Cataloging of data sets on magnetic tape ...
  2. "IBM Knowledge Center - Home of IBM product documentation". publib.boulder.ibm.com.
  3. "What is a data set?". IBM . data set .. a file that contains one or more records.
  4. "Data set record formats". IBM . Records are either fixed length or variable length in a given data set.
  5. "Section IV: The DD Statement -- DCB Parameter" (PDF). IBM System/3S0 Operating System: Job Control Language Reference - OS Release 21.7 (PDF). IBM Systems Reference Library. IBM. pp. 138–139. GC28-6704-4.
  6. "Example: Record format VBS". IBM . Variable-length, blocked, spanned (VBS)
  7. "Structure of a PDS", z/OS DFSMS Using Data Sets Version 2 Release 3 (PDF), October 2, 2018, SC23-6855-30
  8. 1 2 Stephens, David (Oct 2008). What On Earth is a Mainframe?. Lulu.com. p. 52. ISBN   978-1-4092-2535-5 . Retrieved May 11, 2018.
  9. "Compressing a Partitioned Data Set", z/OS DFSMSdfp Utilities Version 2 Release 3 (PDF), IBM Corporation, July 17, 2017, SC23-6864-30, A partitioned data set will contain unused areas (sometimes called gas) where a deleted member or the old version of an updated member once resided. This unused space is only reclaimed when a partitioned data set is copied to a new data set, or after a compress-in-place operation successfully completes. It has no meaning for a PDSE and is ignored if requested.
  10. "Generation Data Groups (GDG's), an Introduction with Examples". create and process a Generation Data Group or GDG on ...
  11. "JCL TUTORIAL REFERENCE - Generation Data Groups". Generation Data Groups (GDG)
  12. "What is a generation data group?". IBM.com. ... non-VSAM ...
  13. 1 2 3 "Generation data sets". IBM . successive, historically related,
  14. "VSE/VSAM Commands" (PDF).
  15. "A generation data set is one of ...
  16. "What is a GDG?".
  17. "BLDG (Build Generation Index) Statement" (PDF). OS Utilities (PDF). IBM Systems Reference Library (Sixteenth ed.). IBM. April 1973. p. 269. GC28-6586-15. Retrieved May 19, 2022.
  18. "Defining a Generation Data Group" (PDF). OS/VS Access Method Services (PDF) (Second ed.). IBM. May 1974. pp. 107–110. GC26-3836-1. Retrieved May 19, 2022.{{cite book}}: |work= ignored (help)
  19. "IBM How to create and use Generation Data Groups (GDG)". IBM . 2 March 2012. Create a GDG... IDCAMS will do it
  20. "IDCAMS – Create and delete GDG base using JCL".