Record-oriented filesystem

Last updated

In computer science, a record-oriented filesystem is a file system where data is stored as collections of records. This is in contrast to a byte-oriented filesystem, where the data is treated as an unformatted stream of bytes. There are several different possible record formats; the details vary depending on the particular system. In general the formats can be fixed-length or variable length, with different physical organizations or padding mechanisms; metadata may be associated with the file records to define the record length, or the data may be part of the record. Different access methods for records may be provided, for example records may be retrieved in sequential order, by key, or by record number.

Contents

Origin and characteristics

Record-oriented filesystems are frequently associated with mainframe operating systems, such as OS/360 and successors [1] and DOS/360 and successors, and midrange operating systems, such as RSX-11 and VMS. However, they originated earlier in software such as Input/Output Control System (IOCS). [2] Records, sometimes called logical records, are often written together in blocks, sometimes called physical records; this is the norm for direct access and tape devices, but files on unit record devices are normally unblocked, i.e., there is only one record per block.

Record-oriented filesystems can be supported on media other than direct access devices. A deck of punched cards can be considered a record-oriented file. A magnetic tape is an example of a medium that can support records of uniform length or variable length.

In a record file system, a programmer designs the records that may be used in a file. All application programs accessing the file, whether adding, reading, or updating records share an understanding of the design of the records. In DOS/360, OS/360, and their successors there is no restriction on the bit patterns composing the data record, i.e. there is no delimiter character; this is not always true in other software, e.g., certain record types for RCA File Control Processor (FCP) on the 301, 501, 601 and 3301.

The file comes into existence when a file create request is issued to the filesystem. Some information about the file may be included with the create request. This information may specify that the file has fixed-length records (all records are the same size) along with the size of the records. Alternatively, the specification may state that the records are of variable length, along with the maximum record length. Additional information including blocking factor, binary vs. text and the maximum number of records may be specified.

It may be permitted to read only the beginning of a record; the next sequential read returns the next collection of data (record) that the writer intended to be grouped together. It may also be permitted to write only the beginning of a record. In these cases, the record is padded with binary zeros or with spaces, depending on whether the file is recognized as a binary file or a text file.

Some operating systems require that library routines specific to the record format be included in the program. This means that a program originally expected to read a variable length record file cannot read a fixed length file. These operating systems must provide file system utilities for converting files between one format and another. This means copying the file (which requires additional storage space, time and coordination) may be necessary.

Other operating systems include various routines and associate the appropriate routine, based on the file organization, at execution time.

In either case, significant amounts of code to manage records must be provided in protected routines to ensure file integrity.

An alternative to a Record-oriented file is a stream. In a stream file, in which the file system treats files as an unstructured sequence of bytes. The applications may, but need not, impose a record structure. This approach significantly reduces the size and complexity of the library and reduces the number of utilities required to maintain files.

A common application convention for text files represented as streams is to use a new line delimiter to separate or terminate records, commonly CR, CRLF or LF. Unfortunately, the CPU time required to parse for the record delimiter is significant and the exclusion of the record delimiter pattern from the data is frequently undesirable.

An alternate convention is to include a length field in each record. The writer application is responsible for imposing any record structure and the reader application is responsible for separating out the records.

Advantages and costs

A record oriented file has several advantages. After a program writes a collection of data as a record the program that reads that record has the understanding of those data as a collection. Often a file will contain several related records in sequence; after the program reads the beginning of the sequence, the next sequential read returns the next collection of data (record) that the writer intended to be grouped together. Another advantage is that the record has a length and there is usually no restriction on the bit patterns composing the data record, i.e. there is no delimiter character.

There is usually a cost associated with record oriented files. For fixed length records, some records may have unused space, while for variable length records the delimiter or length field takes up space. Variable length blocks may have overhead due to delimiters or length fields. In addition, there is overhead imposed by the device. On a magnetic tape overhead typically takes the form of an inter-record gap. On a direct access device with fixed length sectors, there may be unused space in the last sector of a block. On a direct access device with variable length physical records, that overhead typically takes the form of metadata and inter-record gaps.

On a file composed of varying length records a maximum record length is defined to determine the size of the length metadata associated with each record.

A major advantage of record-oriented file systems is that they abstract files kept on paper in earlier times. A record might contain data associated with a particular, e.g., building, contact, employee, part, venue.

A second motivator for the idea of record orientation is that it is in some sense the more natural orientation for persistent storage on a non-volatile but slow physical storage device. Most physical storage devices can communicate only in units of a block. Significant portions of modern operating system kernels and associated device drivers are devoted to hiding the naturally structured and delimited (and in some sense a block is just a physical record) nature of physical storage devices. It is not coincidental that record oriented file systems arose earlier in the history of computing than byte-stream oriented file systems, when the capabilities for abstraction were far less.

See also

Related Research Articles

XFS is a high-performance 64-bit journaling file system created by Silicon Graphics, Inc (SGI) in 1993. It was the default file system in SGI's IRIX operating system starting with its version 5.3. XFS was ported to the Linux kernel in 2001; as of June 2014, XFS is supported by most Linux distributions; Red Hat Enterprise Linux uses it as its default file system.

A direct-access storage device (DASD) is a secondary storage device in which "each physical record has a discrete location and a unique address". The term was coined by IBM to describe devices that allowed random access to data, the main examples being drum memory and hard disk drives. Later, optical disc drives and flash memory units are also classified as DASD.

Disk formatting is the process of preparing a data storage device such as a hard disk drive, solid-state drive, floppy disk, memory card or USB flash drive for initial use. In some cases, the formatting operation may also create one or more new file systems. The first part of the formatting process that performs basic medium preparation is often referred to as "low-level formatting". Partitioning is the common term for the second part of the process, dividing the device into several sub-devices and, in some cases, writing information to the device allowing an operating system to be booted from it. The third part of the process, usually termed "high-level formatting" most often refers to the process of generating a new file system. In some operating systems all or parts of these three processes can be combined or repeated at different levels and the term "format" is understood to mean an operation in which a new disk medium is fully prepared to store files. Some formatting utilities allow distinguishing between a quick format, which does not erase all existing data and a long option that does erase all existing data.

In computing, a block, sometimes called a physical record, is a sequence of bytes or bits, usually containing some whole number of records, having a maximum length; a block size. Data thus structured are said to be blocked. The process of putting data into blocks is called blocking, while deblocking is the process of extracting data from blocks. Blocked data is normally stored in a data buffer, and read or written a whole block at a time. Blocking reduces the overhead and speeds up the handling of the data stream. For some devices, such as magnetic tape and CKD disk devices, blocking reduces the amount of external storage required for the data. Blocking is almost universally employed when storing data to 9-track magnetic tape, NAND flash memory, and rotating media such as floppy disks, hard disks, and optical discs.

Virtual Storage Access Method (VSAM) is an IBM DASD file storage access method, first used in the OS/VS1, OS/VS2 Release 1 (SVS) and Release 2 (MVS) operating systems, later used throughout the Multiple Virtual Storage (MVS) architecture and now in z/OS. Originally a record-oriented filesystem, VSAM comprises four data set organizations: key-sequenced (KSDS), relative record (RRDS), entry-sequenced (ESDS) and linear (LDS). The KSDS, RRDS and ESDS organizations contain records, while the LDS organization simply contains a sequence of pages with no intrinsic record structure, for use as a memory-mapped file.

Files-11 is the file system used in the RSX-11 and OpenVMS operating systems from Digital Equipment Corporation. It supports record-oriented I/O, remote network access, and file versioning. The original ODS-1 layer is a flat file system; the ODS-2 version is a hierarchical file system, with support for access control lists,.

<span class="mw-page-title-main">File system</span> Format or program for storing files and directories

In computing, a file system or filesystem is a method and data structure that the operating system uses to control how data is stored and retrieved. Without a file system, data placed in a storage medium would be one large body of data with no way to tell where one piece of data stopped and the next began, or where any piece of data was located when it was time to retrieve it. By separating the data into pieces and giving each piece a name, the data are easily isolated and identified. Taking its name from the way a paper-based data management system is named, each group of data is called a "file". The structure and logic rules used to manage the groups of data and their names is called a "file system."

In the context of IBM mainframe computers in the S/360 line, a data set or dataset is a computer file having a record organization. Use of this term began with, e.g., DOS/360, OS/360, and is still used by their successors, including the current z/OS. Documentation for these systems historically preferred this term rather than file.

EncFS is a Free (LGPL) FUSE-based cryptographic filesystem. It transparently encrypts files, using an arbitrary directory as storage for the encrypted files.

The following tables compare general and technical information for a number of file systems.

<span class="mw-page-title-main">Disk sector</span> Logical or physical division of storage media

In computer disk storage, a sector is a subdivision of a track on a magnetic disk or optical disc. For most disks, each sector stores a fixed amount of user-accessible data, traditionally 512 bytes for hard disk drives (HDDs) and 2048 bytes for CD-ROMs and DVD-ROMs. Newer HDDs use 4096-byte (4 KiB) sectors, which are known as the Advanced Format (AF).

Count key data (CKD) is a direct-access storage device (DASD) data recording format introduced in 1964, by IBM with its IBM System/360 and still being emulated on IBM mainframes. It is a self-defining format with each data record represented by a Count Area that identifies the record and provides the number of bytes in an optional Key Area and an optional Data Area. This is in contrast to devices using fixed sector size or a separate format track.

In IBM mainframe operating systems, such as OS/360, MVS, z/OS, a Data Control Block (DCB) is a description of a dataset in a program. A DCB is coded in Assembler programs using the DCB macro instruction. High level language programmers use library routines containing DCBs.

<span class="mw-page-title-main">OS/360 and successors</span> Operating system for IBM S/360 and later mainframes

OS/360, officially known as IBM System/360 Operating System, is a discontinued batch processing operating system developed by IBM for their then-new System/360 mainframe computer, announced in 1964; it was influenced by the earlier IBSYS/IBJOB and Input/Output Control System (IOCS) packages for the IBM 7090/7094 and even more so by the PR155 Operating System for the IBM 1410/7010 processors. It was one of the earliest operating systems to require the computer hardware to include at least one direct access storage device.

Fixed-block architecture (FBA) is an IBM term for the hard disk drive (HDD) layout in which each addressable block on the disk has the same size, utilizing 4 byte block numbers and a new set of command codes. FBA as a term was created and used by IBM for its 3310 and 3370 HDDs beginning in 1979 to distinguish such drives as IBM transitioned away from their variable record size format used on IBM's mainframe hard disk drives beginning in 1964 with its System/360.

Basic Direct Access Method, or BDAM is an access method for IBM's OS/360 and successors computer operating systems on System/360 and later mainframes. BDAM "consists of routines used in retrieving data from, and storing data onto, direct access devices." BDAM is available on OS/360, OS/VS2, MVS, z/OS, and related high-end operating systems.

Object storage is a computer data storage that manages data as objects, as opposed to other storage architectures like file systems which manages data as a file hierarchy, and block storage which manages data as blocks within sectors and tracks. Each object typically includes the data itself, a variable amount of metadata, and a globally unique identifier. Object storage can be implemented at multiple levels, including the device level, the system level, and the interface level. In each case, object storage seeks to enable capabilities not addressed by other storage architectures, like interfaces that are directly programmable by the application, a namespace that can span multiple instances of physical hardware, and data-management functions like data replication and data distribution at object-level granularity.

<span class="mw-page-title-main">Distributed Data Management Architecture</span> Open, published architecture for creating, managing and accessing data on a remote computer

Distributed Data Management Architecture (DDM) is IBM's open, published software architecture for creating, managing and accessing data on a remote computer. DDM was initially designed to support record-oriented files; it was extended to support hierarchical directories, stream-oriented files, queues, and system command processing; it was further extended to be the base of IBM's Distributed Relational Database Architecture (DRDA); and finally, it was extended to support data description and conversion. Defined in the period from 1980 to 1993, DDM specifies necessary components, messages, and protocols, all based on the principles of object-orientation. DDM is not, in itself, a piece of software; the implementation of DDM takes the form of client and server products. As an open architecture, products can implement subsets of DDM architecture and products can extend DDM to meet additional requirements. Taken together, DDM products implement a distributed file system.

ZFS is a file system with volume management capabilities. It began as part of the Sun Microsystems Solaris operating system in 2001. Large parts of Solaris – including ZFS – were published under an open source license as OpenSolaris for around 5 years from 2005, before being placed under a closed source license when Oracle Corporation acquired Sun in 2009–2010. During 2005 to 2010, the open source version of ZFS was ported to Linux, Mac OS X and FreeBSD. In 2010, the illumos project forked a recent version of OpenSolaris, to continue its development as an open source project, including ZFS. In 2013, OpenZFS was founded to coordinate the development of open source ZFS. OpenZFS maintains and manages the core ZFS code, while organizations using ZFS maintain the specific code and validation processes required for ZFS to integrate within their systems. OpenZFS is widely used in Unix-like systems.

References

  1. z/OS DFSMS Using Data Sets Version 2 Release 3 (PDF), October 2, 2018, SC23-6855-30
  2. Reference Manual, IBM 709/7090 Input/output Control System (PDF). IBM. p. 3. C28-6100-2. Retrieved Sep 12, 2020.