Terse (file format)

TERSE file format
Filename extension: .trs
Developed by: IBM
Initial release: 1984
Type of format: Data compression
Open format: Yes

TERSE is an IBM archive file format that supports lossless compression. A TERSE file may contain a sequential data set, a partitioned data set (PDS), partitioned data set extended (PDSE), or a large format dataset (DSNTYPE=LARGE). Any record format (RECFM) is allowed as long as the record length is less than 32 K (64 K for RECFM=VBS). Records may contain printer control characters. [1]
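The record-length rule above can be expressed as a small check. This is only a sketch: it assumes "K" means 1024 bytes and that the limits are strict upper bounds, neither of which the text spells out, and the function name is hypothetical rather than part of any IBM tool.

```python
K = 1024  # assumption: "32 K" and "64 K" are read as multiples of 1024 bytes


def record_length_allowed(recfm: str, lrecl: int) -> bool:
    """Return True if a record length fits the limit described in the text.

    RECFM=VBS is given the larger 64 K limit; every other record format
    is given the 32 K limit.
    """
    limit = 64 * K if recfm.upper() == "VBS" else 32 * K
    return 0 < lrecl < limit


print(record_length_allowed("FB", 80))
print(record_length_allowed("VBS", 40000))
```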

Terse files are compressed using a modification of the Ziv–Lempel compression algorithm developed by Victor S. Miller and Mark Wegman at the Thomas J. Watson Research Center in Yorktown Heights, New York. [2] [3]
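As an illustration of the kind of dictionary scheme involved, the sketch below implements the variant usually called LZMW, in which each new dictionary entry is the concatenation of the two most recent matches. It is not the Terse file format: it emits a plain list of integer codes, has no bit packing and no dictionary size limit, and whether Terse uses exactly this rule is an assumption. The function names are hypothetical.

```python
def lzmw_compress(data: bytes) -> list[int]:
    """Encode data as dictionary indices, LZMW style."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    max_len = 1   # length of the longest dictionary entry so far
    codes = []
    prev = b""    # previous match
    pos = 0
    while pos < len(data):
        # Find the longest dictionary entry that matches at pos.
        # Single bytes are always present, so this loop terminates.
        length = min(max_len, len(data) - pos)
        while data[pos:pos + length] not in table:
            length -= 1
        cur = data[pos:pos + length]
        codes.append(table[cur])
        # New entry: previous match followed by current match.
        if prev and prev + cur not in table:
            table[prev + cur] = len(table)
            max_len = max(max_len, len(prev) + len(cur))
        prev = cur
        pos += length
    return codes


def lzmw_decompress(codes: list[int]) -> bytes:
    """Rebuild the data by replaying the same dictionary updates."""
    entries = [bytes([i]) for i in range(256)]
    known = set(entries)
    out = bytearray()
    prev = b""
    for code in codes:
        cur = entries[code]
        out += cur
        if prev and prev + cur not in known:
            entries.append(prev + cur)
            known.add(prev + cur)
        prev = cur
    return bytes(out)


sample = b"ab" * 50
packed = lzmw_compress(sample)
assert lzmw_decompress(packed) == sample
print(len(sample), "bytes ->", len(packed), "codes")
```

Because both sides add an entry only after the current match is known, the decoder never receives a code it has not yet defined, so no special case is needed when decoding.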

The Terse algorithm was proprietary to IBM; however, IBM has released an open source Java decompressor under the Apache 2 license. [4] The program that performs compression and decompression (operations called terse and unterse), named AMATERSE or TRSMAIN, is available from IBM for z/OS; the z/VM equivalents are the TERSE and DETERSE commands, which handle sequential datasets only. Versions for PC DOS, OS/2, AIX, Windows (2000, XP, 2003), Linux, and Mac OS X are available online. [5]

AMATERSE

The following JCL can be used to invoke AMATERSE on z/OS (TRSMAIN uses INFILE and OUTFILE instead of SYSUT1 and SYSUT2): [6] [7]

//jobname  JOB ...
//stepname EXEC PGM=AMATERSE,PARM=ppppp
//SYSPRINT DD SYSOUT=*
//SYSUT1   DD DISP=SHR,DSN=input.dataset
//SYSUT2   DD DISP=(NEW,CATLG),DCB=ddd,DSN=output.dataset,
//            SPACE=space_parameters
//SYSUT3   DD DISP=(NEW,DELETE),SPACE=space_parameters      Optional temporary dataset
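The template above can also be filled in programmatically. The following sketch builds the JCL text for either program name, switching the DD names as described and accepting only the PARM values listed in note 7. The function name and the dataset names in the usage lines are hypothetical. The JOB card, DCB and SPACE values are left as the same placeholders the template uses, and the optional SYSUT3 statement is omitted.

```python
VALID_PARMS = ("PACK", "SPACK", "UNPACK")


def build_terse_jcl(parm: str, input_dsn: str, output_dsn: str,
                    program: str = "AMATERSE") -> str:
    """Return JCL text that invokes AMATERSE or TRSMAIN."""
    if parm not in VALID_PARMS:
        raise ValueError(f"PARM must be one of {VALID_PARMS}")
    if program == "AMATERSE":
        in_dd, out_dd = "SYSUT1", "SYSUT2"
    elif program == "TRSMAIN":
        in_dd, out_dd = "INFILE", "OUTFILE"
    else:
        raise ValueError("program must be AMATERSE or TRSMAIN")
    lines = [
        "//jobname  JOB ...",  # placeholder kept from the template
        f"//stepname EXEC PGM={program},PARM={parm}",
        "//SYSPRINT DD SYSOUT=*",
        f"//{in_dd:<8} DD DISP=SHR,DSN={input_dsn}",
        # The trailing comma continues the statement on the next line.
        f"//{out_dd:<8} DD DISP=(NEW,CATLG),DCB=ddd,DSN={output_dsn},",
        "//            SPACE=space_parameters",
    ]
    return "\n".join(lines)


# Hypothetical dataset names, for illustration only.
print(build_terse_jcl("PACK", "MY.INPUT", "MY.OUTPUT.TRS"))
```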

Uses

Terse can be used as a general-purpose compression/decompression tool. IBM also distributes downloadable program temporary fixes (PTFs) as tersed datasets. IBM customers also use Terse to package diagnostic information, such as z/OS dumps and traces, for transmission to IBM.

References

  1. IBM Corporation (25 August 2016). "AMATERSE: Pack and unpack a data set". IBM Knowledge Center. Retrieved Sep 4, 2016.
  2. Miller, Victor S.; Wegman, Mark N. (1988). "Variations on a theme by Ziv and Lempel (Data compression)". IEEE International Conference on Communications '88: Digital Technology - Spanning the Universe. pp. 390–394. doi:10.1109/ICC.1988.13597. S2CID 56571935. Retrieved Sep 4, 2016.
  3. Lalonde, Bill. "Terse/MVS". Big Iron. Retrieved Sep 5, 2016.
  4. "openmainframeproject/tersedecompress". GitHub. Retrieved Nov 28, 2020.
  5. "Hercules-390". Discussion group for users of the Hercules ESA/390 mainframe emulator. Retrieved Sep 5, 2016.
  6. IBM Corporation (25 August 2016). "Specifying the JCL statements for AMATERSE". IBM Knowledge Center. Retrieved Sep 4, 2016.
  7. ppppp is PACK (compress), SPACK (compress more slowly but with a better compression ratio), or UNPACK (decompress).