Solid compression

A tar.gz is created by joining the files in tar and then compressing with gzip.

In computing, solid compression is a method for data compression of multiple files, wherein all the uncompressed files are concatenated and treated as a single data block. Such an archive is called a solid archive. It is used natively in the 7z [1] and RAR [2] formats, as well as indirectly in tar-based formats such as .tar.gz and .tar.bz2. By contrast, the ZIP format is not solid because it compresses each file separately (though solid compression can be emulated for small archives by combining the files into an uncompressed archive file and then compressing that archive file inside a second compressed ZIP file). [3] [4]

Explanation

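Compressed file formats often feature both compression (storing the data in a small space) and archiving (storing multiple files and metadata in a single file). One can combine these in two natural ways:

  1. compress the individual files, and then archive them into a single file;
  2. archive the files into a single data block, and then compress that block.

The order matters (these operations do not commute), and the latter is solid compression.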

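In Unix, compression and archiving are traditionally separate operations, which makes the distinction easy to see:

  1. Compressing the individual files with gzip and then archiving them with tar yields a tar archive of gzip-compressed files; this is uncommon in practice.
  2. Archiving the uncompressed files with tar and then compressing the result with gzip yields a compressed archive, a .tar.gz file; this is solid compression.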
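As a rough illustration of why the order matters, the following sketch builds the same three files into an in-memory .tar.gz (archive, then compress) and into a ZIP (compress each file, then archive) using Python's standard tarfile and zipfile modules. The file names, sizes, and the random test data are assumptions made for the demonstration, not part of either format.

```python
import io
import random
import tarfile
import zipfile

rng = random.Random(0)  # fixed seed so repeated runs use the same data


def random_bytes(n):
    """Return n pseudo-random bytes, which compress poorly on their own."""
    return bytes(rng.getrandbits(8) for _ in range(n))


# Three hypothetical files that share a 2000-byte block. DEFLATE cannot
# shrink the block itself, but it can replace a repeated copy with
# back-references if both copies are in the same compressed stream.
shared = random_bytes(2000)
files = {f"file{i}.bin": shared + random_bytes(500) for i in range(3)}


def tar_gz_size(members):
    """Archive first (tar), then compress the whole stream (gzip): solid."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


def zip_size(members):
    """Compress each member separately (DEFLATE), then archive: non-solid."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


solid_size = tar_gz_size(files)
non_solid_size = zip_size(files)
print("tar.gz:", solid_size, "bytes; zip:", non_solid_size, "bytes")
```

With data like this, the .tar.gz should come out noticeably smaller, because the shared block is stored once rather than once per file. On files with little in common, the difference would be far smaller.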

A rough graphical representation

In this example, each of three files has a common part with the same information, a unique part with information not in the other files, and an "air" part with low-entropy, and therefore highly compressible, information.

original file A

common | unique A | air

original file B

common | unique B | air

original file C

common | unique C | air

non-solid archive:

common | A | common | B | common | C

solid archive:

common | A | B | C
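The picture can be approximated in a few lines of Python with the standard zlib module. This is only a sketch under stated assumptions: the part sizes are arbitrary, random bytes stand in for the "common" and "unique" parts, and zero bytes stand in for "air".

```python
import random
import zlib

rng = random.Random(1)  # fixed seed for repeatable test data


def noise(n):
    """Return n pseudo-random bytes (high entropy, poorly compressible)."""
    return bytes(rng.getrandbits(8) for _ in range(n))


common = noise(3000)   # identical in every file
air = bytes(2000)      # all zeros: low entropy, compresses to almost nothing
originals = {name: common + noise(1000) + air for name in "ABC"}

# Non-solid: every file is compressed on its own, so the common part
# has to be stored once per file.
non_solid = sum(len(zlib.compress(data)) for data in originals.values())

# Solid: the files are concatenated first, so the second and third copies
# of the common part can be encoded as references to the first copy.
solid = len(zlib.compress(b"".join(originals.values())))

print("non-solid:", non_solid, "bytes; solid:", solid, "bytes")
```

The saving depends on the repeated data falling within the compressor's window (32 KiB for DEFLATE), which is why the parts here are kept small.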

Rationale

Benefits

Solid compression allows for much better compression ratios when the files are similar, which is often the case if they are of the same file format. It can also be efficient when archiving a large number of small files, since the compressor can exploit redundancy across files that are individually too short to compress well.
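The small-file case can be seen with a short experiment. The 500 records here are invented for the illustration; each one is too short for zlib to find much redundancy inside it, although the records closely resemble one another.

```python
import zlib

# 500 hypothetical small files in the same "format": short, similar records.
small_files = [
    f'{{"id": {i}, "status": "ok", "owner": "user{i % 7}"}}'.encode()
    for i in range(500)
]

raw = sum(len(f) for f in small_files)

# One compressed stream per file: fixed per-stream overhead, and no
# chance to reuse what was learned from the other files.
separate = sum(len(zlib.compress(f)) for f in small_files)

# One compressed stream for all files: the repeated structure is shared.
together = len(zlib.compress(b"".join(small_files)))

print(f"raw={raw}  separate={separate}  solid={together}")
```

Compressing such short inputs one by one may save little or nothing, while the single combined stream shrinks to a fraction of the raw size.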

Costs

On the other hand, getting a single file out of a solid archive requires decompressing all the files before it in the block, so extracting from or modifying a solid archive can be slow and inconvenient. Newer formats such as 7z offer a solid block size option that splits the concatenated data into smaller, individually compressed blocks, so that only a limited amount of data must be processed in order to extract one file. Parameters control the maximum solid block size, the number of files in a block, and whether blocks are separated by file extension. [5]
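The idea of limiting the solid block size can be sketched as follows. This is not the 7z on-disk format; `build_blocks`, `extract`, and `max_block_bytes` are hypothetical names chosen for the example, and zlib stands in for whatever compressor a real archiver would use.

```python
import zlib


def build_blocks(files, max_block_bytes):
    """Group files into solid blocks holding at most max_block_bytes of raw
    data (a single oversized file still gets a block of its own).

    Returns (blocks, index): blocks is a list of compressed byte strings,
    index maps name -> (block number, offset inside the block, length).
    """
    blocks, index = [], {}
    current, current_size = [], 0

    def flush():
        nonlocal current, current_size
        if current:
            blocks.append(zlib.compress(b"".join(current)))
            current, current_size = [], 0

    for name, data in files.items():
        # Start a new block if this file would overflow the current one.
        if current and current_size + len(data) > max_block_bytes:
            flush()
        index[name] = (len(blocks), current_size, len(data))
        current.append(data)
        current_size += len(data)
    flush()
    return blocks, index


def extract(blocks, index, name):
    """Decompress only the block that holds `name`, then slice the file out."""
    block_no, offset, length = index[name]
    raw = zlib.decompress(blocks[block_no])
    return raw[offset:offset + length]


files = {f"f{i}.txt": (f"payload {i} " * 50).encode() for i in range(10)}
blocks, index = build_blocks(files, max_block_bytes=2000)
assert extract(blocks, index, "f7.txt") == files["f7.txt"]
```

A smaller `max_block_bytes` means less data to decompress per extraction but fewer opportunities to share redundancy between files, which is the trade-off the block size option exposes.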

Additionally, if the archive becomes even slightly damaged, some (and sometimes all) of the data after the damaged part in the block can become unusable, depending on the compression and archiving format. In a non-solid archive, by contrast, usually only one file is lost and the subsequent files can still be extracted.
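A small experiment makes the difference in damage tolerance concrete. It is a simplified sketch: zlib streams stand in for archive members, the helper names are hypothetical, and a one-shot `zlib.decompress` call rejects a damaged stream outright, whereas a streaming decoder or a format with recovery data might salvage more.

```python
import zlib

files = [(f"record {i}: " + "data " * 40).encode() for i in range(5)]


def try_decompress(blob):
    """Return the decompressed bytes, or None if zlib rejects the stream."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None


def flip_byte(blob, position):
    """Return a copy of blob with one byte inverted, simulating damage."""
    damaged = bytearray(blob)
    damaged[position] ^= 0xFF
    return bytes(damaged)


# Non-solid: one compressed stream per file; the damage hits member 2 only.
members = [zlib.compress(f) for f in files]
members[2] = flip_byte(members[2], len(members[2]) // 2)
non_solid_ok = sum(try_decompress(m) == f for m, f in zip(members, files))

# Solid: a single stream holds every file, so the same damage affects all.
solid = zlib.compress(b"".join(files))
solid = flip_byte(solid, len(solid) // 2)
solid_ok = try_decompress(solid) == b"".join(files)

print("non-solid files recovered:", non_solid_ok, "of", len(files))
print("solid block recovered:", solid_ok)
```

In the non-solid case only the damaged member is lost, while the single solid stream no longer decompresses cleanly.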

References

  1. "7za man page". Archived from the original on 2010-01-10. Retrieved 2010-01-24. -ms=on[:] solid archive on
  2. "RAR Frequently Asked Questions (FAQ)". 1994-08-15. Retrieved 2010-01-24.
  3. "CAFxXcrossway - Emulate solid archiving with ZIP". cafxx.strayorange.com.
  4. "ZIP and solid archives". PC Review. 2006-03-15.
  5. "HISTORY of the 7-Zip". www.7-zip.org. Retrieved 2019-09-09.