7z

Last updated

7z file format
7zip archive icon.svg
Filename extension
.7z
Internet media type
application/x-7z-compressed
Uniform Type Identifier (UTI) org.7-zip.7-zip-archive
Magic number '7', 'z', 0xBC, 0xAF, 0x27, 0x1C
Size limitation264 bytes (roughly 18 exabytes)
Developed by Igor Pavlov [1]
Initial release1999;25 years ago (1999) [2]
Type of format Data compression
Open format?Yes: GNU Lesser General Public License / Public domain
Website 7-zip.org

7z is a compressed archive file format that supports several different data compression, encryption and pre-processing algorithms. The 7z format initially appeared as implemented by the 7-Zip archiver. The 7-Zip program is publicly available under the terms of the GNU Lesser General Public License. The LZMA SDK 4.62 was placed in the public domain in December 2008. The latest stable version of 7-Zip and LZMA SDK is version 23.01. [2]

Contents

The official, informal 7z file format specification is distributed with 7-Zip's source code since 2015. The specification can be found in plain text format in the 'doc' sub-directory of the source code distribution. [3] There have been additional third-party attempts at writing more concrete documentation based on the released code. [4]

Features and enhancements

The 7z format provides the following main features:

The format's open architecture allows additional future compression methods to be added to the standard.

Compression methods

The following compression methods are currently defined:

A suite of recompression tools called AdvanceCOMP contains a copy of the DEFLATE encoder from the 7-Zip implementation; these utilities can often be used to further compress the size of existing gzip, ZIP, PNG, or MNG files.

Pre-processing filters

The LZMA SDK comes with the BCJ and BCJ2 preprocessors included, so that later stages are able to achieve greater compression: For x86, ARM, PowerPC (PPC), IA-64 Itanium, and ARM Thumb processors, jump targets are 'normalized' [5] before compression by changing relative position into absolute values. For x86, this means that near jumps, calls and conditional jumps (but not short jumps and conditional jumps) are converted from the machine language "jump 1655 bytes backwards" style notation to normalized "jump to address 5554" style notation; all jumps to 5554, perhaps a common subroutine, are thus encoded identically, making them more compressible.

Similar executable pre-processing technology is included in other software; the RAR compressor features displacement compression for 32-bit x86 executables and IA-64 executables, and the UPX runtime executable file compressor includes support for working with 16-bit values within DOS binary files.

Encryption

The 7z format supports encryption with the AES algorithm with a 256-bit key. The key is generated from a user-supplied passphrase using an algorithm based on the SHA-256 hash function. The SHA-256 is executed 219 (524288) times, [6] which causes a significant delay on slow PCs before compression or extraction starts. This technique is called key stretching and is used to make a brute-force search for the passphrase more difficult. Current GPU-based, and custom hardware attacks limit the effectiveness of this particular method of key stretching, [7] so it is still important to choose a strong password. The 7z format provides the option to encrypt the filenames of a 7z archive.

Limitations

The 7z format does not store filesystem permissions (such as UNIX owner/group permissions or NTFS ACLs), and hence can be inappropriate for backup/archival purposes. A workaround on UNIX-like systems for this is to convert data to a tar bitstream before compressing with 7z. But GNU tar (common in many UNIX environments) can also compress with the LZMA2 algorithm ("xz") natively, without the use of 7z, using the "-J" switch. The resulting file extension is ".tar.xz" or ".txz" and not ".tar.7z". This method of compression has been adopted with many distributions for packaging, such as Arch, Debian (deb), Fedora (rpm) and Slackware. (The older "lzma" format is less efficient.) [8] On the other hand, it is important to note, that tar does not save the filesystem encoding, which means that tar compressed filenames can become unreadable if decompressed on a different computer.

The 7z format does not allow extraction of some "broken files"—that is (for example) if one has the first segment of a series of 7z files, 7z cannot give the start of the files within the archive—it must wait until all segments are downloaded. The 7z format also lacks recovery records, making it vulnerable to data degradation unless used in conjunction with external solutions, like parchives, or within filesystems with robust error-correction. By way of comparison, zip files also lack a recovery feature while the rar format has one.

See also

Related Research Articles

Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statistical redundancy. By contrast, lossy compression permits reconstruction only of an approximation of the original data, though usually with greatly improved compression rates.

bzip2 File compression software

bzip2 is a free and open-source file compression program that uses the Burrows–Wheeler algorithm. It only compresses single files and is not a file archiver. It relies on separate external utilities for tasks such as handling multiple files, encryption, and archive-splitting.

In computing, Deflate is a lossless data compression file format that uses a combination of LZ77 and Huffman coding. It was designed by Phil Katz, for version 2 of his PKZIP archiving tool. Deflate was later specified in RFC 1951 (1996).

ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed. The ZIP file format permits a number of compression algorithms, though DEFLATE is the most common. This format was originally created in 1989 and was first implemented in PKWARE, Inc.'s PKZIP utility, as a replacement for the previous ARC compression format by Thom Henderson. The ZIP format was then quickly supported by many software utilities other than PKZIP. Microsoft has included built-in ZIP support in versions of Microsoft Windows since 1998 via the "Plus! 98" addon for Windows 98. Native support was added as of the year 2000 in Windows ME. Apple has included built-in ZIP support in Mac OS X 10.3 and later. Most free operating systems have built in support for ZIP in similar manners to Windows and macOS.

<span class="mw-page-title-main">7-Zip</span> Open-source file archiver

7-Zip is a free and open-source file archiver, a utility used to place groups of files within compressed containers known as "archives". It is developed by Igor Pavlov and was first released in 1999. 7-Zip has its own archive format called 7z, but can read and write several others.

The Lempel–Ziv–Markov chain algorithm (LZMA) is an algorithm used to perform lossless data compression. It has been under development since either 1996 or 1998 by Igor Pavlov and was first used in the 7z format of the 7-Zip archiver. This algorithm uses a dictionary compression scheme somewhat similar to the LZ77 algorithm published by Abraham Lempel and Jacob Ziv in 1977 and features a high compression ratio and a variable compression-dictionary size, while still maintaining decompression speed similar to other commonly used compression algorithms.

StuffIt is a discontinued family of computer software utilities for archiving and compressing files. Originally produced for Macintosh, versions for Microsoft Windows, Linux (x86), and Sun Solaris were later created. The proprietary compression format used by the StuffIt utilities is also termed StuffIt.

<span class="mw-page-title-main">StuffIt Expander</span> File decompressor software utility

StuffIt Expander is a proprietary, freeware, closed source, decompression software utility developed by Allume Systems. It runs on the classic Mac OS, macOS, and Microsoft Windows. Prior to 2011, a Linux version had also been available for download.

rzip is a huge-scale data compression computer program designed around initial LZ77-style string matching on a 900 MB dictionary window, followed by bzip2-based Burrows–Wheeler transform and entropy coding (Huffman) on 900 kB output chunks.

<span class="mw-page-title-main">UPX</span>

UPX is a free and open source executable packer supporting a number of file formats from different operating systems.

<span class="mw-page-title-main">HTTP compression</span> Capability that can be built into web servers and web clients

HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization.

<span class="mw-page-title-main">Self-extracting archive</span> Computer executable program

A self-extracting archive is a computer executable program which combines compressed data in an archive file with machine-executable code to extract the information. Run on a compatible operating system, there is no need for a suitable extractor in the target computer to extract the data. The executable part of the file is known as a decompressor stub.

<span class="mw-page-title-main">PeaZip</span> File archive computer program

PeaZip is a free and open-source file manager and file archiver for Microsoft Windows, ReactOS, Linux, MacOS and BSD by Giorgio Tani. It supports its native PEA archive format and other mainstream formats, with special focus on handling open formats. Version 9.4.0 supported 234 file extensions.

XZ Utils is a set of free software command-line lossless data compressors, including the programs lzma and xz, for Unix-like operating systems and, from version 5.0 onwards, Microsoft Windows. For compression/decompression the Lempel–Ziv–Markov chain algorithm (LZMA) is used. XZ Utils started as a Unix port of Igor Pavlov's LZMA-SDK that has been adapted to fit seamlessly into Unix environments and their usual structure and behavior.

FreeArc is a free and open-source high-performance file archiver developed by Bulat Ziganshin. The project is presumably discontinued, since no information has been released by the developers since 2016 and the official website is down.

lzip Data compression utility

lzip is a free, command-line tool for the compression of data; it employs the Lempel–Ziv–Markov chain algorithm (LZMA) with a user interface that is familiar to users of usual Unix compression tools, such as gzip and bzip2.

Brotli is a lossless data compression algorithm developed by Google. It uses a combination of the general-purpose LZ77 lossless compression algorithm, Huffman coding and 2nd-order context modelling. Brotli is primarily used by web servers and content delivery networks to compress HTTP content, making internet websites load faster. A successor to gzip, it is supported by all major web browsers and has become increasingly popular, as it provides better compression than gzip.

Zstandard is a lossless data compression algorithm developed by Yann Collet at Facebook. Zstd is the corresponding reference implementation in C, released as open-source software on 31 August 2016.

In data compression, BCJ, short for Branch/Call/Jump, refers to a technique that improves the compression of machine code by replacing relative branch addresses with absolute ones. This allows a Lempel–Ziv compressor to identify duplicate targets and more efficiently encode them. On decompression, the inverse filter restores the original encoding. Different BCJ filters are used for different instruction sets, as each use different opcodes for branching.

References

  1. "A Few Questions for Igor Pavlov". Dr. Dobb's Data Compression Newsletter. 30 April 2003. Retrieved 26 December 2009.
  2. 1 2 History of 7-zip changes
  3. LZMA SDK, "DOC" directory, 7zFormat.txt
  4. ".7z format specification — py7zr – 7-zip archive library". py7zr.readthedocs.io.
  5. 1 2 Collin, Lasse. "lzma_.lzma". liblzma bindings. Archived from the original on 8 February 2010. Retrieved 3 January 2010. Compared to LZMA1, LZMA2 adds support for LZMA_SYNC_FLUSH, uncompressed chunks (smaller expansion when trying to compress uncompressible data), possibility to change lc/lp/pb in the middle of encoding, and some other internal improvements.
  6. 7-zip source code
  7. Colin Percival. scrypt. As presented in "Stronger Key Derivation via Sequential Memory-Hard Functions". presented at BSDCan'09, May 2009.
  8. "GNU tar 1.34: 8.1 Using Less Space through Compression".

Further reading