Filename extension | .7z |
---|---|
Internet media type | application/x-7z-compressed |
Uniform Type Identifier (UTI) | org.7-zip.7-zip-archive |
Magic number | '7', 'z', 0xBC, 0xAF, 0x27, 0x1C |
Size limitation | 2^64 bytes (roughly 18 exabytes) |
Developed by | Igor Pavlov [1] |
Initial release | 1999[2] |
Type of format | Data compression |
Open format? | Yes: GNU Lesser General Public License / Public domain |
Website | 7-zip |
7z is a compressed archive file format that supports several different data compression, encryption and pre-processing algorithms. The 7z format initially appeared as implemented by the 7-Zip archiver. The 7-Zip program is publicly available under the terms of the GNU Lesser General Public License. The LZMA SDK 4.62 was placed in the public domain in December 2008. The latest stable version of 7-Zip and LZMA SDK is version 24.09. [2]
The 7z file format specification has been distributed with 7-Zip's source code since 2015. The specification can be found in plain text format in the 'doc' sub-directory of the source code distribution. [3]
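The six-byte magic number listed in the table above is enough for a quick format check. The following is a minimal sketch; the names `SEVEN_ZIP_MAGIC` and `looks_like_7z` are illustrative and are not taken from 7-Zip or the specification.

```python
# Signature bytes as given in the infobox: '7', 'z', 0xBC, 0xAF, 0x27, 0x1C
SEVEN_ZIP_MAGIC = bytes([ord("7"), ord("z"), 0xBC, 0xAF, 0x27, 0x1C])


def looks_like_7z(header: bytes) -> bool:
    """Return True if the given leading bytes start with the 7z signature."""
    return header[:len(SEVEN_ZIP_MAGIC)] == SEVEN_ZIP_MAGIC


def file_looks_like_7z(path: str) -> bool:
    """Read only the first six bytes of a file and compare them."""
    with open(path, "rb") as handle:
        return looks_like_7z(handle.read(len(SEVEN_ZIP_MAGIC)))
```

A matching signature only suggests that the file is a 7z archive; it does not validate the rest of the container.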
The 7z format provides the following main features:
The format's open architecture allows additional future compression methods to be added to the standard.
The following compression methods are currently defined:
A suite of recompression tools called AdvanceCOMP contains a copy of the DEFLATE encoder from the 7-Zip implementation; these utilities can often be used to further compress the size of existing gzip, ZIP, PNG, or MNG files.
The LZMA SDK comes with the BCJ and BCJ2 preprocessors included, so that later stages are able to achieve greater compression: For x86, ARM, PowerPC (PPC), IA-64 Itanium, and ARM Thumb processors, jump targets are 'normalized' [4] before compression by changing relative positions into absolute values. For x86, this means that near jumps, calls and conditional near jumps (but not short jumps or short conditional jumps) are converted from the machine language "jump 1655 bytes backwards" style notation to normalized "jump to address 5554" style notation; all jumps to 5554, perhaps a common subroutine, are thus encoded identically, making them more compressible.
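The relative-to-absolute conversion can be illustrated with a deliberately simplified sketch. It is not the filter shipped in the LZMA SDK: it assumes that every 0xE8 byte is an x86 CALL with a 32-bit relative operand, measures addresses from the start of the buffer, and omits the heuristics a production filter uses to avoid false matches. The function name `bcj_x86_sketch` is hypothetical.

```python
import struct


def bcj_x86_sketch(data: bytes, encode: bool = True) -> bytes:
    """Rewrite CALL rel32 operands as absolute targets (or back again)."""
    buf = bytearray(data)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] == 0xE8:  # assumed to be a CALL rel32 opcode
            (value,) = struct.unpack_from("<I", buf, i + 1)
            next_ip = i + 5  # relative operands count from the next instruction
            if encode:
                value = (value + next_ip) & 0xFFFFFFFF  # relative -> absolute
            else:
                value = (value - next_ip) & 0xFFFFFFFF  # absolute -> relative
            struct.pack_into("<I", buf, i + 1, value)
            i += 5  # skip the operand so it is never scanned as an opcode
        else:
            i += 1
    return bytes(buf)
```

After encoding, two calls located at different offsets but aimed at the same subroutine carry identical operand bytes, which a dictionary compressor can then match. Calling the function with `encode=False` restores the original bytes.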
Similar executable pre-processing technology is included in other software; the RAR compressor features displacement compression for 32-bit x86 executables and IA-64 executables, and the UPX runtime executable file compressor includes support for working with 16-bit values within DOS binary files.
The 7z format supports encryption with the AES algorithm with a 256-bit key. The key is generated from a user-supplied passphrase using an algorithm based on the SHA-256 hash function. The SHA-256 is executed 2^19 (524288) times, [5] which causes a significant delay on slow PCs before compression or extraction starts. This technique is called key stretching and is used to make a brute-force search for the passphrase more difficult. Current GPU-based and custom hardware attacks limit the effectiveness of this particular method of key stretching, [6] so it is still important to choose a strong password. The 7z format provides the option to encrypt the filenames of a 7z archive.
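Key stretching of this kind can be sketched as repeated hashing. The sketch below assumes a particular input layout (salt, then the passphrase encoded as UTF-16LE, then an 8-byte little-endian round counter, all fed into one running SHA-256 state); that layout is an assumption made for illustration and should be checked against the 7-Zip source before being relied on. The function name `stretch_key` and its parameters are hypothetical.

```python
import hashlib


def stretch_key(passphrase: str, salt: bytes = b"", cycles_power: int = 19) -> bytes:
    """Derive a 32-byte key by feeding 2**cycles_power rounds into SHA-256."""
    password = passphrase.encode("utf-16-le")  # assumed passphrase encoding
    digest = hashlib.sha256()
    for counter in range(1 << cycles_power):
        digest.update(salt)
        digest.update(password)
        digest.update(counter.to_bytes(8, "little"))  # assumed counter layout
    return digest.digest()  # 32 bytes, the size of an AES-256 key
```

With the default `cycles_power=19` the loop runs 524288 times, so a single guess is noticeably more expensive than one plain hash; that per-guess cost is the entire purpose of the technique.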
The 7z format does not store filesystem permissions (such as UNIX owner/group permissions or NTFS ACLs), and hence can be inappropriate for backup/archival purposes. A workaround on UNIX-like systems is to convert the data to a tar bitstream before compressing it with 7z. However, GNU tar (common in many UNIX environments) can also compress with the LZMA2 algorithm ("xz") natively, without the use of 7z, using the "-J" switch. The resulting file extension is ".tar.xz" or ".txz" and not ".tar.7z". This method of compression has been adopted by many distributions for packaging, such as Arch, Debian (deb), Fedora (rpm) and Slackware. (The older "lzma" format is less efficient.) [7] On the other hand, tar does not record the filesystem encoding, which means that filenames stored in a tar archive can become unreadable if the archive is extracted on a different computer.
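The tar-then-xz approach can also be reproduced with Python's standard `tarfile` module, which is used here only as an illustration and is not part of 7-Zip or GNU tar. The helper names `make_tar_xz` and `list_modes` are hypothetical.

```python
import tarfile
from pathlib import Path


def make_tar_xz(source_dir: str, archive_path: str) -> None:
    """Pack a directory into a .tar.xz archive, keeping the tar metadata."""
    with tarfile.open(archive_path, "w:xz") as tar:
        # The tar layer records mode, owner and timestamps for each member.
        tar.add(source_dir, arcname=Path(source_dir).name)


def list_modes(archive_path: str) -> dict:
    """Map each member name to its stored permission bits."""
    with tarfile.open(archive_path, "r:xz") as tar:
        return {member.name: member.mode & 0o777 for member in tar.getmembers()}
```

Reading the archive back with `list_modes` shows the permission bits that the tar layer preserved, which is the information a plain 7z archive would not carry.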
The 7z format does not allow extraction from some incomplete ("broken") archives. For example, if only the first segment of a multi-part 7z archive is available, 7z cannot extract the files at the start of the archive; it must wait until all segments are downloaded. The 7z format also lacks recovery records, making it vulnerable to data degradation unless used in conjunction with external solutions, like parchives, or within filesystems with robust error-correction. By way of comparison, ZIP files also lack a recovery feature while the RAR format has one.
Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statistical redundancy. By contrast, lossy compression permits reconstruction only of an approximation of the original data, though usually with greatly improved compression rates.
bzip2 is a free and open-source file compression program that uses the Burrows–Wheeler algorithm. It only compresses single files and is not a file archiver. It relies on separate external utilities for tasks such as handling multiple files, encryption, and archive-splitting.
In computing, Deflate is a lossless data compression file format that uses a combination of LZ77 and Huffman coding. It was designed by Phil Katz, for version 2 of his PKZIP archiving tool. Deflate was later specified in RFC 1951 (1996).
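A raw Deflate stream (no zlib or gzip wrapper) can be produced with Python's standard `zlib` module by passing a negative window-bits value. This is a minimal round-trip sketch; the helper names `deflate_raw` and `inflate_raw` are illustrative.

```python
import zlib


def deflate_raw(data: bytes, level: int = 9) -> bytes:
    """Compress to a raw Deflate stream (wbits=-15 omits header and checksum)."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def inflate_raw(blob: bytes) -> bytes:
    """Decompress a raw Deflate stream produced by deflate_raw."""
    return zlib.decompress(blob, wbits=-15)
```

Repetitive input shrinks considerably because the LZ77 stage replaces repeated substrings with back-references before Huffman coding is applied.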
ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed. The ZIP file format permits a number of compression algorithms, though DEFLATE is the most common. This format was originally created in 1989 and was first implemented in PKWARE, Inc.'s PKZIP utility, as a replacement for the previous ARC compression format by Thom Henderson. The ZIP format was then quickly supported by many software utilities other than PKZIP. Microsoft has included built-in ZIP support in versions of Microsoft Windows since 1998 via the "Plus! 98" addon for Windows 98. Native support was added as of the year 2000 in Windows ME. Apple has included built-in ZIP support in Mac OS X 10.3 and later. Most free operating systems have built-in support for ZIP in a similar manner to Windows and macOS.
compress is a Unix shell compression program based on the LZW compression algorithm. Compared to gzip's fastest setting, compress is slightly slower at compression, slightly faster at decompression, and has a significantly lower compression ratio. 1.8 MiB of memory is used to compress the Hutter Prize data, slightly more than gzip's slowest setting.
7-Zip is a free and open-source file archiver, a utility used to place groups of files within compressed containers known as "archives". It is developed by Igor Pavlov and was first released in 1999. 7-Zip has its own archive format called 7z introduced in 2001, but can read and write several others.
The Lempel–Ziv–Markov chain algorithm (LZMA) is an algorithm used to perform lossless data compression. It has been used in the 7z format of the 7-Zip archiver since 2001. This algorithm uses a dictionary compression scheme somewhat similar to the LZ77 algorithm published by Abraham Lempel and Jacob Ziv in 1977 and features a high compression ratio and a variable compression-dictionary size, while still maintaining decompression speed similar to other commonly used compression algorithms.
StuffIt is a discontinued family of computer software utilities for archiving and compressing files. Originally produced for Macintosh, versions for Microsoft Windows, Linux (x86), and Sun Solaris were later created. The proprietary compression format used by the StuffIt utilities is also termed StuffIt.
StuffIt Expander is a proprietary, freeware, closed source, decompression software utility developed by Allume Systems. It runs on the classic Mac OS, macOS, and Microsoft Windows. Prior to 2011, a Linux version had also been available for download.
WinRAR is a trialware file archiver utility, developed by Eugene Roshal of win.rar GmbH. It can create and view archives in RAR or ZIP file formats, and unpack numerous archive file formats. To enable the user to test the integrity of archives, WinRAR embeds CRC32 or BLAKE2 checksums for each file in each archive. WinRAR supports creating encrypted, multi-part and self-extracting archives.
rzip is a huge-scale data compression computer program designed around initial LZ77-style string matching on a 900 MB dictionary window, followed by bzip2-based Burrows–Wheeler transform and entropy coding (Huffman) on 900 kB output chunks.
UPX is a free and open source executable packer supporting a number of file formats from different operating systems.
HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization.
A self-extracting archive is a computer executable program which combines compressed data in an archive file with machine-executable code to extract the information. Running on a compatible operating system, it does not need a suitable extractor in the target computer to extract the data. The executable part of the file is known as a decompressor stub.
PeaZip is a free and open-source file manager and file archiver for Microsoft Windows, ReactOS, Linux, macOS and BSD by Giorgio Tani. It supports its native PEA archive format and other mainstream formats, with special focus on handling open formats. Version 9.4.0 supported 234 file extensions.
XZ Utils is a set of free software command-line lossless data compressors, including the programs lzma and xz, for Unix-like operating systems and, from version 5.0 onwards, Microsoft Windows. For compression/decompression the Lempel–Ziv–Markov chain algorithm (LZMA) is used. XZ Utils started as a Unix port of Igor Pavlov's LZMA-SDK that has been adapted to fit seamlessly into Unix environments and their usual structure and behavior.
FreeArc is a free and open-source high-performance file archiver developed by Bulat Ziganshin. The project is presumably discontinued, since no information has been released by the developers since 2016 and the official website is down.
lzip is a free, command-line tool for the compression of data; it employs the Lempel–Ziv–Markov chain algorithm (LZMA) with a user interface that is familiar to users of usual Unix compression tools, such as gzip and bzip2.
In data compression, BCJ, short for branch/call/jump, refers to a technique that improves the compression of machine code by replacing relative branch addresses with absolute ones. This allows a Lempel–Ziv compressor to identify duplicate targets and more efficiently encode them. On decompression, the inverse filter restores the original encoding. Different BCJ filters are used for different instruction sets, as each uses different opcodes for branching.
Compared to LZMA1, LZMA2 adds support for LZMA_SYNC_FLUSH, uncompressed chunks (smaller expansion when trying to compress incompressible data), the possibility of changing lc/lp/pb in the middle of encoding, and some other internal improvements.
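The effect of uncompressed chunks can be observed with Python's standard `lzma` module, which wraps liblzma from XZ Utils rather than the 7-Zip code itself. The sketch below writes a raw LZMA2 stream (no .xz container); the preset value and the helper names are choices made for illustration.

```python
import lzma
import os

# Raw LZMA2 filter chain; preset 6 is an arbitrary choice for this sketch.
LZMA2_FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]


def lzma2_raw_compress(data: bytes) -> bytes:
    """Compress to a raw LZMA2 stream without any container format."""
    return lzma.compress(data, format=lzma.FORMAT_RAW, filters=LZMA2_FILTERS)


def lzma2_raw_decompress(blob: bytes) -> bytes:
    """Decompress a raw LZMA2 stream created with the same filter chain."""
    return lzma.decompress(blob, format=lzma.FORMAT_RAW, filters=LZMA2_FILTERS)


if __name__ == "__main__":
    noise = os.urandom(1 << 16)  # incompressible input
    packed = lzma2_raw_compress(noise)
    print(len(noise), len(packed))  # sizes should differ by only a few bytes
```

For random input the output should be only marginally larger than the input, since the encoder can fall back to storing chunks uncompressed instead of emitting an expanded LZMA encoding.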