Executable compression


Executable compression is any means of compressing an executable file and combining the compressed data with decompression code into a single executable. When this compressed executable is executed, the decompression code recreates the original code from the compressed code before executing it. In most cases this happens transparently so the compressed executable can be used in exactly the same way as the original. Executable compressors are often referred to as executable packers, runtime packers, software packers, software protectors, or even "polymorphic packers" and "obfuscating tools".


A compressed executable can be considered a self-extracting archive, in which compressed data is packaged along with the relevant decompression code in an executable file. Some compressed executables can be decompressed to reconstruct the original program file without being executed. Two programs that can be used to do this are CUP386 and UNP.[citation needed]
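The following Python sketch illustrates why a static unpacker can work. It assumes a made-up container layout (stub bytes, a `PACK` marker, a 32-bit length field, then zlib data). Real packers, and the formats handled by CUP386 or UNP, differ, and real tools usually locate the payload from header fields rather than by searching for a marker. The point is only that the compressed payload is ordinary data, so it can be decompressed without ever running the stub.

```python
import struct
import zlib

# Hypothetical container: STUB | MAGIC | original length (uint32 LE) | zlib data
MAGIC = b"PACK"
STUB = b"\x90" * 16  # placeholder for real decompressor machine code


def pack(original: bytes) -> bytes:
    """Build a toy 'packed executable': stub, marker, size field, payload."""
    header = MAGIC + struct.pack("<I", len(original))
    return STUB + header + zlib.compress(original, 9)


def static_unpack(packed: bytes) -> bytes:
    """Recover the original bytes without executing the stub."""
    offset = packed.find(MAGIC)
    if offset < 0:
        raise ValueError("no payload marker found")
    (size,) = struct.unpack_from("<I", packed, offset + len(MAGIC))
    payload = packed[offset + len(MAGIC) + 4:]
    original = zlib.decompress(payload)
    if len(original) != size:
        raise ValueError("size field does not match decompressed data")
    return original


program = b"MZ" + bytes(4096) + b"hello" * 100  # hypothetical program image
packed = pack(program)
print(len(program), len(packed))
print(static_unpack(packed) == program)
```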

Most compressed executables decompress the original code in memory, and most require slightly more memory to run (because they need to store the decompressor code, the compressed data and the decompressed code). Moreover, some compressed executables have additional requirements; for example, some write the decompressed executable to the file system before executing it.

Executable compression is not limited to binary executables; it can also be applied to scripts, such as JavaScript. Because most scripting languages are designed to work on human-readable code, which has high redundancy, compression can be very effective and can be as simple as replacing long variable and function names with shorter ones and/or removing whitespace.

Advantages and disadvantages

Software distributors use executable compression for a variety of reasons, primarily to reduce the secondary storage requirements of their software. Because executable compressors are specifically designed to compress executable code, they often achieve a better compression ratio than standard data compression facilities such as gzip, zip or bzip2.[citation needed] This allows software distributors to stay within the constraints of their chosen distribution media (such as CD-ROM, DVD-ROM, or floppy disk), or to reduce the time and bandwidth customers require to access software distributed via the Internet.
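The article does not say how executable compressors beat general-purpose tools. One widely documented technique is a branch-conversion filter applied before compression, such as the x86 BCJ filter in XZ Utils: the relative operand of each CALL instruction (opcode 0xE8) is rewritten as an absolute target, so repeated calls to the same function become identical byte sequences that compress well. The sketch below is a simplified version of that idea, not the exact filter of any particular packer. It treats every 0xE8 byte as a CALL, and `synthetic_code` is a hypothetical stand-in for real machine code.

```python
import zlib


def e8_filter(code: bytes, encode: bool = True) -> bytes:
    """Rewrite x86 CALL rel32 operands as absolute targets (encode) or back (decode).

    Simplified: every 0xE8 byte is assumed to be a CALL opcode.
    """
    buf = bytearray(code)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] == 0xE8:
            value = int.from_bytes(buf[i + 1:i + 5], "little")
            next_ip = i + 5  # rel32 is relative to the following instruction
            if encode:
                value = (value + next_ip) & 0xFFFFFFFF
            else:
                value = (value - next_ip) & 0xFFFFFFFF
            buf[i + 1:i + 5] = value.to_bytes(4, "little")
            i += 5  # skip the operand that was just rewritten
        else:
            i += 1
    return bytes(buf)


def synthetic_code(calls: int, target: int = 0x100000) -> bytes:
    """Toy code: 'push ebp; mov ebp, esp' then a CALL to one shared target."""
    out = bytearray()
    for _ in range(calls):
        out += b"\x55\x89\xe5"
        call_at = len(out)
        rel = (target - (call_at + 5)) & 0xFFFFFFFF
        out += b"\xe8" + rel.to_bytes(4, "little")
    return bytes(out)


raw = synthetic_code(1000)
filtered = e8_filter(raw)
print(len(zlib.compress(raw, 9)), len(zlib.compress(filtered, 9)))
print(e8_filter(filtered, encode=False) == raw)  # the filter is reversible
```

The filter itself saves no space; it only makes the input more repetitive for the compressor that runs afterwards, and the decompressor stub must apply the inverse transform after decompressing.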

Executable compression is also frequently used to deter reverse engineering or to obfuscate the contents of the executable (for example, to hide the presence of malware from antivirus scanners) by proprietary methods of compression and/or added encryption. Executable compression can be used to prevent direct disassembly, mask string literals and modify signatures. Although this does not eliminate the chance of reverse engineering, it can make the process more costly.
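As a minimal illustration of string masking, assuming plain zlib compression with no added encryption, a string literal that a simple byte search finds in the original image no longer appears verbatim in the compressed data. The program image below is hypothetical filler.

```python
import zlib

# Hypothetical program image with an embedded string literal.
data = b"\x00" * 64 + b"connect to update.example.invalid\x00" + b"\x90" * 64
packed = zlib.compress(data, 9)

needle = b"update.example.invalid"
print(needle in data)    # the literal is visible to a plain byte search
print(needle in packed)  # after compression it no longer appears verbatim
# Decompressing restores it, so this masks the string rather than protecting it.
print(needle in zlib.decompress(packed))
```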

A compressed executable requires less storage space in the file system, and thus less time to transfer data from the file system into memory. On the other hand, it requires some time to decompress the data before execution begins. However, the speed of storage media has not kept up with average processor speeds, so storage is very often the bottleneck, and the compressed executable will therefore load faster on most common systems. On modern desktop computers, this is rarely noticeable unless the executable is unusually large, so loading speed is not a primary reason for or against compressing an executable.
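The trade-off can be put in numbers with a toy model that counts only sequential read time and decompression time, ignoring caching, seek latency and any overlap between reading and decompressing. All sizes and throughputs below are hypothetical. Under this model, compression pays off when the disk throughput is lower than the decompression throughput multiplied by (1 - ratio).

```python
def load_time(size_mb: float, disk_mb_s: float,
              ratio: float = 1.0, decompress_mb_s: float = float("inf")) -> float:
    """Seconds to get an executable image into memory (toy model).

    ratio: compressed size / original size (1.0 means not compressed).
    decompress_mb_s: decompressor throughput, in megabytes of output per second.
    """
    read_time = size_mb * ratio / disk_mb_s
    decompress_time = size_mb / decompress_mb_s if ratio < 1.0 else 0.0
    return read_time + decompress_time


# Hypothetical 20 MB executable, compressed to half its size,
# decompressing at 500 MB/s.
print(load_time(20, 100), load_time(20, 100, 0.5, 500))    # slow disk
print(load_time(20, 2000), load_time(20, 2000, 0.5, 500))  # fast disk
```

On the slow disk the compressed image loads in 0.14 s instead of 0.2 s; on the fast disk the decompression time dominates, which is consistent with the difference being rarely noticeable on modern machines.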

On operating systems that page executable images in from disk on demand, compressed executables make this process less efficient. The decompressor stub allocates a block of memory to hold the decompressed data, which stays allocated as long as the executable stays loaded, whether it is used or not, competing for memory with other applications the whole time. If the operating system uses a swap file, the decompressed data has to be written to it to free up memory, instead of simply discarding unused data blocks and reloading them from the executable image when they are needed again. This is usually not noticeable, but it becomes a problem when an executable is loaded more than once at the same time: the operating system cannot reuse data blocks it has already loaded, so the data has to be decompressed into a new memory block for each instance and will be swapped out independently if not used. Because of these additional storage and time requirements, compressing executables that are typically run as several simultaneous instances should be weighed carefully.
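A rough way to see the cost for multiple instances is sketched below. It assumes that an unpacked image is entirely read-only and shared between instances (real images also contain private, copy-on-write data pages) and that each packed instance keeps a full private decompressed copy alongside the shared compressed file image. The function name and all numbers are hypothetical.

```python
def resident_mb(image_mb: float, instances: int,
                packed: bool, compressed_mb: float = 0.0) -> float:
    """Approximate memory used by several running copies of one executable."""
    if not packed:
        # The OS maps the file image once and shares its pages.
        return image_mb
    # Shared compressed image plus one private decompressed copy per instance.
    return compressed_mb + instances * image_mb


# Hypothetical 30 MB image that compresses to 12 MB, run five times at once.
print(resident_mb(30, 5, packed=False))
print(resident_mb(30, 5, packed=True, compressed_mb=12))
```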

Another disadvantage is that some utilities can no longer identify run-time library dependencies, as only the statically linked extractor stub is visible.

Also, some older virus scanners simply report all compressed executables as viruses because the decompressor stubs share some characteristics with those of viruses. Most modern virus scanners can unpack several different layers of executable compression to check the actual executable inside, but some popular anti-virus and anti-malware scanners have had trouble with false positives on compressed executables. In an attempt to solve the problem of malware obfuscated with the help of runtime packers, the IEEE Industry Connections Security Group has introduced a software taggant system.

Executable compression used to be more popular when computers were limited to floppy disks, which were slow, low-capacity media, and small hard drives; it allowed the computer to store more software in the same amount of space, without the inconvenience of having to manually unpack an archive file every time the user wanted to use the software. However, executable compression has become less popular as storage capacity has increased. It is still used in the demoscene, where demos have to stay within a size limit, e.g. 64k intros. Only very sophisticated compression formats, which add to load time, keep an executable small enough to enter these competitions.

List of executable packers

CP/M and MSX-DOS executable

Known executable compressors for CP/M-80 / MSX-DOS .COM files:

MS-DOS executable

Known executable compressors for MS-DOS-compatible executable files (.COM or .EXE):

OS/2 executable

Known executable compressors under OS/2:

New Executable

Known executable compressors for New Executables:

Portable Executable

Known executable compressors for Portable Executables:


Name | Latest stable | Software license | x86-64 support
32Lite | | |
Alienyze | 1.4 (17 August 2020) | Proprietary | No
ANDpakk2 | | |
Armadillo | 9.62 (7 June 2013) | Proprietary | Yes
ASPack | 2.40 (7 December 2018) | Proprietary | Yes
ASPR (ASProtect) | 2.78 (7 December 2018) | Proprietary | Yes
BeRoEXEPacker | | |
BIN-crypter | | |
BoxedApp Packer | 3.3 (26 July 2015) | Proprietary | Yes
CExe | 1.0b (20 July 2001) | GPL | No
Crinkler | 2.3 (22 July 2020) | Zlib | Yes
dotBundle | 1.3 (4 April 2013) [15] | Proprietary | Yes
Enigma Protector | 6.60 (21 August 2019) [16] | Proprietary | Yes
Enigma Virtual Box | 9.40 (10 October 2019) [16] | Proprietary | Yes
exe32pack | | |
EXE Bundle | 3.11 (7 January 2011) [17] | Proprietary | ?
EXECryptor | | |
EXE Stealth | 4.14 (29 June 2011) [17] | Proprietary | ?
eXPressor | 1.8.0.1 (14 January 2010) | Proprietary | ?
FSG | 2.0 (24 May 2004) [18] | Freeware | No
kkrunchy src | 0.23a4 (Unknown) | Public domain | No
MEW | 1.1 (Unknown) | Freeware | No
MPRESS | 2.19 (2 January 2012) | Freeware | Yes
MuCruncher | | |
NeoLite | | |
NsPack | | |
Obsidium | 1.6 (11 April 2017) [19] | Proprietary | Yes
PECompact | | |
PEPack | | |
PESpin | 1.33 (3 May 2011) | Freeware | Yes
Petite | 2.4 (22 September 2016) | Freeware | No
PKLite32 | | |
RLPack Basic | 1.21 (31 October 2008) | GPL | No
Shrinker32 | | |
Smart Packer Pro X | 2.0.0.1 (3 June 2019) | Proprietary | Yes
Themida/WinLicense | 3.0 (24 October 2019) | Proprietary | Yes
Upack | | |
UPX | 3.96 (23 January 2020) | GPL | experimental
VMProtect | 3.4 (3 August 2019) | Proprietary | Yes
WWPack32 | 1.20 (19 June 2000) | | No
XComp/XPack | 0.98 (18 February 2007) | Freeware | No
Yoda's Crypter | | |
YZPack | | |

ELF files

Known executable compressors for ELF files:

CLI assembly files

Known executable compressors for CLI assembly files:

Mac OS Classic applications

Executable compressors for Mac OS Classic applications:

Mach-O (Apple Mac OS X) files

Known executable compressors for Mach-O (Apple Mac OS X) files:

Commodore 64 and VIC-20

Known executable compressors for executables on the Commodore 64 and VIC-20:

Amiga

Known executable compressors for executables on the Amiga series:

Java

Known executable compressors for Java:

JAR files:

WAR files:

JavaScript

There are two types of compression that can be applied to JavaScript scripts:

Self-decompressing compressors

These compress the original script and output a new script that has a decompressor and compressed data.
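JsSfx and Packify work on JavaScript; the sketch below shows the same self-decompressing idea in Python, using only the standard `zlib` and `base64` modules. It is an analogue, not how those tools are implemented: the output is a new script made of a short stub plus the compressed original, and running it decompresses and then executes the original code. The function name `make_self_decompressing` and the sample script are hypothetical.

```python
import base64
import zlib


def make_self_decompressing(source: str) -> str:
    """Return a Python script that carries `source` compressed and runs it on load."""
    blob = base64.b64encode(zlib.compress(source.encode("utf-8"), 9)).decode("ascii")
    # The stub: decode the text blob, decompress it, execute the recovered source.
    return (
        "import base64,zlib\n"
        f"exec(zlib.decompress(base64.b64decode({blob!r})).decode('utf-8'))\n"
    )


# A deliberately redundant sample script.
original = "total = 0\n" + "".join(f"total += {i} * 3\n" for i in range(300))
packed = make_self_decompressing(original)

namespace = {}
exec(packed, namespace)
print(len(original), len(packed))
print(namespace["total"])  # 3 * sum(range(300)) == 134550
```

Base64 is used because the compressed bytes must be embedded in a text script; it costs about a third in size, so this only pays off for sufficiently redundant input.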

  • JsSfx
  • Packify

Redundancy reducing compressors

These remove white space, remove comments, and shorten variable and function names but do not alter the behavior of the script.

  • Packer
  • YUI compressor
  • Shrinksafe
  • JSMin
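A redundancy-reducing compressor can be sketched with Python's standard `tokenize` module, as a Python analogue of the kind of transformation listed above rather than the implementation of any of these JavaScript tools. It drops comments and applies a caller-supplied renaming table (a hypothetical one here), working on tokens so that the contents of string literals are never touched. Deciding which names are safe to shorten, which real minifiers do by analysing scopes, is left to the caller.

```python
import io
import tokenize


def minify(source: str, renames: dict[str, str]) -> str:
    """Strip comments and rename identifiers in Python source (toy example).

    Every NAME token found in `renames` is replaced, including attribute and
    keyword-argument names, so the mapping must only contain names that are
    safe to rename everywhere in the source.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            continue  # comments never affect behavior
        if tok.type == tokenize.NAME:
            text = renames.get(tok.string, tok.string)
        else:
            text = tok.string
        tokens.append((tok.type, text))
    # Passing (type, string) pairs makes untokenize emit compact spacing
    # instead of reproducing the original column positions.
    return tokenize.untokenize(tokens)


SOURCE = '''
def compute_total(price_list, tax_rate):
    # add up the prices, then apply tax
    subtotal = sum(price_list)
    return subtotal * (1 + tax_rate)
'''

short = minify(SOURCE, {"compute_total": "f", "price_list": "p",
                        "tax_rate": "t", "subtotal": "s"})
namespace = {}
exec(short, namespace)
print(short)
print(namespace["f"]([10, 20], 0.5))
```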

See also


References

  1. Gielen, Pierre; Taylor, Johnathan (1997) [1993]. Logan, Wolverine (ed.). "PMarc help manual". Archived from the original on 2019-04-22. Retrieved 2019-02-22. […] PMEXE.CPM […] is a module […] in combination with PMARC […] used to make executable compressed COM files (just like LZEXE or PKLITE […] type: PMARC <archive>.COM=PMEXE2.CPM <filename> [options] The archive-name must be .COM […] not .PMA. The output file will have the extension .CPM. It's an MSX-DOS COM file […] rename file […] to run it […]
  2. "Expert Report of Robert B. K. Dewar In Response To The Report Of Kenneth D. Crews". Cambridge University Press et al v. Patton et al, Filing 124, Supplemental Initial Disclosures by Cambridge University Press, Oxford University Press, Inc., Sage Publications, Inc. - Cambridge University Press, Oxfort University Press, Inc., and Sage Publications, Inc. v. Mark P. Becker, Georgia State University President, et al, Civil Action No. 1:08-CV-1425-ODE (Court document). United States District Court For The Northern District Of Georgia, Atlanta Division. p. 18. Exhibit A. Archived from the original on 2018-05-01. Retrieved 2019-04-23. […] SPACEMAKER and TERMULATOR, commodity software for IBM PC (PC DOS file compression utility and VT-100 emulator), being marketed by Realia, Inc. R.B.K. Dewar (1982–1983), 8088 assembly language, 8,000 lines […]
  3. Realia, Inc. (January 1983). "If you use DOS, you need this program". PC Magazine (advertisement). Ziff-Davis Publishing. 2 (9): 417. Archived from the original on 2019-04-22. Retrieved 2019-04-22.
  4. Dewar, Robert Berriedale Keith (1984-03-13). "DOS 3.1 ASMB (Another Silly Microsoft Bug)". info-ibmpc@USC-ISIB.ARPA. Archived from the original on 2018-05-01. Retrieved 2019-04-23. […] The /E option of the linker should generate an EXE file which is logically equivalent to the uncompressed EXE file. The current version […] results in AX being clobbered. AX on entry to an EXE file has a definite meaning (it indicates drive validity for the parameters), thus it should be passed through to the uncompressed image. Given this one very obvious violation of the interface rules, there may be others, I have not bothered to investigate further […] I did write the Realia SpaceMaker program which does a similar sort of thing to the EXEPACK option (but needless to say does not have this particular […]
  5. Paul, Matthias R. (2002-10-07) [2000]. "Re: masm .com (PSP) related trouble". Newsgroup: alt.lang.asm. Archived from the original on 2017-09-03. Retrieved 2017-09-03.
  6. Necasek, Michal (2018-04-30). "Realia SpaceMaker". OS/2 Museum. Archived from the original on 2019-01-27. Retrieved 2019-02-22.
  7. Parsons, Jeff (2019-01-10). "An Update on Early Norton Utilities". PCjs. Archived from the original on 2019-01-29. Retrieved 2019-02-22.
  8. Necasek, Michal (2019-01-12). "Yep, Norton Did It". OS/2 Museum. Archived from the original on 2019-04-22. Retrieved 2019-04-22.
  9. Necasek, Michal (2018-03-23). "EXEPACK and the A20-Gate". OS/2 Museum. Archived from the original on 2018-11-13. Retrieved 2019-04-20.
  10. Miles, Ya'akov; Nather, Ed (1986-05-17) [1986-02-05, 1986-02-09]. "Undocumented Microsoft LINK option: /E". INFO-IBMPC mailing list. Archived from the original on 2018-05-01. Retrieved 2019-04-26. [Miles:] There exists an undocumented […] switch to Microsoft LINK.EXE […], which will cause an automatic compaction during binding. This process will eliminate storage for uninitialized arrays from the .EXE file produced by the linker […] To use this feature, specify the /E option to the command line […] [Nather:] The option does not exist in MS Link versions 3.00 and 3.01 [Miles:] By comparing the sizes of the (packed) files generated from LINK ver 3.02 and the /E option with the size of the .EXE file manually packed with […] EXEPACK, I have come to the conclusion that LINK ver 3.02 option /E generates EXACTLY the same size file as manually running EXEPACK on a regular .EXE file output by LINK […]
  11. Bellard, Fabrice (2003-02-09). "LZEXE home page". bellard.org. Archived from the original on 2019-03-24. Retrieved 2019-03-18.
  12. Salomon, David (2000) [1998]. "Chapter 3.22: EXE Compressors". Data Compression: The Complete Reference (2 ed.). Springer-Verlag. p. 212. doi:10.1007/978-3-642-86092-8. ISBN 978-3-540-78086-1. S2CID 35889155. Retrieved 2019-04-26.
  13. Paul, Matthias R. (2002-04-11). "Re: [fd-dev] ANNOUNCE: CuteMouse 2.0 alpha 1". freedos-dev. Archived from the original on 2020-02-21. Retrieved 2020-02-21. […] > no one packer may pack combos like .SYS+.COM or .SYS+.EXE. […] There are packers for .COM or .EXE and others for .SYS, but I too have not seen a packer which supports both in one. […] possibility to combine a program/TSR and device driver in .EXE files […] and a program/TSR.COM and device driver into a .COM program […] It might also be possible to add another self-made stub to the file, after it has already been compressed […] all the compressed DR-DOS device drivers use a similar technique to let the normal PKLITE .COM decompressor work with .SYS files (meanwhile PKLITE supports a similar feature for .SYS files itself). […] (NB. PKLITE 1.50 (1995) and higher gained the capability to compress device drivers, but not combined COM+SYS drivers.)
  14. "Google Code Archive - Long-term storage for Google Code Project Hosting".
  15. "DotBundle - Download an evaluation version". Archived from the original on 2013-08-21. Retrieved 2013-05-06.
  16. "Software Protection, Software Licensing, Software Virtualization".
  17. "WebtoolMaster Software News".
  18. "Archived copy". www.xtreeme.prv.pl. Archived from the original on 2004-05-25. Retrieved 2022-01-15.
  19. "Download | Obsidium Software Protection System".
  20. "624".
  21. DotProtect http://site.yvansoftware.be/dotpacker1_0 Archived 22 January 2011 at the Wayback Machine
  22. Kiene, Steve; Mark, Dave (1999). "A Chat With Steve Kiene". MacTech. Vol. 15, no. 4. Retrieved 2017-12-10.
  23. "Lossless Data Compression Program: Hybrid LZ77 RLE". www.cs.tut.fi. Archived from the original on 2014-07-30. Retrieved 2022-01-15.
  24. web.comhem.se/~u13114991/exo/
  25. "ByteBoozer (PC)".
  26. "Crunchers to download".
  27. "Askeksa/Shrinkler". GitHub. 2021-09-25.
  28. "PackFire v1.2k by Neural".