Xdelta

Last updated
Original author(s) Joshua MacDonald
Developer(s) Joshua MacDonald
Initial releaseOctober 12, 1997;27 years ago (1997-10-12)
Stable release
3.1.0 [1]   OOjs UI icon edit-ltr-progressive.svg / 8 January 2016;8 years ago (8 January 2016)
Repository
Written in C and C++
Operating system Unix-like, Microsoft Windows
Type Utility software
License Apache License 2.0 [2]

Xdelta [3] is a command line tool for delta encoding, which stores or transmits the difference (deltas) between sequential data, instead of entire files. This is similar to diff and patch, except diff computes and shows the difference between two complete files, while patch is primarily designed for human-readable text files; Xdelta is designed for binary files and does not generate human readable output.

Xdelta was first released sometime before October 12, 1997 [4] by Joshua MacDonald, who currently maintains the program. The algorithm of xdelta1 was based on the algorithm of rsync, developed by Andrew Tridgell, though it uses a smaller block size.[ citation needed ]

Xdelta version 3 is primarily designed to work with streams following the standardized VCDIFF format, and it realized the compatibility among other delta encoding software which supports the VCDIFF format.[ citation needed ] It runs on Unix-like operating systems and Microsoft Windows. xdelta can handle up to 264 byte files, [5] [ failed verification ] and it is suitable for large backups.

Related Research Articles

<span class="mw-page-title-main">Andrew Tridgell</span> Australian computer programmer

Andrew "Tridge" Tridgell is an Australian computer programmer. He is the author of and a contributor to the Samba file server, and co-inventor of the rsync algorithm.

In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder.

Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statistical redundancy. By contrast, lossy compression permits reconstruction only of an approximation of the original data, though usually with greatly improved compression rates.

Samba is a free software re-implementation of the SMB networking protocol, and was originally developed by Andrew Tridgell. Samba provides file and print services for various Microsoft Windows clients and can integrate with a Microsoft Windows Server domain, either as a Domain Controller (DC) or as a domain member. As of version 4, it supports Active Directory and Microsoft Windows NT domains.

<span class="mw-page-title-main">FLAC</span> Lossless digital audio coding format

FLAC is an audio coding format for lossless compression of digital audio, developed by the Xiph.Org Foundation, and is also the name of the free software project producing the FLAC tools, the reference software package that includes a codec implementation. Digital audio compressed by FLAC's algorithm can typically be reduced to between 50 and 70 percent of its original size and decompresses to an identical copy of the original audio data.

In computing, the utility diff is a data comparison tool that computes and displays the differences between the contents of files. Unlike edit distance notions used for other purposes, diff is line-oriented rather than character-oriented, but it is like Levenshtein distance in that it tries to determine the smallest set of deletions and insertions to create one file from the other. The utility displays the changes in one of several standard formats, such that both humans or computers can parse the changes, and use them for patching.

rsync File synchronization protocol and software

rsync is a utility for transferring and synchronizing files between a computer and a storage drive and across networked computers by comparing the modification times and sizes of files. It is commonly found on Unix-like operating systems and is under the GPL-3.0-or-later license.

Delta encoding is a way of storing or transmitting data in the form of differences (deltas) between sequential data rather than complete files; more generally this is known as data differencing. Delta encoding is sometimes called delta compression, particularly where archival histories of changes are required.

<span class="mw-page-title-main">History of software configuration management</span>

The history of software configuration management (SCM) can be traced back as early as the 1950s, when CM, originally for hardware development and production control, was being applied to software development. Early software had a physical footprint, such as cards, tapes, and other media. The first software configuration management was a manual operation. With the advances in language and complexity, software engineering, involving configuration management and other methods, became a major concern due to issues like schedule, budget, and quality. Practical lessons, over the years, had led to the definition, and establishment, of procedures and tools. Eventually, the tools became systems to manage software changes. Industry-wide practices were offered as solutions, either in an open or proprietary manner. With the growing use of computers, systems emerged that handled a broader scope, including requirements management, design alternatives, quality control, and more; later tools followed the guidelines of organizations, such as the Capability Maturity Model of the Software Engineering Institute.

patch (Unix) Unix utility to apply changes to text files

The computer tool patch is a Unix program that updates text files according to instructions contained in a separate file, called a patch file. The patch file is a text file that consists of a list of differences and is produced by running the related diff program with the original and updated file as arguments. Updating files with patch is often referred to as applying the patch or simply patching the files.

rzip is a huge-scale data compression computer program designed around initial LZ77-style string matching on a 900 MB dictionary window, followed by bzip2-based Burrows–Wheeler transform and entropy coding (Huffman) on 900 kB output chunks.

<span class="mw-page-title-main">Sparse file</span> Type of computer file

In computer science, a sparse file is a type of computer file that attempts to use file system space more efficiently when the file itself is partially empty. This is achieved by writing brief information (metadata) representing the empty blocks to the data storage media instead of the actual "empty" space which makes up the block, thus consuming less storage space. The full block is written to the media as the actual size only when the block contains "real" (non-empty) data.

<span class="mw-page-title-main">File comparison</span>

Editing documents, program code, or any data always risks introducing errors. Displaying the differences between two or more sets of data, file comparison tools can make computing simpler, and more efficient by focusing on new data and ignoring what did not change. Generically known as a diff after the Unix diff utility, there are a range of ways to compare data sources and display the results.

<span class="mw-page-title-main">BackupPC</span>

BackupPC is a free disk-to-disk backup software suite with a web-based frontend. The cross-platform server will run on any Linux, Solaris, or UNIX-based server. No client is necessary, as the server is itself a client for several protocols that are handled by other services native to the client OS. In 2007, BackupPC was mentioned as one of the three most well known open-source backup software, even though it is one of the tools that are "so amazing, but unfortunately, if no one ever talks about them, many folks never hear of them".

Remote Differential Compression (RDC) is a client–server synchronization algorithm that allows the contents of two files to be synchronized by communicating only the differences between them. It was introduced with Microsoft Windows Server 2003 R2, is included with later Windows client and server operating systems, but by 2019 is not being developed and is not used by any Microsoft product.

The Debian build toolchain is a collection of software utilities used to create Debian source packages (.dsc) and Debian binary packages from upstream source tarballs.

In computer science and information theory, data differencing or differential compression is producing a technical description of the difference between two sets of data – a source and a target. Formally, a data differencing algorithm takes as input source data and target data, and produces difference data such that given the source data and the difference data, one can reconstruct the target data.

VCDIFF is a format and an algorithm for delta encoding, described in IETF's RFC 3284. The algorithm is based on Jon Bentley and Douglas McIlroy's paper "Data Compression Using Long Common Strings" written in 1999. VCDIFF is used as one of the delta encoding algorithms in "Delta encoding in HTTP" and was employed in Google's Shared Dictionary Compression Over HTTP technology, formerly used in their Chrome browser.

A delta update is a software update that requires the user to download only those parts of the software's code that are new, or have been changed from their previous state, in contrast to having to download the entire program. The use of delta updates can save significant amounts of time and computing bandwidth. The name "delta" derives from the mathematical science use of the Greek letter delta, Δ or δ to denote change.

In data compression, BCJ, short for branch/call/jump, refers to a technique that improves the compression of machine code by replacing relative branch addresses with absolute ones. This allows a Lempel–Ziv compressor to identify duplicate targets and more efficiently encode them. On decompression, the inverse filter restores the original encoding. Different BCJ filters are used for different instruction sets, as each use different opcodes for branching.

References

  1. "Release 3.1.0". 8 January 2016. Retrieved 23 July 2018.
  2. MacDonald, Joshua. "Xdelta3 License Change". Archived from the original on 2024-01-13.
  3. MacDonald, Joshua (2025-01-01), jmacd/xdelta , retrieved 2025-01-03
  4. Mention of first version [usurped]
  5. xdelta3 memory tuning