Rdiff-backup

rdiff-backup
Original author(s) Ben Escoto (2001–2005)
Dean Gaudet, Andrew Ferguson, Edward Ned Harvey (2005–2016)
Eric Lavarde, Otto Kekäläinen, Patrik Dufresne (Python 3 rewrite beginning in 2019)
Initial release 2001
Stable release 2.2.6 [1] [2] / 8 September 2023
Written in Python
Platform Unix-like systems, macOS, Windows
Type backup software
License GPL-2.0-or-later
Website rdiff-backup.net

rdiff-backup is backup software written in Python that creates reverse incremental backups. The most recent backup is thus directly accessible, while earlier backups are reconstructed from diff files by rdiff-backup.

As the name implies, rdiff-backup uses the rdiff method (more precisely, the reimplementation of the rsync algorithm in librsync) to compute the differences between file versions. rdiff-backup can back up files across different machines via SSH.
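The reverse-incremental idea can be illustrated with a toy model. The sketch below is not rdiff-backup's implementation: it uses Python's difflib as a stand-in for librsync, keeps everything in memory, and the names ToyRepository, make_reverse_delta and apply_delta are hypothetical. It only shows that the newest version is stored in full, while older versions are rebuilt by applying stored deltas backwards in time.

```python
from difflib import SequenceMatcher


def make_reverse_delta(new: str, old: str) -> list:
    """Return operations that rebuild `old` from `new` (a reverse delta)."""
    ops = []
    matcher = SequenceMatcher(None, new, old, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # This span is shared: reference it in `new` instead of storing it.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # This span exists only in `old`, so it must be stored literally.
            ops.append(("data", old[j1:j2]))
        # "delete": the span exists only in `new`; nothing is emitted.
    return ops


def apply_delta(new: str, ops: list) -> str:
    """Rebuild the older version from `new` and a reverse delta."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(new[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


class ToyRepository:
    """Hypothetical in-memory model: one full mirror plus reverse deltas."""

    def __init__(self, first_version: str):
        self.mirror = first_version  # newest version, stored in full
        self.increments = []         # reverse deltas, oldest first

    def backup(self, current: str) -> None:
        # Store how to get from the new mirror back to the previous one.
        self.increments.append(make_reverse_delta(current, self.mirror))
        self.mirror = current

    def restore(self, steps_back: int = 0) -> str:
        if not 0 <= steps_back <= len(self.increments):
            raise ValueError("no such version")
        content = self.mirror
        start = len(self.increments) - steps_back
        # Walk backwards in time, newest delta first.
        for delta in reversed(self.increments[start:]):
            content = apply_delta(content, delta)
        return content
```

In this model, restore(0) returns the mirror without touching any delta, which mirrors why the latest backup is directly accessible, while each additional step back costs one more delta application.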

Usage

Beginning with version 2.2, the flags passed to rdiff-backup are either general or specific to the operation. General options are placed directly after rdiff-backup, and operation-specific options are placed after the name of the operation. For example, disabling fsync (see below) is a general option and thus comes after rdiff-backup, whereas --no-compression is specific to the backup mode and thus comes after backup.
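The ordering rule can be sketched as a small helper that assembles the argument list. This is only an illustration: build_command and the two directory paths are hypothetical, and the helper merely arranges strings without running rdiff-backup or validating the options.

```python
import shlex


def build_command(action, locations, general_options=(), action_options=()):
    """Arrange arguments as: rdiff-backup <general> <action> <specific> <locations>."""
    return ["rdiff-backup", *general_options, action, *action_options, *locations]


# Hypothetical source and backup directories, for illustration only.
command = build_command(
    "backup",
    ["/home/user", "/mnt/backup/user"],
    general_options=["--no-fsync"],         # general: placed before the action
    action_options=["--no-compression"],    # backup-specific: placed after it
)
print(shlex.join(command))
# prints: rdiff-backup --no-fsync backup --no-compression /home/user /mnt/backup/user
```

Keeping the command as a list rather than a single string means it could later be handed to subprocess.run without shell quoting issues.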

Backup

Normal operation is rdiff-backup backup <source directory> <backup directory>. gzip compression of increment files can be disabled with --no-compression after the backup action. The options -v 5 and --print-statistics show the backup's progress and some statistics.

Specifying --no-fsync will disable fsync, causing a significant speedup, with an elevated risk of data loss.

Restoration of files or directories

rdiff-backup --restore-as-of <date> <backup> <source> will restore to <source> the entire backup, a single file or a sub-directory. <date> can be specified in one of several ways:

It is also possible to find the relevant time-stamped file in the rdiff-backup-data/increments directory, and run rdiff-backup <time-stamped file> <file or folder to be restored>.

More simply, though not always correctly because file permissions might not be properly restored, the most recent backup can also be restored by copying a backed-up file or directory with cp -a or rsync -a. A deleted file, recognizable by the suffix snapshot.gz, can also be restored by retrieving it from the rdiff-backup-data/increments directory, copying it to the source directory, and unpacking it with gzip.
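The manual recovery of a deleted file can also be scripted. The following is a minimal sketch, assuming the increment is an ordinary gzip file whose name ends in snapshot.gz as described above; unpack_snapshot is a hypothetical helper, not part of rdiff-backup, and like the manual method it does not restore permissions or timestamps.

```python
import gzip
import shutil
from pathlib import Path


def unpack_snapshot(increment: Path, destination: Path) -> Path:
    """Decompress a *.snapshot.gz increment into `destination`."""
    if not increment.name.endswith("snapshot.gz"):
        raise ValueError(f"not a snapshot increment: {increment.name}")
    # Stream the data so that large files are not loaded into memory at once.
    with gzip.open(increment, "rb") as src, open(destination, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return destination
```

The destination is passed explicitly because the increment's file name carries a time stamp in addition to the original name, so the caller decides what the restored file should be called.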

Deleting old backups

Only the oldest backups can be removed, with rdiff-backup --remove-older-than <date> <backup directory>. The ability to delete the oldest versions of specific files (or directories) is scheduled to appear in version 2.2.

When deleting old versions, <date> takes the same arguments as when restoring files or directories (see above).

Problems

rdiff-backup does not work under Linux with SSHFS and exFAT file systems, though FAT and NTFS do work. This is mostly due to their implementation as FUSE modules, which causes delays in certain operations and probably makes them unfit for backup purposes. No file system is explicitly supported or unsupported, but rdiff-backup runs tests before starting a backup and refuses to proceed on file systems deemed unfit. For exFAT, using the newer in-kernel exFAT driver should overcome this limitation.

rdiff-backup cannot back up to an SFTP destination.

rdiff-backup recognizes changed files only by file size and modification time (mtime). To make sure all changed files have been backed up, running rdiff-backup --compare-hash <source directory> <backup directory> (or rdiff-backup --compare-full <source directory> <backup directory> for a byte-wise comparison) will display all changed files. Then, using touch, the modification time of all problematic files can be reset to now, so that they are included in the next rdiff-backup run.
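Why a size-and-mtime check can miss a change, and how a hash comparison followed by touch repairs it, can be shown with a short sketch. It is not rdiff-backup's code: all function names are hypothetical, the recorded dictionary stands in for whatever metadata the backup keeps, whole-second mtime granularity is an assumption, and SHA-256 is an arbitrary choice of hash for the illustration.

```python
import hashlib
import os
from pathlib import Path


def quick_signature(path) -> tuple:
    """Cheap check: file size and modification time in whole seconds."""
    st = os.stat(path)
    return (st.st_size, int(st.st_mtime))


def content_hash(path) -> str:
    """Expensive check: hash of the full file contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_missed_changes(source_dir, recorded: dict) -> list:
    """Return files whose contents changed while size and mtime did not.

    `recorded` maps a relative path to (quick_signature, content_hash)
    as captured at the time of the previous backup.
    """
    missed = []
    for rel, (old_signature, old_hash) in recorded.items():
        path = Path(source_dir) / rel
        if not path.exists():
            continue
        if quick_signature(path) == old_signature and content_hash(path) != old_hash:
            missed.append(rel)
    return missed


def touch_now(path) -> None:
    """Equivalent of `touch`: set access and modification time to now."""
    os.utime(path, None)
```

After touch_now has been applied to each path returned by find_missed_changes, the quick signature no longer matches the recorded one, so a size-and-mtime check would pick the file up on the next run.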

  1. Eric Zolf (8 September 2023). "Final minor release v2.2.6". Retrieved 13 September 2023.
  2. Eric Zolf (1 January 2024). "Happy New Year unreleased version 2.6.0". Retrieved 9 January 2024.
