Simple file verification

Last updated
Simple file verification
Filename extension
.sfv
Internet media type
text/x-sfv
Type of format Plain text list of CRC32 checksums

Simple file verification (SFV) is a file format for storing CRC32 checksums of files to verify the integrity of files. SFV is used to verify that a file has not been corrupted, but it does not otherwise verify the file's authenticity. The .sfv file extension is usually used for SFV files. [1]

Contents

Checksum

Files can become corrupted for a variety of reasons, including faulty storage media, errors in transmission, write errors during copying or moving, and software bugs. SFV verification ensures that a file has not been corrupted by comparing the file's CRC hash value to a previously calculated value. [1] Due to the nature of hash functions, hash collisions may result in false positives, but the likelihood of collisions is usually negligible with random corruption. (The number of possible checksums is limited though large, so that with any checksum scheme many files will have the same checksum. However, the probability of a corrupted file having the same checksum as its original is exceedingly small, unless deliberately constructed to maintain the checksum.)

SFV cannot be used to verify the authenticity of files, as CRC32 is not a collision resistant hash function; even if the hash sum file is not tampered with, it is computationally trivial for an attacker to cause deliberate hash collisions, meaning that a malicious change in the file is not detected by a hash comparison. In cryptography, this attack is called a collision attack. For this reason, the md5sum and sha1sum utilities are often preferred in Unix operating systems, which use the MD5 and SHA-1 cryptographic hash functions respectively.

Even a single-bit error causes both SFV's CRC and md5sum's cryptographic hash to fail, requiring the entire file to be re-fetched. The Parchive and rsync utilities are often preferred for verifying that a file has not been accidentally corrupted in transmission, since they can correct common small errors with a much shorter download.

Despite the weaknesses of the SFV format, it is popular due to the relatively small amount of time taken by SFV utilities to calculate the CRC32 checksums when compared to the time taken to calculate cryptographic hashes such as MD5 or SHA-1.

SFV uses a plain text file containing one line for each file and its checksum [1] in the format FILENAME<whitespaces>CHECKSUM. Any line starting with a semicolon ';' is considered to be a comment and is ignored for the purposes of file verification. The delimiter between the filename and checksum is always one or several spaces; tabs are never used. A sample SFV file is:

; This is a comment file_one.zip   c45ad668 file_two.zip   7903b8e6 file_three.zip e99a65fb

Command-line utility

An example of an open-source cross-platform command-line utility that outputs crc32 checksums is 7-Zip. [2]

Many Linux distributions include a simple command-line tool cksfv to verify the checksums.

See also

Related Research Articles

<span class="mw-page-title-main">Checksum</span> Data used to detect errors in other data

A checksum is a small-sized block of data derived from another block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. By themselves, checksums are often used to verify data integrity but are not relied upon to verify data authenticity.

The MD5 message-digest algorithm is a widely used hash function producing a 128-bit hash value. MD5 was designed by Ronald Rivest in 1991 to replace an earlier hash function MD4, and was specified in 1992 as RFC 1321.

RAR is a proprietary archive file format that supports data compression, error correction and file spanning. It was developed in 1993 by Russian software engineer Eugene Roshal and the software is licensed by win.rar GmbH. The name RAR stands for Roshal Archive.

<span class="mw-page-title-main">Cryptographic hash function</span> Hash function that is suitable for use in cryptography

A cryptographic hash function (CHF) is a hash algorithm that has special properties desirable for a cryptographic application:

md5sum is a computer program that calculates and verifies 128-bit MD5 hashes, as described in RFC 1321. The MD5 hash functions as a compact digital fingerprint of a file. As with all such hashing algorithms, there is theoretically an unlimited number of files that will have any given MD5 hash. However, it is very unlikely that any two non-identical files in the real world will have the same MD5 hash, unless they have been specifically created to have the same hash.

UUHash is a hash algorithm employed by clients on the FastTrack network. It is employed for its ability to hash very large files in a very short period of time, even on older computers. However, this is achieved by only hashing a fraction of the file. This weakness makes it trivial to create a hash collision, allowing large sections to be completely altered without altering the checksum.

passwd Tool to change passwords on Unix-like OSes

passwd is a command on Unix, Plan 9, Inferno, and most Unix-like operating systems used to change a user's password. The password entered by the user is run through a key derivation function to create a hashed version of the new password, which is saved. Only the hashed version is stored; the entered password is not saved for security reasons.

Parchive is an erasure code system that produces par files for checksum verification of data integrity, with the capability to perform data recovery operations that can repair or regenerate corrupted or missing data.

File verification is the process of using an algorithm for verifying the integrity of a computer file, usually by checksum. This can be done by comparing two files bit-by-bit, but requires two copies of the same file, and may miss systematic corruptions which might occur to both files. A more popular approach is to generate a hash of the copied file and comparing that to the hash of the original file.

cksum Unix command

cksum is a command in Unix and Unix-like operating systems that generates a checksum value for a file or stream of data. The cksum command reads each file given in its arguments, or standard input if no arguments are provided, and outputs the file's 32-bit cyclic redundancy check (CRC) checksum and byte count. The CRC output by cksum is different from the CRC-32 used in zip, PNG and zlib.

In cryptography, a collision attack on a cryptographic hash tries to find two inputs producing the same hash value, i.e. a hash collision. This is in contrast to a preimage attack where a specific target hash value is specified.

<span class="mw-page-title-main">Digest access authentication</span> Method of negotiating credentials between web server and browser

Digest access authentication is one of the agreed-upon methods a web server can use to negotiate credentials, such as username or password, with a user's web browser. This can be used to confirm the identity of a user before sending sensitive information, such as online banking transaction history. It applies a hash function to the username and password before sending them over the network. In contrast, basic access authentication uses the easily reversible Base64 encoding instead of hashing, making it non-secure unless used in conjunction with TLS.

<span class="mw-page-title-main">Magnet URI scheme</span> Scheme that defines the format of magnet links

Magnet is a URI scheme that defines the format of magnet links, a de facto standard for identifying files (URN) by their content, via cryptographic hash value rather than by their location.

<span class="mw-page-title-main">Metalink</span> File format that describes one or more computer files available for download

Metalink is an extensible metadata file format that describes one or more computer files available for download. It specifies files appropriate for the user's language and operating system; facilitates file verification and recovery from data corruption; and lists alternate download sources.

sha1sum is a computer program that calculates and verifies SHA-1 hashes. It is commonly used to verify the integrity of files. It is installed by default on most Linux distributions. Typically distributed alongside sha1sum are sha224sum, sha256sum, sha384sum and sha512sum, which use a specific SHA-2 hash function and b2sum, which uses the BLAKE2 cryptographic hash function.

<span class="mw-page-title-main">Fingerprint (computing)</span> Digital identifier derived from the data by an algorithm

In computer science, a fingerprinting algorithm is a procedure that maps an arbitrarily large data item to a much shorter bit string, its fingerprint, that uniquely identifies the original data for all practical purposes just as human fingerprints uniquely identify people for practical purposes. This fingerprint may be used for data deduplication purposes. This is also referred to as file fingerprinting, data fingerprinting, or structured data fingerprinting.

muCommander

muCommander is a lightweight, open-source, cross-platform file manager that runs on operating systems supporting Java. It has a Norton Commander style, and a dual-pane interface to allow manipulation of files via keyboard shortcuts. Pre-compiled builds are available for Mac OS X, Microsoft Windows, Linux, Solaris, FreeBSD, and OpenVMS. The software runs from the Internet via Java Web Start.

BagIt is a set of hierarchical file system conventions designed to support disk-based storage and network transfer of arbitrary digital content. A "bag" consists of a "payload" and "tags," which are metadata files intended to document the storage and transfer of the bag. A required tag file contains a manifest listing every file in the payload together with its corresponding checksum. The name, BagIt, is inspired by the "enclose and deposit" method, sometimes referred to as "bag it and tag it."

BLAKE is a cryptographic hash function based on Daniel J. Bernstein's ChaCha stream cipher, but a permuted copy of the input block, XORed with round constants, is added before each ChaCha round. Like SHA-2, there are two variants differing in the word size. ChaCha operates on a 4×4 array of words. BLAKE repeatedly combines an 8-word hash value with 16 message words, truncating the ChaCha result to obtain the next hash value. BLAKE-256 and BLAKE-224 use 32-bit words and produce digest sizes of 256 bits and 224 bits, respectively, while BLAKE-512 and BLAKE-384 use 64-bit words and produce digest sizes of 512 bits and 384 bits, respectively.

010 Editor is a commercial hex editor and text editor for Microsoft Windows, Linux and macOS. Typically 010 Editor is used to edit text files, binary files, hard drives, processes, tagged data, source code, shell scripts, log files, etc. A large variety of binary data formats can be edited through the use of Binary Templates.

References

  1. 1 2 3 Wang, Wallace (2004). Steal this file sharing book: what they won't tell you about file sharing. ISBN   9781593270940.
  2. "h (Hash) command", 7-Zip , May 23, 2016

Further reading

Windows only