Watermark (data file)

Last updated

A watermark stored in a data file refers to a method for ensuring data integrity which combines aspects of data hashing and digital watermarking. Both are useful for tamper detection, though each has its own advantages and disadvantages.

Contents

Data hashing

A typical data hash will process an input file to produce an alphanumeric string unique to the data file. Should the file be modified, such as if one or more bit changes occur within this original file, the same hash process on the modified file will produce a different alphanumeric. Through this method, a trusted source can calculate the hash of an original data file and subscribers can verify the integrity of the data. The subscriber simply compares a hash of the received data file with the known hash from the trusted source. This can lead to two situations: the hash being the same or the hash being different.

If the hash results are the same, the systems involved can have an appropriate degree of confidence to the integrity of the received data. On the other hand, if the hash results are different, they can conclude that the received data file has been altered.

This process is common in P2P networks, for example the BitTorrent protocol. Once a part of the file is downloaded, the data is then checked against the hash key (known as a hash check). Upon this result, the data is kept or discarded.

Digital watermarking

Digital watermarking is distinctly different from data hashing. It is the process of altering the original data file, allowing for the subsequent recovery of embedded auxiliary data referred to as a watermark.

A subscriber, with knowledge of the watermark and how it is recovered, can determine (to a certain extent) whether significant changes have occurred within the data file. Depending on the specific method used, recovery of the embedded auxiliary data can be robust to post-processing (such as lossy compression).

If the data file to be retrieved is an image, the provider can embed a watermark for protection purposes. The process allows tolerance to some change, while still maintaining an association with the original image file. Researchers have also developed techniques that embed components of the image within the image. This can help identify portions of the image that may contain unauthorized changes and even help in recovering some of the lost data.

A disadvantage of digital watermarking is that a subscriber cannot significantly alter some files without sacrificing the quality or utility of the data. This can be true of various files including image data, audio data, and computer code.

See also

Related Research Articles

<span class="mw-page-title-main">Checksum</span> Data used to detect errors in other data

A checksum is a small-sized block of data derived from another block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. By themselves, checksums are often used to verify data integrity but are not relied upon to verify data authenticity.

<span class="mw-page-title-main">Error detection and correction</span> Techniques that enable reliable delivery of digital data over unreliable communication channels

In information theory and coding theory with applications in computer science and telecommunication, error detection and correction (EDAC) or error control are techniques that enable reliable delivery of digital data over unreliable communication channels. Many communication channels are subject to channel noise, and thus errors may be introduced during transmission from the source to a receiver. Error detection techniques allow detecting such errors, while error correction enables reconstruction of the original data in many cases.

Steganography is the practice of representing information within another message or physical object, in such a manner that the presence of the information is not evident to human inspection. In computing/electronic contexts, a computer file, message, image, or video is concealed within another file, message, image, or video. The word steganography comes from Greek steganographia, which combines the words steganós, meaning "covered or concealed", and -graphia meaning "writing".

<span class="mw-page-title-main">Hash collision</span> Hash function phenomenon

In computer science, a hash collision or clash is when two pieces of data in a hash table share the same hash value. The hash value in this case is derived from a hash function which takes a data input and returns a fixed length of bits.

In computer programming, Base64 is a group of binary-to-text encoding schemes that represent binary data in sequences of 24 bits that can be represented by four 6-bit Base64 digits.

<span class="mw-page-title-main">Cryptographic hash function</span> Hash function that is suitable for use in cryptography

A cryptographic hash function (CHF) is a mathematical algorithm that maps data of an arbitrary size to a bit array of a fixed size. It is a one-way function, that is, a function for which it is practically infeasible to invert or reverse the computation. Ideally, the only way to find a message that produces a given hash is to attempt a brute-force search of possible inputs to see if they produce a match, or use a rainbow table of matched hashes. Cryptographic hash functions are a basic tool of modern cryptography.

In information technology, a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of disaster recovery; however not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, active directory server, or database server.

File verification is the process of using an algorithm for verifying the integrity of a computer file, usually by checksum. This can be done by comparing two files bit-by-bit, but requires two copies of the same file, and may miss systematic corruptions which might occur to both files. A more popular approach is to generate a hash of the copied file and comparing that to the hash of the original file.

A digital watermark is a kind of marker covertly embedded in a noise-tolerant signal such as audio, video or image data. It is typically used to identify ownership of the copyright of such signal. "Watermarking" is the process of hiding digital information in a carrier signal; the hidden information should, but does not need to, contain a relation to the carrier signal. Digital watermarks may be used to verify the authenticity or integrity of the carrier signal or to show the identity of its owners. It is prominently used for tracing copyright infringements and for banknote authentication.

A watermark is a recognizable image or pattern in paper used to determine authenticity.

Disk encryption is a special case of data at rest protection when the storage medium is a sector-addressable device. This article presents cryptographic aspects of the problem. For an overview, see disk encryption. For discussion of different software packages and hardware devices devoted to this problem, see disk encryption software and disk encryption hardware.

Disk encryption is a technology which protects information by converting it into unreadable code that cannot be deciphered easily by unauthorized people. Disk encryption uses disk encryption software or hardware to encrypt every bit of data that goes on a disk or disk volume. It is used to prevent unauthorized access to data storage.

Anti-computer forensics or counter-forensics are techniques used to obstruct forensic analysis.

Trusted timestamping is the process of securely keeping track of the creation and modification time of a document. Security here means that no one—not even the owner of the document—should be able to change it once it has been recorded provided that the timestamper's integrity is never compromised.

<span class="mw-page-title-main">Fingerprint (computing)</span> Digital identifier derived from the data by an algorithm

In computer science, a fingerprinting algorithm is a procedure that maps an arbitrarily large data item to a much shorter bit string, its fingerprint, that uniquely identifies the original data for all practical purposes just as human fingerprints uniquely identify people for practical purposes. This fingerprint may be used for data deduplication purposes. This is also referred to as file fingerprinting, data fingerprinting, or structured data fingerprinting.

The subject of computer backups is rife with jargon and highly specialized terminology. This page is a glossary of backup terms that aims to clarify the meaning of such jargon and terminology.

TinEye is a reverse image search engine developed and offered by Idée, Inc., a company based in Toronto, Ontario, Canada. It is the first image search engine on the web to use image identification technology rather than keywords, metadata or watermarks. TinEye allows users to search not using keywords but with images. Upon submitting an image, TinEye creates a "unique and compact digital signature or fingerprint" of the image and matches it with other indexed images. This procedure is able to match even heavily edited versions of the submitted image, but will not usually return similar images in the results.

<span class="mw-page-title-main">Mobile device forensics</span> Mobile Digital Forensics

Mobile device forensics is a branch of digital forensics relating to recovery of digital evidence or data from a mobile device under forensically sound conditions. The phrase mobile device usually refers to mobile phones; however, it can also relate to any digital device that has both internal memory and communication ability, including PDA devices, GPS devices and tablet computers.

<span class="mw-page-title-main">Digital forensic process</span>

The digital forensic process is a recognized scientific and forensic process used in digital forensics investigations. Forensics researcher Eoghan Casey defines it as a number of steps from the original incident alert through to reporting of findings. The process is predominantly used in computer and mobile forensic investigations and consists of three steps: acquisition, analysis and reporting.

<span class="mw-page-title-main">Cinavia</span>

Cinavia, originally called Verance Copy Management System for Audiovisual Content (VCMS/AV), is an analog watermarking and steganography system under development by Verance since 1999, and released in 2010. In conjunction with the existing Advanced Access Content System (AACS) digital rights management (DRM) inclusion of Cinavia watermarking detection support became mandatory for all consumer Blu-ray Disc players from 2012.