Video copy detection

Video copy detection is the process of detecting illegally copied videos by analyzing them and comparing them to original content. The goal of this process is to protect a video creator's intellectual property.

History

Indyk et al.[1] produced a video copy detection theory based on the length of the film; however, it worked only for whole, unmodified films. When applied to a short clip of a video, Indyk et al.'s technique cannot detect that the clip is a copy.

Later, Oostveen et al. introduced the concept of a fingerprint, or hash function, that creates a unique signature of the video based on its contents. This fingerprint is based on the length of the video and on its brightness, as determined by splitting each frame into a grid. The fingerprint cannot be used to recreate the original video because it describes only certain features of that video.

B. Coskun et al. presented two robust algorithms based on the discrete cosine transform.

Hampapur and Bolle created an algorithm that builds a global description of a piece of video based on the video's motion, color, spatial layout, and length.

Color levels were also considered as a basis for comparison. Li et al. created an algorithm that examines the colors of a clip by creating a binary signature derived from the histogram of every frame. This algorithm, however, returns inconsistent results when a logo is added to the video, because the logo's color elements add false information that can confuse the system.

Techniques

Watermarks

Watermarks are used to introduce an invisible signal into a video to ease the detection of illegal copies. This technique is widely used by photographers. A watermark can also be placed on a video so that it is easily seen by the audience, which allows the content creator to detect easily whether the material has been copied.

The limitation of watermarks is that if the original image is not watermarked, then it is not possible to know whether other images are copies.
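
To make the idea concrete, the following is a minimal sketch of an invisible watermark: one bit per pixel is hidden in the least significant bit of each frame's luminance channel. The scheme, function names, and parameters are illustrative assumptions, not any specific published watermarking algorithm.

```python
# Toy invisible watermark: hide a binary pattern in the least-significant bit
# of a frame's luminance values. Illustrative only; real watermarking schemes
# are designed to survive re-encoding, scaling, and other transformations.
import numpy as np

def embed_watermark(frame: np.ndarray, mark: np.ndarray) -> np.ndarray:
    """frame: 2-D uint8 luminance image; mark: binary array of the same shape."""
    return (frame & 0xFE) | (mark & 1)

def extract_watermark(frame: np.ndarray) -> np.ndarray:
    """Recover the embedded bit plane from a (suspected) copy."""
    return frame & 1

# Usage: embed a fixed pseudo-random pattern, then check how much of it
# survives in a suspected copy.
rng = np.random.default_rng(42)
original = rng.integers(0, 256, size=(480, 640), dtype=np.uint8)
mark = rng.integers(0, 2, size=(480, 640), dtype=np.uint8)
marked = embed_watermark(original, mark)
recovered = extract_watermark(marked)
print("fraction of bits recovered:", float((recovered == mark).mean()))  # 1.0 for an exact copy
```

Such a least-significant-bit scheme is fragile; lossy compression alone can destroy it, which is one reason content-based signatures (below) are used when no watermark can be relied upon.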

Content-based signature

In this technique, a unique signature is created for the video on the basis of the video's content. Various video copy detection algorithms use features of the content to assign the video a unique video hash, or fingerprint, which can then be compared with the hashes of other videos stored in a database.
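
As an illustration of the general idea, and not of any particular published algorithm, the sketch below reduces each frame to a 64-bit "average hash" over an assumed 8 x 8 grid and compares two videos' signatures by mean Hamming distance. The function names, grid size, and distance are assumptions.

```python
# Minimal content-based signature sketch: per-frame average hash + Hamming distance.
import numpy as np

def frame_hash(frame: np.ndarray, grid: int = 8) -> int:
    """64-bit hash of a 2-D uint8 luminance image: 1 where a grid cell is brighter
    than the overall mean, 0 otherwise."""
    h, w = frame.shape
    cropped = frame[: h - h % grid, : w - w % grid]
    cells = cropped.reshape(grid, h // grid, grid, w // grid).mean(axis=(1, 3))
    bits = (cells > cells.mean()).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)

def video_signature(frames) -> list[int]:
    """One hash per frame; the sequence is the video's fingerprint."""
    return [frame_hash(f) for f in frames]

def signature_distance(sig_a: list[int], sig_b: list[int]) -> float:
    """Mean per-frame Hamming distance, truncated to the shorter signature."""
    n = min(len(sig_a), len(sig_b))
    return sum(bin(a ^ b).count("1") for a, b in zip(sig_a[:n], sig_b[:n])) / n
```

A small distance suggests a likely copy; deciding how small the distance must be before declaring a copy is exactly where the problem described below arises.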

This type of algorithm has a significant problem: if various aspects of two videos' contents are similar, it is difficult for the algorithm to determine whether the video in question is a copy of the original or merely similar to it. In such a case (e.g., two distinct news broadcasts), the algorithm can report that the video in question is a copy, because news broadcasts often use similar banners and the presenters often sit in similar positions. Videos whose frames change very little over time are also more vulnerable to hash collisions.

Algorithms

The following are some algorithms and techniques proposed for video copy detection.

Global Descriptors

Global temporal descriptor

In this algorithm, a global intensity is computed at each instant of the video as the weighted sum of the intensities of all pixels in the corresponding frame. Thus, an identity for a video sample can be constructed from the length of the video and the evolution of this intensity throughout it.

The global intensity a(t) is defined as

    a(t) = (1/N) Σ_x k(x) I(x, t),

where k is the weighting of the image, I is the image, and N is the number of pixels in the image.
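
A minimal sketch of this descriptor follows, assuming a uniform weighting k(x) = 1 so that a(t) reduces to the mean intensity of each frame; the comparison function and names are illustrative assumptions rather than the original formulation.

```python
# Global temporal descriptor sketch: one intensity value per frame.
import numpy as np

def global_temporal_descriptor(frames) -> np.ndarray:
    """frames: iterable of 2-D uint8 luminance images; returns a(t) for each frame,
    here simply the mean pixel intensity (uniform weighting assumed)."""
    return np.array([float(np.mean(frame)) for frame in frames])

def temporal_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute difference between two descriptors, truncated to the shorter one."""
    n = min(len(a), len(b))
    return float(np.abs(a[:n] - b[:n]).mean())
```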

Global ordinal measurement descriptor

In this algorithm, each frame of the video is divided into N blocks, which are sorted by gray level. From these blocks, a vector describing the average gray level of each block can be created.

With these average levels, a new vector S(t), the video's signature, can be created.

To compare two videos, the algorithm defines a distance measure D(t) representing the similarity between them.

The value returned by D(t) helps determine whether the video in question is a copy.
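
The sketch below illustrates the idea under stated assumptions: each frame is divided into a 3 x 3 grid of blocks (N = 9), the blocks are ranked by average gray level, and D is taken as the mean L1 distance between the rank vectors of corresponding frames. The block layout and the choice of distance are assumptions, not necessarily the original formulation.

```python
# Global ordinal measurement descriptor sketch: rank the block averages of each
# frame and compare two videos by the distance between their rank vectors.
import numpy as np

def ordinal_signature(frame: np.ndarray, grid: int = 3) -> np.ndarray:
    """Rank vector of the grid x grid block averages of a 2-D luminance image."""
    h, w = frame.shape
    cropped = frame[: h - h % grid, : w - w % grid]
    blocks = cropped.reshape(grid, h // grid, grid, w // grid).mean(axis=(1, 3)).flatten()
    return np.argsort(np.argsort(blocks))  # rank of each block, 0 = darkest

def ordinal_distance(frames_a, frames_b, grid: int = 3) -> float:
    """Mean L1 distance between the per-frame rank vectors of two videos."""
    sigs_a = [ordinal_signature(f, grid) for f in frames_a]
    sigs_b = [ordinal_signature(f, grid) for f in frames_b]
    n = min(len(sigs_a), len(sigs_b))
    return float(np.mean([np.abs(sigs_a[i] - sigs_b[i]).sum() for i in range(n)]))
```

Because only the ordering of the blocks is kept, this kind of signature is comparatively robust to global changes in brightness or contrast.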

Ordinal and Temporal Descriptors

This technique was proposed by L. Chen and F. Stentiford. A measure of dissimilarity is produced by combining the two aforementioned descriptors, the global temporal descriptor and the global ordinal measurement descriptor, across time and space.

TMK+PDQF

In 2019, Facebook open-sourced TMK+PDQF,[2] part of a suite of tools used at Facebook to detect harmful content. It generates a signature for a whole video and can easily handle changes in format or added watermarks, but it is less tolerant of cropping or clipping.[3]

Local Descriptors

AJ

Described by A. Joly et al., this algorithm is an improvement of the Harris interest point detector. The technique exploits the observation that in many videos a significant number of frames are almost identical, so it is more efficient to analyze not every frame but only those depicting a significant amount of motion.
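
The following sketch illustrates only the frame-selection idea, not A. Joly et al.'s actual detector: frames are kept for further interest-point analysis only when they differ noticeably from the previously kept frame. The threshold value and function name are assumptions.

```python
# Keep only frames showing significant change relative to the last kept frame,
# so that interest points are extracted only where there is meaningful motion.
import numpy as np

def select_keyframes(frames, threshold: float = 10.0):
    """Yield (index, frame) pairs for frames whose mean absolute difference from
    the previously kept frame exceeds the threshold."""
    previous = None
    for i, frame in enumerate(frames):
        current = frame.astype(np.float32)
        if previous is None or np.abs(current - previous).mean() > threshold:
            yield i, frame
            previous = current
```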

ViCopT

ViCopT uses the interest points of each image to define a signature of the whole video. In every image, the algorithm identifies and defines two parts: the background, a set of elements that remain static along a temporal sequence, and the motion, persistent points that change position throughout the video.

Space Time Interest Points (STIP)

This algorithm was developed by I. Laptev and T. Lindeberg. It extends the interest point technique to both space and time to define the video signature, which is stored in a 34-dimensional vector.

Algorithm showcasing

Several video copy detection algorithms are in use today. In 2007, an evaluation showcase known as Multimedia Understanding Through Semantics, Computation and Learning (MUSCLE) tested video copy detection algorithms on video samples ranging from home recordings to TV show segments, with lengths from one minute to one hour.


References

  1. P. Indyk, G. Iyengar, and N. Shivakumar. Finding pirated video sequences on the internet. Technical report, Stanford University, 1999.
  2. "Facebook open-sources algorithms for detecting child exploitation and terrorism imagery". August 2019.
  3. "Papers with Code - PDQ & TMK + PDQF -- A Test Drive of Facebook's Perceptual Hashing Algorithms".