Perceptual hashing

Last updated

Perceptual hashing is the use of a fingerprinting algorithm that produces a snippet, hash, or fingerprint of various forms of multimedia. [1] [2] A perceptual hash is a type of locality-sensitive hash, which is analogous if features of the multimedia are similar. This is in contrast to cryptographic hashing, which relies on the avalanche effect of a small change in input value creating a drastic change in output value. Perceptual hash functions are widely used in finding cases of online copyright infringement as well as in digital forensics because of the ability to have a correlation between hashes so similar data can be found (for instance with a differing watermark).

Contents

Development

The 1980 work of Marr and Hildreth is a seminal paper in this field. [3]

The July 2010 thesis of Christoph Zauner is a well-written introduction to the topic. [4]

In June 2016 Azadeh Amir Asgari published work on robust image hash spoofing. Asgari notes that perceptual hash function like any other algorithm is prone to errors. [5]

Researchers remarked in December 2017 that Google image search is based on a perceptual hash. [6]

In research published in November 2021 investigators focused on a manipulated image of Stacey Abrams which was published to the internet prior to her loss in the 2018 Georgia gubernatorial election. They found that the pHash algorithm was vulnerable to nefarious actors. [7]

Characteristics

Research reported in January 2019 at Northumbria University has shown for video it can be used to simultaneously identify similar contents for video copy detection and detect malicious manipulations for video authentication. The system proposed performs better than current video hashing techniques in terms of both identification and authentication. [8]

Research reported in May 2020 by the University of Houston in deep learning based perceptual hashing for audio has shown better performance than traditional audio fingerprinting methods for the detection of similar/copied audio subject to transformations. [9]

In addition to its uses in digital forensics, research by a Russian group reported in 2019 has shown that perceptual hashing can be applied to a wide variety of situations. Similar to comparing images for copyright infringement, the group found that it could be used to compare and match images in a database. Their proposed algorithm proved to be not only effective, but more efficient than the standard means of database image searching. [10]

A Chinese team reported in July 2019 that they had discovered a perceptual hash for speech encryption which proved to be effective. They were able to create a system in which the encryption was not only more accurate, but more compact as well. [11]

Apple Inc reported as early as August 2021 a Child Sexual Abuse Material (CSAM) system that they know as NeuralHash. A technical summary document, which nicely explains the system with copious diagrams and example photographs, offers that "Instead of scanning images [on corporate] iCloud [servers], the system performs on-device matching using a database of known CSAM image hashes provided by [the National Center for Missing and Exploited Children] (NCMEC) and other child-safety organizations. Apple further transforms this database into an unreadable set of hashes, which is securely stored on users’ devices." [12]

In an essay entitled "The Problem With Perceptual Hashes", Oliver Kuederle produces a startling collision generated by a piece of commercial neural net software, of the NeuralHash type. A photographic portrait of a real woman (Adobe Stock #221271979) reduces through the test algorithm to the same hash as the photograph of a piece of abstract art (from the "deposit photos" database). Both sample images are in commercial databases. Kuederle is concerned with collisions like this. "These cases will be manually reviewed. That is, according to Apple, an Apple employee will then look at your (flagged) pictures... Perceptual hashes are messy. When such algorithms are used to detect criminal activities, especially at Apple scale, many innocent people can potentially face serious problems... Needless to say, I’m quite worried about this." [13]

Researchers have continued to publish a comprehensive analysis entitled "Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash", in which they investigate the vulnerability of NeuralHash as a representative of deep perceptual hashing algorithms to various attacks. Their results show that hash collisions between different images can be achieved with minor changes applied to the images. According to the authors, these results demonstrate the real chance of such attacks and enable the flagging and possible prosecution of innocent users. They also state that the detection of illegal material can easily be avoided, and the system be outsmarted by simple image transformations, such as provided by free-to-use image editors. The authors assume their results to apply to other deep perceptual hashing algorithms as well, questioning their overall effectiveness and functionality in applications such as client-side scanning and chat controls. [14]

See also

Related Research Articles

<span class="mw-page-title-main">Hash function</span> Mapping arbitrary data to fixed-size values

A hash function is any function that can be used to map data of arbitrary size to fixed-size values, though there are some hash functions that support variable length output. The values returned by a hash function are called hash values, hash codes, hash digests, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter storage addressing.

VoIP spam or SPIT is unsolicited, automatically dialed telephone calls, typically using voice over Internet Protocol (VoIP) technology.

A digital watermark is a kind of marker covertly embedded in a noise-tolerant signal such as audio, video or image data. It is typically used to identify ownership of the copyright of such signal. "Watermarking" is the process of hiding digital information in a carrier signal; the hidden information should, but does not need to, contain a relation to the carrier signal. Digital watermarks may be used to verify the authenticity or integrity of the carrier signal or to show the identity of its owners. It is prominently used for tracing copyright infringements and for banknote authentication.

In data analysis, anomaly detection is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well defined notion of normal behaviour. Such examples may arouse suspicions of being generated by a different mechanism, or appear inconsistent with the remainder of that set of data.

Plagiarism detection or content similarity detection is the process of locating instances of plagiarism or copyright infringement within a work or document. The widespread use of computers and the advent of the Internet have made it easier to plagiarize the work of others.

In computer science, locality-sensitive hashing (LSH) is a fuzzy hashing technique that hashes similar input items into the same "buckets" with high probability. Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions are maximized, not minimized. Alternatively, the technique can be seen as a way to reduce the dimensionality of high-dimensional data; high-dimensional input items can be reduced to low-dimensional versions while preserving relative distances between items.

Video fingerprinting or video hashing are a class of dimension reduction techniques in which a system identifies, extracts, and then summarizes characteristic

An audio search engine is a web-based search engine which crawls the web for audio content. The information can consist of web pages, images, audio files, or another type of document. Various techniques exist for research on these engines.

Private biometrics is a form of encrypted biometrics, also called privacy-preserving biometric authentication methods, in which the biometric payload is a one-way, homomorphically encrypted feature vector that is 0.05% the size of the original biometric template and can be searched with full accuracy, speed and privacy. The feature vector's homomorphic encryption allows search and match to be conducted in polynomial time on an encrypted dataset and the search result is returned as an encrypted match. One or more computing devices may use an encrypted feature vector to verify an individual person or identify an individual in a datastore without storing, sending or receiving plaintext biometric data within or between computing devices or any other entity. The purpose of private biometrics is to allow a person to be identified or authenticated while guaranteeing individual privacy and fundamental human rights by only operating on biometric data in the encrypted space. Some private biometrics including fingerprint authentication methods, face authentication methods, and identity-matching algorithms according to bodily features. Private biometrics are constantly evolving based on the changing nature of privacy needs, identity theft, and biotechnology.

<span class="mw-page-title-main">Fingerprint (computing)</span> Digital identifier derived from the data by an algorithm

In computer science, a fingerprinting algorithm is a procedure that maps an arbitrarily large data item to a much shorter bit string, its fingerprint, that uniquely identifies the original data for all practical purposes just as human fingerprints uniquely identify people for practical purposes. This fingerprint may be used for data deduplication purposes. This is also referred to as file fingerprinting, data fingerprinting, or structured data fingerprinting.

<span class="mw-page-title-main">Reverse image search</span> Content-based image retrieval

Reverse image search is a content-based image retrieval (CBIR) query technique that involves providing the CBIR system with a sample image that it will then base its search upon; in terms of information retrieval, the sample image is very useful. In particular, reverse image search is characterized by a lack of search terms. This effectively removes the need for a user to guess at keywords or terms that may or may not return a correct result. Reverse image search also allows users to discover content that is related to a specific sample image or the popularity of an image, and to discover manipulated versions and derivative works.

Video copy detection is the process of detecting illegally copied videos by analyzing them and comparing them to original content.

An acoustic fingerprint is a condensed digital summary, a fingerprint, deterministically generated from an audio signal, that can be used to identify an audio sample or quickly locate similar items in an audio database.

<span class="mw-page-title-main">Smudge attack</span> Discerning a password via screen smudges

A smudge attack is an information extraction attack that discerns the password input of a touchscreen device such as a cell phone or tablet computer from fingerprint smudges. A team of researchers at the University of Pennsylvania were the first to investigate this type of attack in 2010. An attack occurs when an unauthorized user is in possession or is nearby the device of interest. The attacker relies on detecting the oily smudges produced and left behind by the user's fingers to find the pattern or code needed to access the device and its contents. Simple cameras, lights, fingerprint powder, and image processing software can be used to capture the fingerprint deposits created when the user unlocks their device. Under proper lighting and camera settings, the finger smudges can be easily detected, and the heaviest smudges can be used to infer the most frequent input swipes or taps from the user.

PhotoDNA is a proprietary image-identification and content filtering technology widely used by online service providers.

Emotion recognition is the process of identifying human emotion. People vary widely in their accuracy at recognizing the emotions of others. Use of technology to help people with emotion recognition is a relatively nascent research area. Generally, the technology works best if it uses multiple modalities in context. To date, the most work has been conducted on automating the recognition of facial expressions from video, spoken expressions from audio, written expressions from text, and physiology as measured by wearables.

Shih-Fu Chang is a Taiwanese American computer scientist and electrical engineer noted for his research on multimedia information retrieval, computer vision, machine learning, and signal processing.

Deepfake pornography, or simply fake pornography, is a type of synthetic porn that is created via altering already-existing pornographic material by applying deepfake technology to the faces of the actors. The use of deepfake porn has sparked controversy because it involves the making and sharing of realistic videos featuring non-consenting individuals, typically female celebrities, and is sometimes used for revenge porn. Efforts are being made to combat these ethical concerns through legislation and technology-based solutions.

An audio deepfake is a type of artificial intelligence used to create convincing speech sentences that sound like specific people saying things they did not say. This technology was initially developed for various applications to improve human life. For example, it can be used to produce audiobooks, and also to help people who have lost their voices to get them back. Commercially, it has opened the door to several opportunities. This technology can also create more personalized digital assistants and natural-sounding text-to-speech as well as speech translation services.

Small object detection is a particular case of object detection where various techniques are employed to detect small objects in digital images and videos. "Small objects" are objects having a small pixel footprint in the input image. In areas such as aerial imagery, state-of-the-art object detection techniques under performed because of small objects.

References

  1. Buldas, Ahto; Kroonmaa, Andres; Laanoja, Risto (2013). "Keyless Signatures' Infrastructure: How to Build Global Distributed Hash-Trees". In Riis, Nielson H.; Gollmann, D. (eds.). Secure IT Systems. NordSec 2013. Lecture Notes in Computer Science. Vol. 8208. Berlin, Heidelberg: Springer. doi:10.1007/978-3-642-41488-6_21. ISBN   978-3-642-41487-9. ISSN   0302-9743. Keyless Signatures Infrastructure (KSI) is a globally distributed system for providing time-stamping and server-supported digital signature services. Global per-second hash trees are created and their root hash values published. We discuss some service quality issues that arise in practical implementation of the service and present solutions for avoiding single points of failure and guaranteeing a service with reasonable and stable delay. Guardtime AS has been operating a KSI Infrastructure for 5 years. We summarize how the KSI Infrastructure is built, and the lessons learned during the operational period of the service.
  2. Klinger, Evan; Starkweather, David. "pHash.org: Home of pHash, the open source perceptual hash library". pHash.org. Retrieved 2018-07-05. pHash is an open source software library released under the GPLv3 license that implements several perceptual hashing algorithms, and provides a C-like API to use those functions in your own programs. pHash itself is written in C++.
  3. Marr, D.; Hildreth, E. (29 Feb 1980). "Theory of Edge Detection". Proceedings of the Royal Society of London. Series B, Biological Sciences. 207 (1167): 187–217. Bibcode:1980RSPSB.207..187M. doi:10.1098/rspb.1980.0020. PMID   6102765. S2CID   2150419.
  4. Zauner, Christoph (July 2010). Implementation and Benchmarking of Perceptual Image Hash Functions (PDF). Upper Austria University of Applied Sciences, Hagenberg Campus.
  5. Asgari, Azadeh Amir (June 2016). Robust image hash spoofing (PDF). Blekinge Institute of Technology.
  6. "Google Image Search Explained". Medium. 26 December 2017.
  7. Hao, Qingying; Luo, Licheng; Jan, Steve T.K.; Wang, Gang (November 2021). "It's Not What It Looks Like: Manipulating Perceptual Hashing based Applications" (PDF). Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS '21), November 15–19, 2021, Virtual Event, Republic of Korea. Association for Computing Machinery.
  8. Khelifi, Fouad; Bouridane, Ahmed (January 2019). "Perceptual Video Hashing for Content Identification and Authentication" (PDF). IEEE Transactions on Circuits and Systems for Video Technology. 29 (1): 50–67. doi:10.1109/TCSVT.2017.2776159. S2CID   55725934.
  9. Báez-Suárez, Abraham; Shah, Nolan; Nolazco-Flores, Juan Arturo; Huang, Shou-Hsuan S.; Gnawali, Omprakash; Shi, Weidong (2020-05-19). "SAMAF: Sequence-to-sequence Autoencoder Model for Audio Fingerprinting". ACM Transactions on Multimedia Computing, Communications, and Applications. 16 (2): 43:1–43:23. doi:10.1145/3380828. ISSN   1551-6857.
  10. Zakharov, Victor; Kirikova, Anastasia; Munerman, Victor; Samoilova, Tatyana (2019). "Architecture of Software-Hardware Complex for Searching Images in Database". 2019 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EICon Rus). IEEE. pp. 1735–1739. doi:10.1109/EIConRus.2019.8657241. ISBN   978-1-7281-0339-6. S2CID   71152337.
  11. Zhang, Qiu-yu; Zhou, Liang; Zhang, Tao; Zhang, Deng-hai (July 2019). "A retrieval algorithm of encrypted speech based on short-term cross-correlation and perceptual hashing". Multimedia Tools and Applications. 78 (13): 17825–17846. doi:10.1007/s11042-019-7180-9. S2CID   58010160.
  12. "CSAM Detection - Technical Summary" (PDF). Apple Inc. August 2021.
  13. Kuederle, Oliver (n.d.). "THE PROBLEM WITH PERCEPTUAL HASHES". rentafounder.com. Retrieved 23 May 2022.
  14. Struppek, Lukas; Hintersdorf, Dominik; Neider, Daniel; Kersting, Kristian (2022). "Learning to Break Deep Perceptual Hashing: The Use Case Neural Hash". 2022 ACM Conference on Fairness, Accountability, and Transparency. Proceedings of the ACM Conference on Fairness, Accountability, and Transparency (FAccT). pp. 58–69. arXiv: 2111.06628 . doi:10.1145/3531146.3533073. ISBN   9781450393522. S2CID   244102645.