PhotoDNA

PhotoDNA is a proprietary image-identification and content filtering technology [1] widely used by online service providers. [2] [3]

History

PhotoDNA was developed by Microsoft Research and Hany Farid, a professor at Dartmouth College, beginning in 2009. It computes a distinctive hash for each item in a database of known images and video files; those hashes can then be used to identify other instances of the same images. [4]

The hashing method initially relied on converting images into a black-and-white format, dividing them into squares, and quantifying the shading of the squares. [5] It did not employ facial recognition technology, nor could it identify a person or object in the image.[citation needed] The method was designed to be resistant to alterations in the image, including resizing and minor color alterations. [4] Since 2015, [6] similar methods have been applied to individual video frames in video files. [7]
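PhotoDNA's exact algorithm is proprietary, but the published description above (grayscale conversion, a grid of squares, quantified shading) can be sketched as a toy grid-average perceptual hash. Everything below, including the function names and the 4×4 grid size, is an illustrative assumption, not Microsoft's implementation:

```python
def grid_hash(pixels, grid=4):
    """Toy perceptual hash: average brightness of each cell in a
    grid x grid partition of a grayscale image, given as a list of
    equal-length rows of values 0-255. Illustrative only."""
    height, width = len(pixels), len(pixels[0])
    cell_h, cell_w = height // grid, width // grid
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            total = count = 0
            for y in range(gy * cell_h, (gy + 1) * cell_h):
                for x in range(gx * cell_w, (gx + 1) * cell_w):
                    total += pixels[y][x]
                    count += 1
            cells.append(total // count)
    return cells


def distance(h1, h2):
    """Compare two hashes by total absolute difference, so minor
    edits (brightness shifts, slight recompression) still match
    under a small threshold rather than requiring exact equality."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

Matching by distance rather than exact equality is what makes such hashes robust to resizing and minor color alterations; the trade-off is that, unlike a cryptographic hash, the output deliberately varies smoothly with the input.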

Microsoft donated[failed verification] the PhotoDNA technology to Project VIC, which is managed and supported by the International Centre for Missing & Exploited Children (ICMEC) and used in digital forensics operations; [8] [9] it stores "fingerprints" that can be used to uniquely identify individual photos. [9] [10] The database includes hashes for millions of items. [11]

In December 2014, Microsoft made PhotoDNA available to qualified organizations in a software as a service model for free through the Azure Marketplace. [12]

In the 2010s and 2020s, PhotoDNA was put forward in connection with policy proposals relating to content moderation and internet censorship, [13] including US Senate hearings (2019 on "digital responsibility", [2] 2022 on the EARN IT Act [14] ). It also figured in various European Commission proposals dubbed "upload filters" by civil society, [15] [16] such as so-called voluntary codes (the 2016 code [17] on hate speech, [18] adopted after the events of 2015, and the 2018 [19] and 2022 [20] codes on disinformation), copyright legislation (chiefly the 2019 Copyright Directive, debated between 2014 [21] and 2021 [22] ), terrorism-related regulation (TERREG) [23] and internet wiretapping regulation (the 2021 "chat control" proposal). [24]

In 2016, Hany Farid proposed to extend usage of the technology to terrorism-related content. [25] In December 2016, Facebook, Twitter, Google and Microsoft announced plans to use PhotoDNA to remove extremist content such as terrorist recruitment videos or violent terrorist imagery. [26] In 2018 Facebook stated that PhotoDNA was used to automatically remove al-Qaeda videos. [13]

By 2019, big tech companies including Microsoft, Facebook and Google had publicly announced that since 2017 they had been running the GIFCT as a shared database of content to be automatically censored. [2] As of 2021, Apple was thought to be using its NeuralHash technology for similar purposes. [27]

In 2022, The New York Times covered the story of two fathers whose Google accounts were closed after photos they had taken of their children for medical purposes were automatically uploaded to Google's servers. [28] The article contrasts PhotoDNA, which requires a database of known hashes, with Google's AI-based technology, which can recognize previously unseen exploitative images. [29] [30]

Usage

Microsoft originally used PhotoDNA on its own services including Bing and OneDrive. [31] As of 2022, PhotoDNA was widely used by online service providers for their content moderation efforts [10] [32] [33] including Google's Gmail, Twitter, [34] Facebook, [35] Adobe Systems, [36] Reddit, [37] and Discord. [38]

The UK's Internet Watch Foundation, which compiles a reference database of PhotoDNA signatures, reportedly held over 300,000 hashes of known child sexual exploitation material.[citation needed] The National Center for Missing & Exploited Children (NCMEC) was another source for the database. [39] [40]

PhotoDNA is widely used to remove content, [2] disable accounts, and report people. [7]

Inverting

In 2021, Anish Athalye was able to partially invert PhotoDNA hashes using a neural network, raising concerns about the reversibility of PhotoDNA hashes. [41]
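This result reflects a general property of perceptual hashes: because similar inputs must map to similar hashes, the hash necessarily retains structural information about the image. As a hedged toy illustration (using a hypothetical grid-average hash, not PhotoDNA's actual format or Athalye's neural-network method), such a hash can be rendered back into a coarse thumbnail simply by tiling each cell value out to a block of pixels:

```python
def invert_grid_hash(hash_values, grid=4, scale=2):
    """Toy illustration of why perceptual hashes may be partially
    invertible: a grid-average hash is itself a tiny thumbnail.
    Expands each of the grid*grid cell values into a scale x scale
    block of pixels, yielding a coarse grayscale reconstruction."""
    thumb = []
    for gy in range(grid):
        # Each grid row of the hash becomes `scale` pixel rows.
        rows = [[] for _ in range(scale)]
        for gx in range(grid):
            value = hash_values[gy * grid + gx]
            for row in rows:
                row.extend([value] * scale)
        thumb.extend(rows)
    return thumb
```

A cryptographic hash, by contrast, is designed so that no such structure survives; the tension between match robustness and irreversibility is exactly what the inversion result highlights.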

References

  1. Douze, Matthijs; Tolias, Giorgos; Pizzi, Ed; Papakipos, Zoë; Chanussot, Lowik; Radenovic, Filip; Jenicek, Tomas; Maximov, Maxim; Leal-Taixé, Laura; Elezi, Ismail; Chum, Ondřej; Ferrer, Cristian Canton (February 21, 2022). "The 2021 Image Similarity Dataset and Challenge". arXiv:2106.09672 [cs.CV]. "Image fingerprints, such as PhotoDNA from Microsoft, are used throughout the industry to identify images that depict child exploitation and abuse."
  2. "The Rise of Content Cartels". knightcolumbia.org. February 11, 2020. Retrieved August 21, 2022.
  3. Hill, Kashmir (August 21, 2022). "A Dad Took Photos of His Naked Toddler for the Doctor. Google Flagged Him as a Criminal". The New York Times. ISSN 0362-4331. Retrieved August 21, 2022.
  4. "New Technology Fights Child Porn by Tracking Its "PhotoDNA"". Microsoft Corporation. December 15, 2009. Retrieved September 9, 2016.
  5. "Photo DNA: Step by step". Microsoft. Archived from the original on September 21, 2013. Retrieved February 11, 2014.
  6. "How PhotoDNA for Video is being used to fight online child exploitation". September 12, 2018.
  7. "How PhotoDNA for Video is being used to fight online child exploitation". news.microsoft.com. September 12, 2018.
  8. Jackson, William (August 27, 2014). "Improved image analysis tools speed exploited children cases". GCN.
  9. Clark, Liat (April 30, 2014). "Child abuse-tracking tech donated to the world". Wired UK.
  10. "Microsoft's response to the consultation on the European Commission Communication on the Rights of the Child (2011–2014)" (PDF). European Commission. Archived from the original (PDF) on October 24, 2017.
  11. Ward, Mark (March 23, 2014). "Cloud-based archive tool to help catch child abusers". BBC News.
  12. "PhotoDNA Cloud Service". Microsoft.com. Microsoft Corporation. Retrieved February 19, 2015.
  13. Richard Allan (June 18, 2018). "Hearing at 11:14", in "The EU's horizontal regulatory framework for illegal content removal in the DSM".
  14. Szoka, Berin; Cohn, Ari (February 10, 2022). "The Top Ten Mistakes Senators Made During Today's EARN IT Markup". Techdirt. Retrieved August 21, 2022.
  15. Schmon, Christoph (June 3, 2021). "The EU Commission's Refusal to Let Go of Filters". Electronic Frontier Foundation. Retrieved August 21, 2022.
  16. "Upload filters: a danger to free internet content?". IONOS Digitalguide. March 28, 2019. Retrieved August 21, 2022.
  17. "Fighting illegal online hate speech: first assessment of the new code of conduct". ec.europa.eu. December 6, 2016. Retrieved August 21, 2022.
  18. "The EU Code of conduct on countering illegal hate speech online". European Commission. Retrieved August 29, 2022.
  19. "Code of Practice on Disinformation". Shaping Europe's digital future. September 26, 2018.
  20. "The 2022 Code of Practice on Disinformation". Shaping Europe's digital future. March 24, 2023.
  21. "Procedure File: 2014/2256(INI)". Legislative Observatory, European Parliament.
  22. Communication from the Commission to the European Parliament and the Council: Guidance on Article 17 of Directive 2019/790 on Copyright in the Digital Single Market.
  23. "Terrorist content online".
  24. Reuter, Markus; Rudl, Tomas; Rau, Franziska; Hildebr, Holly. "Why chat control is so dangerous". European Digital Rights (EDRi). Retrieved August 21, 2022.
  25. Waddell, Kaveh (June 22, 2016). "A Tool to Delete Beheading Videos Before They Even Appear Online". The Atlantic. Retrieved September 10, 2016.
  26. "Partnering to Help Curb Spread of Online Terrorist Content". Facebook Newsroom. Retrieved December 6, 2016.
  27. Abelson, Hal; Anderson, Ross; Bellovin, Steven M.; Benaloh, Josh; Blaze, Matt; Callas, Jon; Diffie, Whitfield; Landau, Susan; Neumann, Peter G.; Rivest, Ronald L.; Schiller, Jeffrey I.; Schneier, Bruce; Teague, Vanessa; Troncoso, Carmela (2024). "Bugs in our pockets: The risks of client-side scanning". Journal of Cybersecurity. 10. arXiv:2110.07450. doi:10.1093/cybsec/tyad020.
  28. Hill, Kashmir (August 21, 2022). "A Dad Took Photos of His Naked Toddler for the Doctor. Google Flagged Him as a Criminal". The New York Times. ISSN 0362-4331. Retrieved August 21, 2022. "A bigger breakthrough came along almost a decade later, in 2018, when Google developed an artificially intelligent tool that could recognize never-before-seen exploitative images of children. [...] When Mark's and Cassio's photos were automatically uploaded from their phones to Google's servers, this technology flagged them."
  29. "Google Flagged Parents' Photos of Sick Children as Sexual Abuse". Gizmodo. August 22, 2022. Retrieved August 28, 2022. "According to Google, those incident reports come from multiple sources, not limited to the automated PhotoDNA tool."
  30. Roth, Emma (August 21, 2022). "Google AI flagged parents' accounts for potential abuse over nude photos of their sick kids". The Verge. Retrieved August 28, 2022. "Google has used hash matching with Microsoft's PhotoDNA for scanning uploaded images to detect matches with known CSAM. [...] In 2018, Google announced the launch of its Content Safety API AI toolkit that can 'proactively identify never-before-seen CSAM imagery so it can be reviewed and, if confirmed as CSAM, removed and reported as quickly as possible.' It uses the tool for its own services and, along with a video-targeting CSAI Match hash matching solution developed by YouTube engineers, offers it for use by others as well."
  31. "Unfortunate Truths about Child Pornography and the Internet [Feature]". December 7, 2012.
  32. Eher, Reinhard; Craig, Leam A.; Miner, Michael H.; Pfäfflin, Friedemann (2011). International Perspectives on the Assessment and Treatment of Sexual Offenders: Theory, Practice and Research. John Wiley & Sons. p. 514. ISBN 978-1119996200.
  33. Lattanzi-Licht, Marcia; Doka, Kenneth (2004). Living with Grief: Coping with Public Tragedy. Routledge. p. 317. ISBN 1135941513.
  34. Arthur, Charles (July 22, 2013). "Twitter to introduce PhotoDNA system to block child abuse images". The Guardian. Retrieved July 22, 2013.
  35. Smith, Catharine (May 2, 2011). "Facebook Adopts Microsoft PhotoDNA To Remove Child Pornography". Huffington Post. Retrieved July 22, 2013.
  36. "Adobe & PhotoDNA". www.adobe.com. Retrieved August 27, 2021.
  37. "Reddit use PhotoDNA to prevent child pornography". March 19, 2020.
  38. "Discord Transparency Report: July — Dec 2020". Discord Blog. April 2, 2021. Retrieved May 8, 2022.
  39. "Microsoft tip led police to arrest man over child abuse images". The Guardian. August 7, 2014.
  40. Salcito, Anthony (December 17, 2009). "Microsoft donates PhotoDNA technology to make the Internet safer for kids". Retrieved July 22, 2013.
  41. Athalye, Anish (December 20, 2021). "Inverting PhotoDNA".