Artificial intelligence and copyright

In the 2020s, the rapid advancement of deep learning-based generative artificial intelligence models has raised questions about whether copyright infringement occurs when such models are trained or used. These include text-to-image models such as Stable Diffusion and large language models such as ChatGPT. As of 2023, several US lawsuits are pending that challenge the use of copyrighted data to train AI models, with defendants arguing that this falls under fair use. [1]

Popular deep learning models are trained on massive amounts of media scraped from the Internet, often including copyrighted material. [2] When assembling training data, the sourcing of copyrighted works may infringe on the copyright holder's exclusive right to control reproduction, unless covered by exceptions in the relevant copyright laws. Additionally, using a model's outputs might infringe copyright, and the model's creator could be held vicariously liable for that infringement.

The United States Copyright Office has declared that works not created by a human author, like this "selfie" portrait taken by a monkey, are not eligible for copyright protection.

Since most legal jurisdictions only grant copyright to original works of authorship by human authors, the definition of "originality" is central to the copyright status of AI-generated works. [3]

United States

In the US, the Copyright Act protects "original works of authorship". [4] The US Copyright Office has interpreted this as being limited to works "created by a human being", [4] declining to grant copyright to works generated without human intervention. [5] Some have suggested that certain AI-generated works might be copyrightable in the US and similar jurisdictions if it can be shown that the human who ran the AI program exercised sufficient originality in selecting the inputs to the AI or editing its output. [5] [4]

Proponents of this view suggest that an AI model may be viewed as merely a tool (akin to a pen or a camera) used by its human operator to express their creative vision. [4] [6] For example, proponents argue that if the standard of originality can be satisfied by an artist clicking the shutter button on a camera, then artists using generative AI should perhaps receive similar deference, especially if they go through multiple rounds of revision to refine their prompts to the AI. [7] Other proponents argue that the Copyright Office is not taking a technology-neutral approach to the use of AI or algorithmic tools. For other creative expressions (music, photography, writing), the test is effectively whether there is at least de minimis human creativity. For works made using AI tools, the Copyright Office has applied a different test: whether there is no more than de minimis technological involvement. [8]

Théâtre D'opéra Spatial, 2022, created using Midjourney, prompted by Jason M. Allen

This difference in approach can be seen in the recent decision in respect of a registration claim by Jason Matthew Allen for his work Théâtre D'opéra Spatial created using Midjourney and an upscaling tool. The Copyright Office stated:

The Board finds that the Work contains more than a de minimis amount of content generated by artificial intelligence ("AI"), and this content must therefore be disclaimed in an application for registration. Because Mr. Allen is unwilling to disclaim the AI-generated material, the Work cannot be registered as submitted. [9]

As AI is increasingly used to generate literature, music, and other forms of art, the US Copyright Office has released new guidance emphasizing whether works containing AI-generated material are the product of 'mechanical reproduction' or the 'manifestation of the author's own creative conception'. [10] In March 2023, the Office published a rule on a range of issues related to the use of AI, in which it stated:

...because the Office receives roughly half a million applications for registration each year, it sees new trends in registration activity that may require modifying or expanding the information required to be disclosed on an application.

One such recent development is the use of sophisticated artificial intelligence ("AI") technologies capable of producing expressive material. These technologies "train" on vast quantities of preexisting human-authored works and use inferences from that training to generate new content. Some systems operate in response to a user's textual instruction, called a "prompt." The resulting output may be textual, visual, or audio, and is determined by the AI based on its design and the material it has been trained on. These technologies, often described as "generative AI," raise questions about whether the material they produce is protected by copyright, whether works consisting of both human-authored and AI-generated material may be registered, and what information should be provided to the Office by applicants seeking to register them. [11]

United Kingdom

Other jurisdictions include explicit statutory language related to computer-generated works, including the United Kingdom's Copyright, Designs and Patents Act 1988, which states:

In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken. [6]

However, the computer-generated works provision in UK law applies to autonomous creations by computer programs. Individuals using AI tools will usually be the authors of the resulting works, provided those works meet the minimum requirements for copyright protection. In respect of AI, the computer-generated works language concerns the ability of human programmers to hold copyright in the autonomous output of AI tools, i.e. where there is no direct human input:

In so far as each composite frame is a computer generated work then the arrangements necessary for the creation of the work were undertaken by Mr Jones because he devised the appearance of the various elements of the game and the rules and logic by which each frame is generated and he wrote the relevant computer program. In these circumstances I am satisfied that Mr Jones is the person by whom the arrangements necessary for the creation of the works were undertaken and therefore is deemed to be the author by virtue of s.9(3) [12]

The UK government has consulted on the use of generative tools and AI in respect of intellectual property, leading to a proposed specialist Code of Practice: [13] "to provide guidance to support AI firms to access copyrighted work as an input to their models, whilst ensuring there are protections on generated output to support right holders of copyrighted work". [14] The US Copyright Office has since published a notice of inquiry and request for comments following its 2023 Registration Guidance. [15]

China

On November 27, 2023, the Beijing Internet Court issued a decision recognizing copyright in AI-generated images in a litigation case. [16]

As noted by a lawyer and AI art creator, the challenge for intellectual property regulators, legislators and the courts is how to protect human creativity in a technologically neutral fashion whilst considering the risks of automated AI factories. AI tools have the ability to autonomously create a range of material that is potentially subject to copyright (music, blogs, poetry, images, and technical papers) or other intellectual property rights (patents and design rights). This represents an unprecedented challenge to existing intellectual property regimes. [8]

Training AI with copyrighted data

Deep learning models source large data sets from the Internet, such as publicly available images and the text of web pages. The text and images are converted into numeric formats that the AI can analyze. A deep learning model then identifies patterns linking the encoded text and image data and learns which text concepts correspond to which elements in images. Through repeated training iterations, the model refines its ability to match images to text descriptions. After training, the model is validated to evaluate how well it can generate or manipulate new images from text prompts alone. [17] Because assembling these training datasets involves making copies of copyrighted works, this has raised the question of whether the process infringes the copyright holder's exclusive right to make reproductions of their works.
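As a rough illustration of the text-image alignment described above, the sketch below scores how well candidate captions match an image using a pretrained CLIP model via the Hugging Face transformers library; the checkpoint name, image path, and captions are illustrative assumptions, not details drawn from any particular system discussed in this article.

```python
# Illustrative sketch: measuring text-image alignment with a pretrained CLIP model.
# The checkpoint name, image path, and captions are assumptions made for this example.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # hypothetical image scraped from the web
captions = ["a photograph of a preacher", "an oil painting of a mountain"]

# Encode the text and image into the numeric representations the model compares.
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Higher scores indicate captions the model considers a better match for the image.
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(captions, probs[0].tolist())))
```

Generative text-to-image models learn a comparable alignment between text and image representations during training and use it to condition image generation on a prompt.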

US machine learning developers have traditionally believed this to be allowable under fair use because the use of copyrighted work is transformative and limited. [18] The situation has been compared to Google Books' scanning of copyrighted books in Authors Guild, Inc. v. Google, Inc., which was ultimately found to be fair use because the scanned content was not made publicly available and the use was non-expressive. [19]

Timothy B. Lee, in Ars Technica, argues that if the plaintiffs succeed, this may shift the balance of power in favour of large corporations such as Google, Microsoft, and Meta, which can afford to license large amounts of training data from copyright holders and leverage their proprietary datasets of user-generated data. [20] IP scholars Bryan Casey and Mark Lemley argue in the Texas Law Review that datasets are so large that "there is no plausible option simply to license all of the (data)... allowing (any generative training) copyright claim is tantamount to saying, not that copyright owners will get paid, but that the use won't be permitted at all." [21] Other scholars disagree; some predict a similar outcome to the US music licensing procedures. [18]

Several jurisdictions have explicitly incorporated exceptions allowing "text and data mining" (TDM) into their copyright statutes, including the United Kingdom, Germany, Japan, and the EU. Unlike the EU's, the United Kingdom's exception does not cover data mining for commercial purposes, but the government has proposed changing this to support the development of AI: "For text and data mining, we plan to introduce a new copyright and database exception which allows TDM for any purpose. Rights holders will still have safeguards to protect their content, including a requirement for lawful access." [22] As of June 2023, a clause in the draft EU AI Act would require generative AI to "make available summaries of the copyrighted material that was used to train their systems". [23]

A photograph of Anne Graham Lotz included in Stable Diffusion's training set, alongside an image generated by Stable Diffusion using the prompt "Anne Graham Lotz". Generative AI models may produce outputs that are virtually identical to images from their training set; the research paper from which this example was taken was able to produce similar replications for only 0.03% of training images. [24]
An image generated by Stable Diffusion using the prompt "an astronaut riding a horse, by Picasso". Generative image models are adept at imitating the visual style of particular artists in their training set.

In some cases, deep learning models may "memorize" specific details of items in their training set and replicate them when generating new content, such that the outputs may be considered copyright infringement. AI developers generally consider this behaviour a form of overfitting, and it is uncertain how prevalent it is in current systems.
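As a minimal sketch of how such memorization might be detected (not a method from the research cited in this article), one can compare generated outputs against training images using perceptual hashes; the file names and distance threshold below are hypothetical.

```python
# Minimal sketch: flagging a possible memorized training image via perceptual hashing.
# File names and the distance threshold are illustrative assumptions.
from PIL import Image
import imagehash

generated_hash = imagehash.phash(Image.open("generated.png"))       # hypothetical model output
training_hash = imagehash.phash(Image.open("training_image.png"))   # hypothetical training image

# A small Hamming distance between perceptual hashes suggests a near-duplicate,
# i.e. the model may have reproduced a memorized training example.
distance = generated_hash - training_hash
if distance <= 8:  # threshold chosen for illustration
    print(f"Possible near-duplicate of a training image (hash distance {distance})")
else:
    print(f"No near-duplicate detected (hash distance {distance})")
```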

OpenAI argued that "well-constructed AI systems generally do not regenerate, in any nontrivial portion, unaltered data from any particular work in their training corpus". [4] Under US law, to prove that an AI output infringes a copyright, a plaintiff must show the copyrighted work was "actually copied", meaning that the AI generates output which is "substantially similar" to their work, and that the AI had access to their work. [4]

In the course of learning to statistically model the data on which they are trained, deep generative AI models may learn to imitate the distinct style of particular authors in the training set. Since fictional characters enjoy some copyright protection in the US and other jurisdictions, an AI may also produce infringing content in the form of novel works which incorporate fictional characters. [4] [24]

A generative image model such as Stable Diffusion is able to model the stylistic characteristics of an artist like Pablo Picasso (including his particular brush strokes, use of colour, perspective, and so on), and a user can engineer a prompt such as "an astronaut riding a horse, by Picasso" to cause the model to generate a novel image applying the artist's style to an arbitrary subject. However, an artist's overall style is generally not subject to copyright protection. [4]
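For readers curious how such a prompt is issued in practice, the sketch below uses the open-source diffusers library with a publicly released Stable Diffusion checkpoint; the specific checkpoint name and hardware assumption are illustrative choices rather than a statement about any system discussed above.

```python
# Illustrative only: generating a style-imitation image with the "diffusers" library.
# The checkpoint name is one public Stable Diffusion release, chosen as an assumption.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU is available

# The prompt names an artist, steering the model toward stylistic traits
# it learned from that artist's works in its training data.
image = pipe("an astronaut riding a horse, by Picasso").images[0]
image.save("astronaut_picasso.png")
```

Whether such an output raises copyright issues depends on whether it reproduces protected expression rather than unprotectable style, as discussed above.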

Litigation

References

  1. "Artificial Intelligence Copyright Challenges in US Courts Surge". www.natlawreview.com. Retrieved 2024-03-19.
  2. "Primer: Training AI Models with Copyrighted Work". AAF. Retrieved 2024-03-19.
  3. "What is the Copyright Status of AI Generated Works?". www.linkedin.com. Retrieved 2024-03-19.
  4. Zirpoli, Christopher T. (24 February 2023). "Generative Artificial Intelligence and Copyright Law". Congressional Research Service.
  5. Vincent, James (15 November 2022). "The scary truth about AI copyright is nobody knows what will happen next". The Verge.
  6. Guadamuz, Andres (October 2017). "Artificial intelligence and copyright". WIPO Magazine.
  7. "Popular A.I. services for creating images are legal minefields for artists seeking payment for their work". Fortune. 2023. Retrieved 21 June 2023.
  8. Peter Pink-Howitt, Copyright, AI And Generative Art, Ramparts, 2023.
  9. Second Request for Reconsideration for Refusal to Register Théâtre D'opéra Spatial (SR # 1-11743923581; Correspondence ID: 1-5T5320R, 2023).
  10. "Federal Register :: Request Access". unblock.federalregister.gov. Retrieved 2024-03-20.
  11. Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence, US Copyright Office 2023.
  12. Nova Production v MazoomaGames [2006] EWHC 24 (Ch).
  13. The UK government's code of practice on copyright and AI. UK Government 2023.
  14. Artificial Intelligence and Intellectual Property: copyright and patents: Government response to consultation. UK Government 2023.
  15. https://www.govinfo.gov/content/pkg/FR-2023-08-30/pdf/2023-18624.pdf
  16. Aaron Wininger (2023-11-29). "Beijing Internet Court Recognizes Copyright in AI-Generated Images". The National Law Review.
  17. Takyar, Akash (2023-11-07). "Model validation techniques in machine learning". LeewayHertz - AI Development Company. Retrieved 2024-03-20.
  18. Vincent, James (2022-11-15). "The scary truth about AI copyright is nobody knows what will happen next". The Verge. Retrieved 2024-03-20.
  19. Lee, Timothy B. (2023-04-03). "Stable Diffusion copyright lawsuits could be a legal earthquake for AI". Ars Technica. Retrieved 2024-03-20.
  20. Lee, Timothy B. (2023-04-03). "Stable Diffusion copyright lawsuits could be a legal earthquake for AI". Ars Technica. Retrieved 2024-03-20.
  21. Lemley, Mark A.; Casey, Bryan (2020). "Fair Learning". SSRN Electronic Journal. doi:10.2139/ssrn.3528447. ISSN   1556-5068.
  22. "Artificial Intelligence and Intellectual Property: copyright and patents: Government response to consultation". GOV.UK. Retrieved 2024-03-20.
  23. Mitsunaga, Takuho (2023-10-09). "Heuristic Analysis for Security, Privacy and Bias of Text Generative AI: GhatGPT-3.5 case as of June 2023". 2023 IEEE International Conference on Computing (ICOCO). IEEE. doi:10.1109/icoco59262.2023.10397858.
  24. Lee, Timothy B. (3 April 2023). "Stable Diffusion copyright lawsuits could be a legal earthquake for AI". Ars Technica.
  25. Vincent, James (2022-11-08). "The lawsuit that could rewrite the rules of AI copyright". The Verge. Retrieved 2022-12-07.
  26. James Vincent "AI art tools Stable Diffusion and Midjourney targeted with copyright lawsuit" The Verge, 16 January, 2023.
  27. Edwards, Benj (16 January 2023). "Artists file class-action lawsuit against AI image generator companies". Ars Technica.
  28. Brittain, Blake (2023-07-19). "US judge finds flaws in artists' lawsuit against AI companies". Reuters. Retrieved 2023-08-06.
  29. Korn, Jennifer (2023-01-17). "Getty Images suing the makers of popular AI art tool for allegedly stealing photos". CNN. Retrieved 2023-01-22.
  30. "Getty Images Statement". newsroom.gettyimages.com/. 17 January 2023. Retrieved 24 January 2023.
  31. Belanger, Ashley (6 February 2023). "Getty sues Stability AI for copying 12M photos and imitating famous watermark". Ars Technica.
  32. "The copyright battles against OpenAI have begun". 6 July 2023.
  33. Kris, Jimmy (2023-07-06). "OpenAI faces copyright lawsuit from authors Mona Awad and Paul Tremblay". DailyAi. Retrieved 2023-07-10.
  34. "'Thaler v. Perlmutter': AI Output is Not Copyrightable". New York Law Journal. Retrieved 2023-12-01.
  35. Valinasab, Omid, "Big Data Analytics to Automate Patent Disclosure of Artificial Intelligence’s Inventions." U.S.F. Intell. Prop. & Tech. L.J. 133 (2023).
  36. https://arstechnica.com/information-technology/2024/02/us-says-ai-models-cant-hold-patents/
  37. "New bill would force AI companies to reveal use of copyrighted art | Artificial intelligence (AI) | The Guardian". amp.theguardian.com. Retrieved 2024-04-13.