Text watermarking

Text watermarking is a technique for embedding hidden information in textual content to verify its authenticity, origin, or ownership. [1] With the rise of generative AI systems built on large language models (LLMs), much recent development has focused on watermarking AI-generated text. [2] Potential applications include detecting fake news and academic cheating, and excluding AI-generated material from LLM training data. [3] For LLMs, the focus is on linguistic approaches, which select words so that they form patterns in the text that can later be identified. [1] The first reported large-scale public deployment was a trial using Google's Gemini chatbot; the results, which appeared in October 2024, showed that across 20 million responses users rated watermarked and unwatermarked text as being of equal quality. [3] Research on text watermarking began in 1997. [1]
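
The idea of selecting words to form a detectable pattern can be illustrated with a toy sketch. It is not the method used in the Gemini trial or in any of the cited sources; it is a simplified keyed synonym-substitution scheme written for illustration only. The synonym table, the secret key, and the function names (embed, score) are all hypothetical. Wherever a word has an interchangeable variant, the embedder prefers the variant that a keyed hash marks as "green"; a detector holding the same key counts how many such words are green.

```python
import hashlib
import hmac

# Hypothetical synonym table: each entry lists interchangeable words.
SYNONYMS = {
    "big": ["big", "large"],
    "quick": ["quick", "fast"],
    "begin": ["begin", "start"],
    "show": ["show", "display"],
}
# Map every variant back to its whole synonym group.
GROUP_OF = {w: tuple(group) for group in SYNONYMS.values() for w in group}


def _is_green(key: bytes, prev_word: str, word: str) -> int:
    """Return 1 if the keyed hash of (previous word, word) marks the word green."""
    msg = f"{prev_word}|{word}".encode("utf-8")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return digest[0] & 1


def embed(text: str, key: bytes) -> str:
    """Replace each known word with a green variant from its group, if one exists."""
    out = []
    prev = ""
    for word in text.split():  # whitespace split only; punctuation is not handled
        group = GROUP_OF.get(word)
        if group is not None:
            # Keep the original word when no variant in the group is green.
            word = next((c for c in group if _is_green(key, prev, c)), word)
        out.append(word)
        prev = word
    return " ".join(out)


def score(text: str, key: bytes) -> tuple[int, int]:
    """Return (green words, substitutable words) as seen by a detector with the key."""
    hits = total = 0
    prev = ""
    for word in text.split():
        if word in GROUP_OF:
            total += 1
            hits += _is_green(key, prev, word)
        prev = word
    return hits, total


if __name__ == "__main__":
    key = b"demo-secret-key"  # hypothetical key, for illustration only
    plain = "we begin with a big table and show a quick result"
    marked = embed(plain, key)
    print(marked)
    print("plain :", score(plain, key))
    print("marked:", score(marked, key))
```

In unwatermarked text roughly half of the substitutable words would be expected to come out green by chance, so a detector of this kind would need many such words before a high green fraction becomes statistically meaningful. Short texts, or texts that have been paraphrased, give it little to work with.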

References

  1. Kamaruddin, Nurul Shamimi; Kamsin, Amirrudin; Por, Lip Yee; Rahman, Hameedur (2018). "A Review of Text Watermarking: Theory, Methods, and Applications". IEEE Access. 6: 8011–8028. doi:10.1109/ACCESS.2018.2796585. ISSN 2169-3536.
  2. Liu, Aiwei; Pan, Leyi; Lu, Yijian; Li, Jingjing; Hu, Xuming; Zhang, Xi; Wen, Lijie; King, Irwin; Xiong, Hui; Yu, Philip (2024-09-03). "A Survey of Text Watermarking in the Era of Large Language Models". ACM Computing Surveys. doi:10.1145/3691626. ISSN 0360-0300.
  3. Gibney, Elizabeth (Oct 23, 2024). "Google unveils invisible 'watermark' for AI-generated text". Nature. Retrieved Oct 26, 2024.