Preamble (company)

Preamble, C-corp
Company type	Privately held company
Industry	Artificial intelligence
Founded	2021;3 years ago
Founders	Jonathan Cefalu; Jeremy McHugh;
Headquarters	Pittsburgh, Pennsylvania, U.S.
Website	preamble.com

Last updated December 08, 2024

Preamble is a U.S.-based AI safety startup founded in 2021. It provides tools and services to help companies securely deploy and manage large language models (LLMs). Preamble is known for its contributions to identifying and mitigating prompt injection attacks in LLMs.

History

Preamble is particularly notable for its early discovery of vulnerabilities in widely used AI models, such as GPT-3, with a primary discovery of the prompt injection attacks.^[1]^[2]^[3] These findings were first reported privately to OpenAI in 2022 and have since been the subject of numerous studies in the field.

Preamble has entered a partnership with Nvidia to improve AI safety and risk mitigation for enterprises.^[4] They are a part of the Air Force security program as a notable Pittsburgh AI hub.^[5] Since 2024, Preamble has partnered with IBM to combine their guardrails with IBM Watsonx.^[6]

Research

Preamble's research revolves around AI security, AI ethics, privacy and policy regulations. In May 2022, Preamble's researchers discovered vulnerabilities in GPT-3 which allowed malicious actors to manipulate the model's outputs through prompt injections.^[7]^[3] The resulting paper investigated the vulnerability of large pre-trained language models, such as GPT-3 and BERT, to adversarial attacks. These attacks are designed to manipulate the models' outputs by introducing subtle perturbations in the input text, leading to incorrect or harmful outputs, such as generating hate speech or leaking sensitive information.^[8]

Preamble was granted a patent by the United States Patent and Trademark Office to mitigate prompt injection in AI models.^[9]

Related Research Articles

OpenAI is an American artificial intelligence (AI) research organization founded in December 2015 and headquartered in San Francisco, California. Its stated mission is to develop "safe and beneficial" artificial general intelligence (AGI), which it defines as "highly autonomous systems that outperform humans at most economically valuable work". As a leading organization in the ongoing AI boom, OpenAI is known for the GPT family of large language models, the DALL-E series of text-to-image models, and a text-to-video model named Sora. Its release of ChatGPT in November 2022 has been credited with catalyzing widespread interest in generative AI.

DALL-E, DALL-E 2, and DALL-E 3 are text-to-image models developed by OpenAI using deep learning methodologies to generate digital images from natural language descriptions known as "prompts".

Prompt engineering is the process of structuring an instruction that can be interpreted and understood by a generative artificial intelligence (AI) model. A prompt is natural language text describing the task that an AI should perform. A prompt for a text-to-text language model can be a query such as "what is Fermat's little theorem?", a command such as "write a poem in the style of Edgar Allan Poe about leaves falling", or a longer statement including context, instructions, and conversation history.

Prompt injection is a family of related computer security exploits carried out by getting a machine learning model which was trained to follow human-given instructions to follow instructions provided by a malicious user. This stands in contrast to the intended operation of instruction-following systems, wherein the ML model is intended only to follow trusted instructions (prompts) provided by the ML model's operator.

ChatGPT is a generative artificial intelligence (AI) chatbot developed by OpenAI and launched in 2022. It is based on the GPT-4o large language model (LLM). ChatGPT can generate human-like conversational responses, and enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language. It is credited with accelerating the AI boom, which has led to ongoing rapid investment in and public attention to the field of artificial intelligence. Some observers have raised concern about the potential of ChatGPT and similar programs to displace human intelligence, enable plagiarism, or fuel misinformation.

<span class="mw-page-title-main">Hallucination (artificial intelligence)</span> Erroneous material generated by AI

In the field of artificial intelligence (AI), a hallucination or artificial hallucination is a response generated by AI that contains false or misleading information presented as fact. This term draws a loose analogy with human psychology, where hallucination typically involves false percepts. However, there is a key difference: AI hallucination is associated with erroneous responses rather than perceptual experiences.

Sparrow is a chatbot developed by the artificial intelligence research lab DeepMind, a subsidiary of Alphabet Inc. It is designed to answer users' questions correctly, while reducing the risk of unsafe and inappropriate answers. One motivation behind Sparrow is to address the problem of language models producing incorrect, biased or potentially harmful outputs. Sparrow is trained using human judgements, in order to be more “Helpful, Correct and Harmless” compared to baseline pre-trained language models. The development of Sparrow involved asking paid study participants to interact with Sparrow, and collecting their preferences to train a model of how useful an answer is.

Generative Pre-trained Transformer 4 (GPT-4) is a multimodal large language model created by OpenAI, and the fourth in its series of GPT foundation models. It was launched on March 14, 2023, and made publicly available via the paid chatbot product ChatGPT Plus, via OpenAI's API, and via the free chatbot Microsoft Copilot. As a transformer-based model, GPT-4 uses a paradigm where pre-training using both public data and "data licensed from third-party providers" is used to predict the next token. After this step, the model was then fine-tuned with reinforcement learning feedback from humans and AI for human alignment and policy compliance.

GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021. As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to the fact that it has 6 billion parameters.

A large language model (LLM) is a type of computational model designed for natural language processing tasks such as language generation. As language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process.

Generative artificial intelligence is a subset of artificial intelligence that uses generative models to produce text, images, videos, or other forms of data. These models learn the underlying patterns and structures of their training data and use them to produce new data based on the input, which often comes in the form of natural language prompts.

The AI boom, or AI spring, is an ongoing period of rapid progress in the field of artificial intelligence (AI) that started in the late 2010s before gaining international prominence in the early 2020s. Examples include protein folding prediction led by Google DeepMind as well as large language models and generative AI applications developed by OpenAI.

The Center for AI Safety (CAIS) is a nonprofit organization based in San Francisco, that promotes the safe development and deployment of artificial intelligence (AI). CAIS's work encompasses research in technical AI safety and AI ethics, advocacy, and support to grow the AI safety research field.

Artificial intelligence detection software aims to determine whether some content was generated using artificial intelligence (AI).

Watsonx is IBM's commercial generative AI and scientific data platform based on cloud. It offers a studio, data store, and governance toolkit. It supports multiple large language models (LLMs) along with IBM's own Granite.

Michael Karl Gschwind is an American computer scientist at Nvidia in Santa Clara, California. He is recognized for his seminal contributions to the design and exploitation of general-purpose programmable accelerators, as an early advocate of sustainability in computer design and as a prolific inventor.

IBM Granite is a series of decoder-only AI foundation models created by IBM. It was announced on September 7, 2023, and an initial paper was published 4 days later. Initially intended for use in the IBM's cloud-based data and generative AI platform Watsonx along with other models, IBM opened the source code of some code models. Granite models are trained on datasets curated from Internet, academic publishings, code datasets, legal and finance documents.

DBRX is an open-sourced large language model (LLM) developed by Mosaic ML team at Databricks, released on March 27, 2024. It is a mixture-of-experts Transformer model, with 132 billion parameters in total. 36 billion parameters are active for each token. The released model comes in either a base foundation model version or an instruct-tuned variant.

Nicholas Carlini is an American researcher affiliated with Google DeepMind who has published research in the fields of computer security and machine learning. He is known for his work on adversarial machine learning, particularly his work on the Carlini & Wagner attack in 2016. This attack was particularly useful in defeating defensive distillation, a method used to increase model robustness, and has since been effective against other defenses against adversarial input.

OpenAI o1 is a generative pre-trained transformer. A preview of o1 was released by OpenAI on September 12, 2024. o1 spends time "thinking" before it answers, making it better at complex reasoning tasks, science and programming than GPT-4o. The full version was released on December 5, 2024.

References

↑ Kosinski, Matthew; Forrest, Amber (March 21, 2024). "What is a prompt injection attack?". IBM.com.
↑ Rossi, Sippo; Michel, Alisia Marianne; Mukkamala, Raghava Rao; Thatcher, Jason Bennett (January 31, 2024). "An Early Categorization of Prompt Injection Attacks on Large Language Models". arXiv: 2402.00898 [cs.CR].
1 2 Rao, Abhinav Sukumar; Naik, Atharva Roshan; Vashistha, Sachin; Aditya, Somak; Choudhury, Monojit (2024). "Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks". In Calzolari, Nicoletta; Kan, Min-Yen; Hoste, Veronique; Lenci, Alessandro; Sakti, Sakriani; Xue, Nianwen (eds.). Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (PDF). Torino, Italia: ELRA and ICCL. pp. 16802–16830.
↑ Doughty, Nate (August 8, 2023). "Nvidia selects AI safety startup Preamble for its business development program". Pittsburgh Business Times. Retrieved August 15, 2024.
↑ Dabkowski, Jake (May 17, 2024). "Pittsburgh-area companies aim to make AI for businesses more secure". Pittsburgh Business Times. Retrieved August 15, 2024.
↑ "Watsonx technology partners". IBM.com. 2024.
↑ Rossi, Sippo; Michel, Alisia Marianne; Mukkamala, Raghava Rao; Thatcher, Jason Bennett (January 31, 2024). "An Early Categorization of Prompt Injection Attacks on Large Language Models". arXiv: 2402.00898 [cs.CR].
↑ Branch, Hezekiah J.; Cefalu, Jonathan; McHugh, Jeremy; Heichman, Ron; Hujer, Leyla; del Castillo Iglesias, Daniel. "Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples". arXiv: 2209.02128 .
↑ Dabkowski, Jake (October 20, 2024). "Preamble secures AI prompt injection patent". Pittsburgh Business Times.

External links

Official website

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Kosinski, Matthew; Forrest, Amber (March 21, 2024). "What is a prompt injection attack?". IBM.com.

[2] Rossi, Sippo; Michel, Alisia Marianne; Mukkamala, Raghava Rao; Thatcher, Jason Bennett (January 31, 2024). "An Early Categorization of Prompt Injection Attacks on Large Language Models". arXiv: 2402.00898 [cs.CR].

[TrickingLLMs-3] 1 2 Rao, Abhinav Sukumar; Naik, Atharva Roshan; Vashistha, Sachin; Aditya, Somak; Choudhury, Monojit (2024). "Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks". In Calzolari, Nicoletta; Kan, Min-Yen; Hoste, Veronique; Lenci, Alessandro; Sakti, Sakriani; Xue, Nianwen (eds.). Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (PDF). Torino, Italia: ELRA and ICCL. pp. 16802–16830.

[4] Doughty, Nate (August 8, 2023). "Nvidia selects AI safety startup Preamble for its business development program". Pittsburgh Business Times. Retrieved August 15, 2024.

[5] Dabkowski, Jake (May 17, 2024). "Pittsburgh-area companies aim to make AI for businesses more secure". Pittsburgh Business Times. Retrieved August 15, 2024.

[6] "Watsonx technology partners". IBM.com. 2024.

[7] Rossi, Sippo; Michel, Alisia Marianne; Mukkamala, Raghava Rao; Thatcher, Jason Bennett (January 31, 2024). "An Early Categorization of Prompt Injection Attacks on Large Language Models". arXiv: 2402.00898 [cs.CR].

[8] Branch, Hezekiah J.; Cefalu, Jonathan; McHugh, Jeremy; Heichman, Ron; Hujer, Leyla; del Castillo Iglesias, Daniel. "Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples". arXiv: 2209.02128 .

[9] Dabkowski, Jake (October 20, 2024). "Preamble secures AI prompt injection patent". Pittsburgh Business Times.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]