Preamble is a U.S.-based AI safety startup founded in 2021. It provides tools and services to help companies securely deploy and manage large language models (LLMs). Preamble is known for its contributions to identifying and mitigating prompt injection attacks in LLMs.
Preamble is particularly notable for its early discovery of vulnerabilities in widely used AI models, such as GPT-3, with a primary discovery of the prompt injection attacks.[1][2][3] These findings were first reported privately to OpenAI in 2022 and have since been the subject of numerous studies in the field.
Preamble has entered a partnership with Nvidia to improve AI safety and risk mitigation for enterprises.[4] They are a part of the Air Force security program as a notable Pittsburgh AI hub.[5] Since 2024, Preamble has partnered with IBM to combine their guardrails with IBM Watsonx.[6]
Research
Preamble's research revolves around AI security, AI ethics, privacy and regulation. In May 2022, Preamble's researchers discovered vulnerabilities in GPT-3 which allowed malicious actors to manipulate the model's outputs through prompt injections.[7][3] The resulting paper investigated the vulnerability of large pre-trained language models, such as GPT-3 and BERT, to adversarial attacks. These attacks are designed to manipulate the models' outputs by introducing subtle perturbations in the input text, leading to incorrect or harmful outputs, such as generating hate speech or leaking sensitive information.[8]
↑ Rossi, Sippo; Michel, Alisia Marianne; Mukkamala, Raghava Rao; Thatcher, Jason Bennett (January 31, 2024). "An Early Categorization of Prompt Injection Attacks on Large Language Models". arXiv:2402.00898 [cs.CR].
↑ Rossi, Sippo; Michel, Alisia Marianne; Mukkamala, Raghava Rao; Thatcher, Jason Bennett (January 31, 2024). "An Early Categorization of Prompt Injection Attacks on Large Language Models". arXiv:2402.00898 [cs.CR].
↑ Branch, Hezekiah J.; Cefalu, Jonathan; McHugh, Jeremy; Heichman, Ron; Hujer, Leyla; del Castillo Iglesias, Daniel (2022). "Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples". arXiv:2209.02128 [cs.CL].
This page is based on this Wikipedia article Text is available under the CC BY-SA 4.0 license; additional terms may apply. Images, videos and audio are available under their respective licenses.