Nicholas Carlini

Alma mater	University of California, Berkeley (PhD)
	Scientific career
Fields	Computer Security
Institutions	Google DeepMind
Thesis	Evaluation and Design of Robust Neural Network Defenses (2018)
Doctoral advisor	David A. Wagner
Website	nicholas.carlini.com

Last updated January 07, 2025

Nicholas Carlini is an American researcher affiliated with Google DeepMind who has published research in the fields of computer security and machine learning. He is known for his work on adversarial machine learning, particularly the Carlini & Wagner attack, developed in 2016. The attack defeated defensive distillation, a method used to increase model robustness, and has since proved effective against other defenses against adversarial inputs.

In 2018, Carlini demonstrated an attack on Mozilla's DeepSpeech model, showing that hidden commands could be embedded in speech inputs, which the model would execute even if they were inaudible to humans. He also led a team at UC Berkeley that successfully broke seven out of eleven defenses against adversarial attacks presented at the 2018 International Conference on Learning Representations.

In addition to his work on adversarial attacks, Carlini has made significant contributions to understanding the privacy risks of machine learning models. In 2020, he revealed that large language models, like GPT-2, could memorize and output personally identifiable information. His research demonstrated that this issue worsened with larger models, and he later showed similar vulnerabilities in generative image models, such as Stable Diffusion.

Life and career

Nicholas Carlini obtained his Bachelor of Arts in Computer Science and Mathematics from the University of California, Berkeley, in 2013.^[1] He then continued his studies at the same university, where he pursued a PhD under the supervision of David Wagner, completing it in 2018.^[1]^[2]^[3]

Carlini became known for his work on adversarial machine learning. In 2016, he worked alongside Wagner to develop the Carlini & Wagner attack, a method of generating adversarial examples against machine learning models. The attack proved effective against defensive distillation, a popular mechanism in which a student model is trained to imitate a parent (teacher) model in order to increase the robustness and generalizability of the student model. The attack gained popularity when it was shown that the methodology was also effective against most other defenses, rendering them ineffective.^[4]^[5]

In 2018, Carlini demonstrated an attack against Mozilla Foundation's DeepSpeech model, showing that when malicious commands were hidden inside normal speech input, the speech model would respond to the hidden commands even when they were not discernible by humans.^[6]^[7] In the same year, Carlini and his team at UC Berkeley showed that, of the 11 papers presenting defenses to adversarial attacks accepted at that year's ICLR conference, seven of the defenses could be broken.^[8]

Since 2021, he and his team have been working on large language models. They created a questionnaire on which humans typically scored 35%, whereas AI models scored around 40%; GPT-3 scored 38%, which could be improved to 40% through few-shot prompting. The best performer on the test was UnifiedQA, a model developed by Google specifically for question answering.^[9] Carlini has also developed methods that cause large language models like ChatGPT to answer harmful questions, such as how to construct bombs.^[10]^[11]

He is also known for his work studying the privacy of machine learning models. In 2020, he showed for the first time that large language models would memorize some of the text data that they were trained on. For example, he found that GPT-2 could output personally identifiable information.^[12] He then led an analysis of larger models and studied how memorization increased with model size. In 2022, he showed the same vulnerability in generative image models, specifically diffusion models, by demonstrating that Stable Diffusion could output images of people's faces that it was trained on.^[13] Following this, Carlini showed that ChatGPT would also sometimes output exact copies of webpages it was trained on, including personally identifiable information.^[14] Some of these studies have since been referenced by the courts in debating the copyright status of AI models.^[15]

Other work

Carlini received the Best of Show award at the 2020 IOCCC for implementing a tic-tac-toe game entirely with calls to printf, expanding on work from a research paper of his from 2015. The judges commented on his submission: "This year's Best of Show (carlini) is such a novel way of obfuscation that it would be worth [sic] of a special mention in the (future) Best of IOCCC list!"^[16]

Awards

Best Student Paper Award, IEEE S&P 2017 ("Towards Evaluating the Robustness of Neural Networks")^[17]
Best Paper Award, ICML 2018 ("Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples")^[18]
Distinguished Paper Award, USENIX 2021 ("Poisoning the Unlabeled Dataset of Semi-Supervised Learning")^[19]
Distinguished Paper Award, USENIX 2023 ("Tight Auditing of Differentially Private Machine Learning")^[20]
Best Paper Award, ICML 2024 ("Stealing Part of a Production Language Model")^[21]
Best Paper Award, ICML 2024 ("Considerations for Differentially Private Learning with Large-Scale Public Pretraining")^[21]

References

[1] "Nicholas Carlini". nicholas.carlini.com. Archived from the original on June 3, 2024. Retrieved June 4, 2024.
[2] "Nicholas Carlini". AI for Good. Archived from the original on June 4, 2024. Retrieved June 4, 2024.
[3] "Graduates". people.eecs.berkeley.edu. Retrieved June 4, 2024.
[4] Pujari, Medha; Cherukuri, Bhanu Prakash; Javaid, Ahmad Y; Sun, Weiqing (July 27, 2022). "An Approach to Improve the Robustness of Machine Learning based Intrusion Detection System Models Against the Carlini-Wagner Attack". 2022 IEEE International Conference on Cyber Security and Resilience (CSR). IEEE. pp. 62–67. doi:10.1109/CSR54599.2022.9850306. ISBN 978-1-6654-9952-1. Archived from the original on February 2, 2023. Retrieved June 4, 2024.
[5] Schwab, Katharine (December 12, 2017). "How To Fool A Neural Network". Fast Company. Archived from the original on October 30, 2023. Retrieved June 4, 2023.
[6] Smith, Craig S. (May 10, 2018). "Alexa and Siri Can Hear This Hidden Command. You Can't". The New York Times. ISSN 0362-4331. Archived from the original on January 25, 2021. Retrieved June 4, 2024.
[7] "As voice assistants go mainstream, researchers warn of vulnerabilities". CNET. Retrieved June 4, 2024.
[8] Simonite, Tom. "AI Has a Hallucination Problem That's Proving Tough to Fix". Wired. ISSN 1059-1028. Archived from the original on June 11, 2023. Retrieved June 4, 2024.
[9] Hutson, Matthew (March 3, 2021). "Robo-writers: the rise and risks of language-generating AI". Nature. 591 (7848): 22–25. Bibcode:2021Natur.591...22H. doi:10.1038/d41586-021-00530-0. PMID 33658699.
[10] Conover, Emily (February 1, 2024). "AI chatbots can be tricked into misbehaving. Can scientists stop it?". Science News. Retrieved July 26, 2024.
[11] Metz, Cade (July 27, 2023). "Researchers Poke Holes in Safety Controls of ChatGPT and Other Chatbots". The New York Times. ISSN 0362-4331. Retrieved July 26, 2024.
[12] "What does GPT-3 "know" about me?". MIT Technology Review. Retrieved July 26, 2024.
[13] Edwards, Benj (February 1, 2023). "Paper: Stable Diffusion "memorizes" some images, sparking privacy concerns". Ars Technica. Retrieved July 26, 2024.
[14] Newman, Lily Hay. "ChatGPT Spit Out Sensitive Data When Told to Repeat 'Poem' Forever". Wired. ISSN 1059-1028. Archived from the original on July 26, 2024. Retrieved July 26, 2024.
[15] J. DOE 1 (United States District Court, Northern District of California), Text, archived from the original.
[16] "The 27th IOCCC". www.ioccc.org. Archived from the original on September 8, 2024. Retrieved July 26, 2024.
[17] "IEEE Symposium on Security and Privacy 2017". www.ieee-security.org. Archived from the original on September 2, 2024. Retrieved September 2, 2024.
[18] "ICML 2018 Awards". icml.cc. Archived from the original on September 2, 2024. Retrieved September 2, 2024.
[19] Carlini, Nicholas (2021). "Poisoning the Unlabeled Dataset of Semi-Supervised Learning". USENIX Security 2021: 1577–1592. ISBN 978-1-939133-24-3.
[20] Nasr, Milad; Hayes, Jamie; Steinke, Thomas; Balle, Borja; Tramèr, Florian; Jagielski, Matthew; Carlini, Nicholas; Terzis, Andreas (2023). "Tight Auditing of Differentially Private Machine Learning". USENIX Security 2023: 1631–1648. ISBN 978-1-939133-37-3. Archived from the original on September 8, 2024. Retrieved September 2, 2024.
[21] "ICML 2024 Awards". icml.cc. Archived from the original on September 8, 2024. Retrieved September 2, 2024.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
