Andy Zeng

Thesis: Learning Visual Affordances for Robotic Manipulation (2019)

Andy Zeng is an American computer scientist and AI engineer at Google DeepMind. He is best known for his research in robotics and machine learning, including robot learning algorithms that enable machines to intelligently interact with the physical world and improve themselves over time. Zeng was a recipient of the Gordon Y.S. Wu Fellowship in Engineering and Wu Prize in 2016, and the Princeton SEAS Award for Excellence in 2018. [1] [2]


Early life and education

Zeng studied computer science and mathematics as an undergraduate student at the University of California, Berkeley. [3] He then moved to Princeton University, where he completed his Ph.D. in 2019. His thesis focused on deep learning algorithms that enable robots to understand the visual world and interact with unfamiliar physical objects. [4] He developed a class of deep neural network architectures inspired by the concept of affordances in cognitive psychology (perceiving the world in terms of actions), which allow machines to learn skills that quickly adapt and generalize to new scenarios. [5] As a doctoral student, he co-led Team MIT-Princeton [6] to win 1st place in the Stow Task [7] at the Amazon Robotics Challenge, [8] a global competition focused on advancing robotic manipulation and bin picking. He also spent time as a student researcher at Google Brain. [9] His graduate studies were supported by the NVIDIA Fellowship. [10]
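The core idea behind such affordance-based architectures can be illustrated with a short sketch (hypothetical code, not drawn from Zeng's publications; all names here are assumptions): a fully convolutional network scores every pixel of an image by how likely an action, such as a grasp, centered there is to succeed, and the robot simply acts at the highest-scoring pixel.

```python
# Minimal sketch of a pixel-wise affordance model (illustrative only;
# module and function names are assumptions, not Zeng's actual code).
import torch
import torch.nn as nn

class AffordanceNet(nn.Module):
    """Fully convolutional net: one grasp-success logit per pixel."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),
        )

    def forward(self, rgb):          # rgb: (B, 3, H, W)
        return self.net(rgb)         # logits: (B, 1, H, W)

def pick_grasp(model, rgb):
    """Act at the pixel with the highest predicted affordance."""
    with torch.no_grad():
        logits = model(rgb)[0, 0]    # (H, W) affordance map
        y, x = divmod(torch.argmax(logits).item(), logits.shape[1])
    return x, y                      # image coordinates of best grasp

model = AffordanceNet()
print(pick_grasp(model, torch.rand(1, 3, 64, 64)))
```

Because the network outputs one score per pixel, perception and action selection collapse into a single dense prediction, which is what lets such models transfer to unfamiliar objects.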

Research and career

Zeng investigates the capabilities of robots to intelligently improve themselves over time through self-supervised learning algorithms, such as learning how to assemble objects by disassembling them, [11] or acquiring new dexterous skills by watching videos of people. [12] Notable demonstrations include Google's TossingBot, [13] a robot that can learn to grasp and throw unfamiliar objects using physics as a prior model of how the world works. His research also investigates 3D computer vision algorithms.
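The "physics as a prior" idea admits a simple worked example (a hedged sketch loosely modeled on TossingBot's residual-physics scheme; the function names and the stub residual are illustrative): ideal projectile motion gives a first estimate of the release speed needed to hit a target, and a learned residual corrects for object-specific effects such as aerodynamic drag or grasp offset.

```python
# Sketch of "physics as a prior" for throwing (illustrative names; the
# residual model is a stub standing in for a trained network).
import math

G = 9.81  # gravitational acceleration, m/s^2

def ballistic_release_speed(distance_m, angle_rad):
    """Ideal release speed to land `distance_m` away at the launch height,
    from projectile motion: v = sqrt(d * g / sin(2 * theta))."""
    return math.sqrt(distance_m * G / math.sin(2 * angle_rad))

def throw_speed(distance_m, angle_rad, residual_model, observation):
    """Physics supplies the baseline; learning supplies a small correction."""
    v_physics = ballistic_release_speed(distance_m, angle_rad)
    return v_physics + residual_model(observation)

# With no learned correction yet, a 1 m throw released at 45 degrees
# needs about 3.13 m/s.
print(throw_speed(1.0, math.pi / 4, lambda obs: 0.0, observation=None))
```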

He pioneered the use of foundation models in robotics, from systems that take action by writing their own code, [14] to robots that can plan and reason by grounding language in affordances. [15] [16] He co-developed large multimodal models and showed that they can be used for intelligent robot navigation, world modeling, and assistive agents. [17] He also worked on algorithms that allow large language models to know when they don't know and ask for help. [18]
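The "write their own code" pattern can be sketched as follows (a hypothetical illustration, not the actual Code as Policies interface: generate_code stands in for any language-model call, and the three robot primitives are invented for the example): the model is prompted with a small robot API and a natural-language task, and the Python it returns is executed against that API.

```python
# Hypothetical sketch of a robot acting by executing LLM-written code.
API_DOC = """Available functions:
move_to(x, y): move the gripper to position (x, y)
grasp(): close the gripper
release(): open the gripper
"""

def generate_code(task: str) -> str:
    # Placeholder for a real LLM call prompted with API_DOC + task.
    # A canned program is returned so the sketch runs end to end.
    return "move_to(0.3, 0.5)\ngrasp()\nmove_to(0.0, 0.0)\nrelease()"

# Stub robot primitives standing in for real controller calls.
def move_to(x, y): print(f"moving to ({x}, {y})")
def grasp(): print("grasping")
def release(): print("releasing")

task = "pick up the sponge and bring it to the bin"
program = generate_code(task)
exec(program, {"move_to": move_to, "grasp": grasp, "release": release})
```

In practice the program would come from prompting a large language model with the API documentation and example programs, rather than the canned string used here.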

In 2024, Zeng was awarded the IEEE Early Career Award in Robotics and Automation “for outstanding contributions to robot learning.” [19]

Related Research Articles

Artificial intelligence (AI), in its broadest sense, is intelligence exhibited by machines, particularly computer systems. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals. Such machines may be called AIs.

Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g. in the form of decisions. "Understanding" in this context signifies the transformation of visual images into descriptions of the world that make sense to thought processes and can elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.


A multi-agent system is a computerized system composed of multiple interacting intelligent agents. Multi-agent systems can solve problems that are difficult or impossible for an individual agent or a monolithic system to solve. Intelligence may include methodic, functional, procedural approaches, algorithmic search or reinforcement learning. With advancements in large language models (LLMs), LLM-based multi-agent systems have emerged as a new area of research, enabling more sophisticated interactions and coordination among agents.

The expression computational intelligence (CI) usually refers to the ability of a computer to learn a specific task from data or experimental observation. Even though it is commonly considered a synonym of soft computing, there is still no commonly accepted definition of computational intelligence.

Robot learning is a research field at the intersection of machine learning and robotics. It studies techniques allowing a robot to acquire novel skills or adapt to its environment through learning algorithms. The robot's embodiment in a physical environment presents both specific difficulties and opportunities for guiding the learning process.

The following outline is provided as an overview of and topical guide to artificial intelligence.

Daniela L. Rus is a Romanian-American computer scientist. She serves as director of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), and the Andrew and Erna Viterbi Professor in the Department of Electrical Engineering and Computer Science (EECS) at the Massachusetts Institute of Technology. She is the author of the books Computing the Future, The Heart and the Chip: Our Bright Future with Robots, and The Mind's Mirror: Risk and Reward in the Age of AI.

Google Brain was a deep learning artificial intelligence research team that served as the sole AI branch of Google before being incorporated under the newer umbrella of Google AI, a research division at Google dedicated to artificial intelligence. Formed in 2011, it combined open-ended machine learning research with information systems and large-scale computing resources. It created tools such as TensorFlow, which allow neural networks to be used by the public, and multiple internal AI research projects, and aimed to create research opportunities in machine learning and natural language processing. It was merged into former Google sister company DeepMind to form Google DeepMind in April 2023.

Cloud robotics is a field of robotics that attempts to invoke cloud technologies such as cloud computing, cloud storage, and other Internet technologies centered on the benefits of converged infrastructure and shared services for robotics. When connected to the cloud, robots can benefit from the powerful computation, storage, and communication resources of modern data centers, which can process and share information from various robots or agents. Humans can also delegate tasks to robots remotely through networks. Cloud computing technologies enable robot systems to be endowed with powerful capabilities whilst reducing costs. Thus, it is possible to build lightweight, low-cost, smarter robots with an intelligent "brain" in the cloud. The "brain" consists of data centers, knowledge bases, task planners, deep learning, information processing, environment models, communication support, etc.

Visual computing is a generic term for the computer science disciplines that deal with images and 3D models, including computer graphics, image processing, visualization, computer vision, computational imaging, augmented reality, and video processing. It also draws on pattern recognition, human-computer interaction, machine learning, robotics, and computer simulation. The core challenges are the acquisition, processing, analysis, and rendering of visual information. Application areas include industrial quality control, medical image processing and visualization, surveying, multimedia systems, virtual heritage, special effects in movies and television, and computer games.

This glossary of artificial intelligence is a list of definitions of terms and concepts relevant to the study of artificial intelligence (AI), its subdisciplines, and related fields. Related glossaries include Glossary of computer science, Glossary of robotics, and Glossary of machine vision.

An AI accelerator, deep learning processor or neural processing unit (NPU) is a class of specialized hardware accelerator or computer system designed to accelerate artificial intelligence (AI) and machine learning applications, including artificial neural networks and computer vision. Typical applications include algorithms for robotics, Internet of Things, and other data-intensive or sensor-driven tasks. They are often manycore designs and generally focus on low-precision arithmetic, novel dataflow architectures or in-memory computing capability. As of 2024, a typical AI integrated circuit chip contains tens of billions of MOSFETs.

Animashree (Anima) Anandkumar is the Bren Professor of Computing at California Institute of Technology. Previously, she was a senior director of Machine Learning research at NVIDIA and a principal scientist at Amazon Web Services. Her research considers tensor-algebraic methods, deep learning and non-convex problems.


Gregory D. Hager is the Mandell Bellmore Professor of Computer Science and founding director of the Johns Hopkins Malone Center for Engineering in Healthcare at Johns Hopkins University.

Vivian Chu is an American roboticist and entrepreneur, specializing in the field of human-robot interaction. She is Chief Technology Officer at Diligent Robotics, a company she co-founded in 2017 to create autonomous, mobile, socially intelligent robots.

Felix Heide is a German-born computer scientist known for his work in the fields of computational imaging, computer vision, computer graphics and deep learning. He is an assistant professor at Princeton University and was the head of the Computational Imaging Lab. He serves as Head of Artificial Intelligence at Torc Robotics. Heide co-founded Algolux, a startup in computer vision technology for self-driving vehicles, which later merged with Torc Robotics.


Chelsea Finn is an American computer scientist and assistant professor at Stanford University. Her research investigates intelligence through the interactions of robots, with the hope of creating robotic systems that can learn how to learn. She is part of the Google Brain group.


Juyang (John) Weng is a Chinese-American computer engineer, neuroscientist, author, and academic. He is a former professor at the Department of Computer Science and Engineering at Michigan State University and the President of Brain-Mind Institute and GENISAMA.


Jürgen Sturm is a German software engineer, entrepreneur and academic. He is a Senior Staff Software Engineer at Google, where he works on bringing 3D reconstruction and semantic scene understanding to mixed reality devices.

References

  1. "Princeton Robotics Seminar: Language as Robot Middleware | Computer Science Department at Princeton University". Princeton University .
  2. "Andy Zeng". IEEE .
  3. "CSL Seminar - Embodied Intelligence". Massachusetts Institute of Technology .
  4. "Learning Visual Affordances for Robotic Manipulation - ProQuest". www.proquest.com.
  5. "Visual Transfer Learning for Robotic Manipulation". Google .
  6. "MIT-Princeton at the Amazon Robotics Challenge". Princeton University .
  7. "Australian Centre for Robotic Vision from Australia Wins Grand Championship at 2017 Amazon Robotics Challenge". Press Center. 1 August 2017.
  8. Malamut, Layla; Nathans, Aaron. "Princeton graduate student teams advance in robotics, intelligent systems competitions". Princeton University.
  9. "Google's Tossingbot Can Toss Over 500 Objects Per Hour Into Target Locations". NVIDIA Technical Blog. 28 March 2019.
  10. "2018 Grad Fellows | Research". research.nvidia.com.
  11. "Learning to Assemble and to Generalize from Self-Supervised Disassembly". research.google.
  12. "Robot See, Robot Do". research.google.
  13. "Inside Google's Rebooted Robotics Program". The New York Times .
  14. Heater, Brian (2022-11-02). "Google wants robots to generate their own code". TechCrunch. Retrieved 2024-10-18.
  15. "PaLM-SayCan". families.google.com. Retrieved 2024-10-18.
  16. "Google is training its robots to be more like humans". The Washington Post .
  17. "Visual language maps for robot navigation". research.google. Retrieved 2024-10-18.
  18. "These robots know when to ask for help". MIT Technology Review. Retrieved 2024-10-18.
  19. "2024 IEEE RAS Award Recipients Announced! - IEEE Robotics and Automation Society". www.ieee-ras.org. 2024-03-22. Retrieved 2024-10-18.