Noam Shazeer (born 1975or1976 [1] ) is an American computer scientist and entrepreneur known for his contributions to the field of artificial intelligence and deep learning, particularly in the development of transformer models and natural language processing.
Noam Shazeer joined Google in 2000. One of his first major achievements was improving the spelling corrector of Google' search engine. [1] In 2017, Shazeer was one of the lead authors of the seminal paper "Attention Is All You Need", [2] [3] [1] which introduced the transformer architecture.
At Google, Shazeer and his colleague Daniel de Freitas built a chatbot named Meena. [1] Following the refusal of Google to release the chatbot to the public, Shazeer and Freitas left the company in 2021 to found Character.AI. [1] [4]
In August 2024, it was reported that Shazeer would be returning to Google to co-lead the Gemini AI project. [5] Shazeer was appointed as technical lead on Gemini, along with Jeff Dean and Oriol Vinyals. [6] It was part of a $2.7 billion deal for Google to license Character's technology. [1] [7]
Multimodal learning is a type of deep learning that integrates and processes multiple types of data, referred to as modalities, such as text, audio, images, or video. This integration allows for a more holistic understanding of complex data, improving model performance in tasks like visual question answering, cross-modal retrieval, text-to-image generation, aesthetic ranking, and image captioning.
Neural machine translation (NMT) is an approach to machine translation that uses an artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model.
This page is a timeline of machine learning. Major discoveries, achievements, milestones and other major events in machine learning are included.
Artificial neural networks (ANNs) are models created using machine learning to perform a number of tasks. Their creation was inspired by biological neural circuitry. While some of the computational implementations ANNs relate to earlier discoveries in mathematics, the first implementation of ANNs was by psychologist Frank Rosenblatt, who developed the perceptron. Little research was conducted on ANNs in the 1970s and 1980s, with the AAAI calling this period an "AI winter".
A transformer is a deep learning architecture developed by researchers at Google and based on the multi-head attention mechanism, proposed in the 2017 paper "Attention Is All You Need". Text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished.
Seq2seq is a family of machine learning approaches used for natural language processing. Applications include language translation, image captioning, conversational models, and text summarization. Seq2seq uses sequence transformation: it turns one sequence into another sequence.
Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020.
Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. GPT-2 was pre-trained on a dataset of 8 million web pages. It was partially released in February 2019, followed by full release of the 1.5-billion-parameter model on November 5, 2019.
A vision transformer (ViT) is a transformer designed for computer vision. A ViT decomposes an input image into a series of patches, serializes each patch into a vector, and maps it to a smaller dimension with a single matrix multiplication. These vector embeddings are then processed by a transformer encoder as if they were token embeddings.
Generative Pre-trained Transformer 1 (GPT-1) was the first of OpenAI's large language models following Google's invention of the transformer architecture in 2017. In June 2018, OpenAI released a paper entitled "Improving Language Understanding by Generative Pre-Training", in which they introduced that initial model along with the general concept of a generative pre-trained transformer.
LaMDA is a family of conversational large language models developed by Google. Originally developed and introduced as Meena in 2020, the first-generation LaMDA was announced during the 2021 Google I/O keynote, while the second generation was announced the following year.
Oriol Vinyals is a Spanish machine learning researcher at DeepMind. He is currently technical lead on Gemini, along with Noam Shazeer and Jeff Dean.
Character.ai is a neural language model chatbot service that can generate human-like text responses and participate in contextual conversation. Constructed by previous developers of Google's LaMDA, Noam Shazeer and Daniel de Freitas, the beta model was made available to use by the public in September 2022. The beta model has since been retired on September 24, 2024, and can no longer be used.
A generative pre-trained transformer (GPT) is a type of large language model (LLM) and a prominent framework for generative artificial intelligence. It is an artificial neural network that is used in natural language processing by machines. It is based on the transformer deep learning architecture, pre-trained on large data sets of unlabeled text, and able to generate novel human-like content. As of 2023, most LLMs had these characteristics and are sometimes referred to broadly as GPTs.
A large language model (LLM) is a type of computational model designed for natural language processing tasks such as language generation. As language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process.
In machine learning, a neural scaling law is an empirical scaling law that describes how neural network performance changes as key factors are scaled up or down. These factors typically include the number of parameters, training dataset size, and training cost.
Ashish Vaswani is a computer scientist working in deep learning, who is known for his significant contributions to the field of artificial intelligence (AI) and natural language processing (NLP). He is one of the co-authors of the seminal paper "Attention Is All You Need" which introduced the Transformer model, a novel architecture that uses a self-attention mechanism and has since become foundational to many state-of-the-art models in NLP. Transformer architecture is the core of language models that power applications such as ChatGPT. He was a co-founder of Adept AI Labs and a former staff research scientist at Google Brain.
Aidan Gomez is a British-Canadian computer scientist working in the field of artificial intelligence, with a focus on natural language processing. He is the co-founder and CEO of the technology company Cohere.
"Attention Is All You Need" is a 2017 landmark research paper in machine learning authored by eight scientists working at Google. The paper introduced a new deep learning architecture known as the transformer, based on the attention mechanism proposed in 2014 by Bahdanau et al. It is considered a foundational paper in modern artificial intelligence, as the transformer approach has become the main architecture of large language models like those based on GPT. At the time, the focus of the research was on improving Seq2seq techniques for machine translation, but the authors go further in the paper, foreseeing the technique's potential for other tasks like question answering and what is now known as multimodal Generative AI.
T5 is a series of large language models developed by Google AI introduced in 2019. Like the original Transformer model, T5 models are encoder-decoder Transformers, where the encoder processes the input text, and the decoder generates the output text.