Attention Is All You Need

An illustration of the main components of the transformer model from the paper.

"Attention Is All You Need" is a landmark [1] [2] 2017 research paper by Google. [3] Authored by eight scientists, it was responsible for expanding 2014 attention mechanisms proposed by Bahdanau et al. into a new deep learning architecture known as the transformer. The paper is considered by some to be a founding document for modern artificial intelligence, as transformers became the main architecture of large language models. [4] [5] At the time, the focus of the research was on improving Seq2seq techniques for machine translation, but even in their paper the authors saw the potential for other tasks like question answering and for what is now called multimodal Generative AI.

The paper's title is a reference to the song "All You Need Is Love" by the Beatles. [6]

As of 2024, the paper has been cited more than 100,000 times. [7]

Authors

All eight authors were "Equal contributors" to the paper, and the name order was randomised. A Wired article highlighted the group's diversity: [6]

Six of the eight authors were born outside the United States; the other two are, respectively, the child of two green-card-carrying Germans who were temporarily in California, and a first-generation American whose family had fled persecution.

By 2023, all eight authors had left Google and founded their own AI start-ups (except Łukasz Kaiser, who joined OpenAI). [6] [7]

Related Research Articles

<span class="mw-page-title-main">Jürgen Schmidhuber</span> German computer scientist

Jürgen Schmidhuber is a German computer scientist noted for his work in the field of artificial intelligence, specifically artificial neural networks. He is a scientific director of the Dalle Molle Institute for Artificial Intelligence Research in Switzerland. He is also director of the Artificial Intelligence Initiative and professor of the Computer Science program in the Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE) division at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia.

Google Brain was a deep learning artificial intelligence research team under the umbrella of Google AI, a research division at Google dedicated to artificial intelligence. Formed in 2011, it combined open-ended machine learning research with information systems and large-scale computing resources. It created tools such as TensorFlow, which allow neural networks to be used by the public, and multiple internal AI research projects, and aimed to create research opportunities in machine learning and natural language processing. It was merged into former Google sister company DeepMind to form Google DeepMind in April 2023.

<span class="mw-page-title-main">Google DeepMind</span> Artificial intelligence division

DeepMind Technologies Limited, doing business as Google DeepMind, is a British-American artificial intelligence research laboratory which serves as a subsidiary of Google. Founded in the UK in 2010, it was acquired by Google in 2014. The company is based in London, with research centres in Canada, France, Germany, and the United States.

Neural machine translation (NMT) is an approach to machine translation that uses an artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model.
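
As a concrete sketch of this likelihood view (a minimal illustration, not any particular NMT system): the model factorizes the probability of a translation into per-token conditional probabilities, and the sentence score is their product, usually accumulated in log space:

```python
import math

def sentence_log_likelihood(token_probs):
    """Sum of log p(y_t | y_<t, x): the log-probability an NMT model
    assigns to a whole target sentence given the source sentence x."""
    return sum(math.log(p) for p in token_probs)

# Illustrative (made-up) per-token probabilities for a four-word translation.
probs = [0.9, 0.7, 0.8, 0.95]
print(sentence_log_likelihood(probs))   # higher (closer to 0) = more likely
```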

Semantic Scholar is a research tool for scientific literature powered by artificial intelligence. It is developed at the Allen Institute for AI and was publicly released in November 2015. Semantic Scholar uses modern techniques in natural language processing to support the research process, for example by providing automatically generated summaries of scholarly papers. The Semantic Scholar team is actively researching the use of artificial intelligence in natural language processing, machine learning, human–computer interaction, and information retrieval.

<span class="mw-page-title-main">OpenAI</span> Artificial intelligence research organization

OpenAI is a U.S.-based artificial intelligence (AI) research organization founded in December 2015, researching artificial intelligence with the goal of developing "safe and beneficial" artificial general intelligence, which it defines as "highly autonomous systems that outperform humans at most economically valuable work". As one of the leading organizations of the AI boom, it has developed several large language models and advanced image generation models, and has previously released open-source models. Its release of ChatGPT has been credited with starting the AI boom.

<span class="mw-page-title-main">Timnit Gebru</span> Computer scientist

Timnit Gebru is an Eritrean Ethiopian-born computer scientist who works in the fields of artificial intelligence (AI), algorithmic bias and data mining. She is an advocate for diversity in technology and co-founder of Black in AI, a community of Black researchers working in AI. She is the founder of the Distributed Artificial Intelligence Research Institute (DAIR).

<span class="mw-page-title-main">Transformer (deep learning architecture)</span> Machine learning algorithm used for natural-language processing

A transformer is a deep learning architecture developed by Google and based on the multi-head attention mechanism, proposed in the 2017 paper "Attention Is All You Need". Text is converted to numerical representations called tokens, and each token is converted into a vector by looking it up in a word embedding table. At each layer, each token is then contextualized within the scope of the context window with the other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished. The paper builds on the softmax-based attention mechanism proposed by Bahdanau et al. in 2014 for machine translation, and on the Fast Weight Controller, an architecture similar to a transformer proposed in 1992.
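
The following is a minimal single-head sketch of the scaled dot-product attention at the heart of this mechanism; the dimensions, random weights, and NumPy implementation are illustrative only, and the full model runs several such heads in parallel and concatenates their outputs:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, as in "Attention Is All You Need"."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # each query scored against each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                              # weighted sum of value vectors

# Toy example: 4 tokens with embedding dimension 8 (sizes are illustrative).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                         # token embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)                                    # (4, 8): contextualized tokens
```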

Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020. Like its predecessor GPT-2, it is a decoder-only transformer neural network, which supersedes recurrence- and convolution-based architectures with a technique known as "attention". This attention mechanism allows the model to selectively focus on the segments of input text it predicts to be most relevant. GPT-3 uses a 2,048-token context window and float16 (16-bit) precision, and has a then-unprecedented 175 billion parameters, requiring 350 GB of storage as each parameter takes 2 bytes; it has demonstrated strong "zero-shot" and "few-shot" learning abilities on many tasks.
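
The storage figure follows directly from the parameter count and precision, as this back-of-the-envelope check (using decimal gigabytes) shows:

```python
params = 175_000_000_000               # 175 billion parameters
bytes_per_param = 2                    # float16: 16 bits = 2 bytes
print(params * bytes_per_param / 1e9)  # 350.0 -> 350 GB of storage
```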

Machine learning-based attention is a mechanism which intuitively mimics cognitive attention. It calculates "soft" weights for each word, more precisely for its embedding, in the context window. These weights can be computed either in parallel or sequentially. "Soft" weights can change during each runtime, in contrast to "hard" weights, which are (pre-)trained and fine-tuned and remain frozen afterwards.
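
A minimal sketch of the contrast (the shapes and values are illustrative, not from any specific model): the "hard" matrix below is fixed once trained, while the "soft" weights are recomputed from the input at every run:

```python
import numpy as np

rng = np.random.default_rng(1)
embeddings = rng.normal(size=(5, 4))   # 5 word embeddings in the context window

# "Hard" weights: learned during training, then frozen for every input.
W_hard = rng.normal(size=(4, 4))
projected = embeddings @ W_hard        # same matrix no matter the input

# "Soft" weights: recomputed at runtime from the current input itself.
query = embeddings[2]                          # attend from the third word
scores = embeddings @ query                    # one score per word in the window
soft = np.exp(scores) / np.exp(scores).sum()   # softmax over the window
print(soft)                                    # changes whenever the input changes
print(round(soft.sum(), 6))                    # 1.0: a distribution over the window
```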

<span class="mw-page-title-main">GPT-2</span> 2019 text-generating language model

Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. GPT-2 was pre-trained on a dataset of 8 million web pages. It was partially released in February 2019, followed by the full release of the 1.5-billion-parameter model on November 5, 2019.

A vision transformer (ViT) is a transformer designed for computer vision. A ViT breaks down an input image into a series of patches, serialises each patch into a vector, and maps it to a smaller dimension with a single matrix multiplication. These vector embeddings are then processed by a transformer encoder as if they were token embeddings.
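
A minimal sketch of that patching step, with illustrative image and embedding sizes:

```python
import numpy as np

def image_to_patch_embeddings(image, patch_size, proj):
    """Split an image into patches, flatten each, and project with one matrix."""
    height, width, channels = image.shape
    patches = []
    for i in range(0, height, patch_size):
        for j in range(0, width, patch_size):
            patches.append(image[i:i + patch_size, j:j + patch_size].reshape(-1))
    return np.stack(patches) @ proj    # (num_patches, model_dim)

# Toy 32x32 RGB image with 16x16 patches projected to dimension 64 (illustrative).
rng = np.random.default_rng(2)
img = rng.normal(size=(32, 32, 3))
proj = rng.normal(size=(16 * 16 * 3, 64))   # the single projection matrix
tokens = image_to_patch_embeddings(img, 16, proj)
print(tokens.shape)                         # (4, 64): patch embeddings for the encoder
```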

Cohere is a Canadian multinational technology company focused on artificial intelligence for the enterprise, specializing in large language models. Cohere was founded in 2019 by Aidan Gomez, Ivan Zhang, and Nick Frosst, and is headquartered in Toronto and San Francisco, with offices in Palo Alto and London.

LaMDA is a family of conversational large language models developed by Google. Originally developed and introduced as Meena in 2020, the first-generation LaMDA was announced during the 2021 Google I/O keynote, while the second generation was announced the following year. In June 2022, LaMDA gained widespread attention when Google engineer Blake Lemoine claimed that the chatbot had become sentient. The scientific community has largely rejected Lemoine's claims, though the episode prompted conversations about the efficacy of the Turing test, which measures whether a computer can pass for a human. In February 2023, Google announced Bard, a conversational artificial intelligence chatbot powered by LaMDA, to counter the rise of OpenAI's ChatGPT.

<span class="mw-page-title-main">ChatGPT</span> Chatbot developed by OpenAI

ChatGPT is a chatbot developed by OpenAI and launched on November 30, 2022. Based on large language models, it enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language. Successive user prompts and replies are considered at each conversation stage as context.
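
A hypothetical sketch of that context accumulation; the message structure and the `generate` callback are illustrative inventions, not OpenAI's actual API:

```python
# Hypothetical sketch; the message format and `generate` callback are
# illustrative inventions, not OpenAI's actual API.
def chat_turn(history, user_message, generate):
    """Append the new prompt, give the model the whole history as context,
    and store the reply so later turns can condition on it too."""
    history.append({"role": "user", "content": user_message})
    reply = generate(history)              # model sees all prior turns
    history.append({"role": "assistant", "content": reply})
    return reply

history = []
echo = lambda msgs: f"(reply conditioned on {len(msgs) - 1} prior messages)"
print(chat_turn(history, "Summarise the transformer paper.", echo))
print(chat_turn(history, "Now make it shorter.", echo))  # both earlier turns in context
```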

<span class="mw-page-title-main">Generative pre-trained transformer</span> Type of large language model

Generative pre-trained transformers (GPT) are a type of large language model (LLM) and a prominent framework for generative artificial intelligence. They are artificial neural networks that are used in natural language processing tasks. GPTs are based on the transformer architecture, pre-trained on large data sets of unlabelled text, and able to generate novel human-like content. As of 2023, most LLMs have these characteristics and are sometimes referred to broadly as GPTs.

Ashish Vaswani is a computer scientist working in deep learning, known for his contributions to artificial intelligence (AI) and natural language processing (NLP). He is one of the co-authors of the seminal paper "Attention Is All You Need", which introduced the Transformer model, a novel architecture that uses a self-attention mechanism and has since become foundational to many state-of-the-art models in NLP. The transformer architecture is the core of language models that power applications such as ChatGPT. He was a co-founder of Adept AI Labs and a former staff research scientist at Google Brain.

<span class="mw-page-title-main">Gemini (language model)</span> Large language model developed by Google

Gemini is a family of multimodal large language models developed by Google DeepMind, serving as the successor to LaMDA and PaLM 2. Comprising Gemini Ultra, Gemini Pro, and Gemini Nano, it was announced on December 6, 2023, positioned as a competitor to OpenAI's GPT-4. It powers the chatbot of the same name.

<span class="mw-page-title-main">Aidan Gomez</span> Artificial intelligence researcher

Aidan Gomez is a British-Canadian computer scientist working in the field of artificial intelligence, with a focus on natural language processing. He is the co-founder and CEO of the technology company Cohere.

T5 is a series of large language models developed by Google AI. Introduced in 2019, T5 models are trained on a massive dataset of text and code using a text-to-text framework. The T5 models are capable of performing the text-based tasks that they were pretrained for. They can also be finetuned to perform other tasks. They have been employed in various applications, including chatbots, machine translation systems, text summarization tools, code generation, and robotics.
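
A sketch of the text-to-text framing, with task prefixes in the spirit of those used in the T5 paper (the example pairs themselves are illustrative):

```python
# Every task is cast as string-to-string, distinguished by a task prefix.
examples = [
    ("translate English to German: That is good.", "Das ist gut."),
    ("summarize: authorities dispatched emergency crews after the storm ...",
     "emergency crews sent after storm"),
    ("cola sentence: The course is jumping well.", "unacceptable"),
]
for source, target in examples:
    print(f"input : {source}")
    print(f"target: {target}\n")
```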

References

  1. Love, Julia (10 July 2023). "AI Researcher Who Helped Write Landmark Paper Is Leaving Google". Bloomberg News. Retrieved 1 April 2024.
  2. Goldman, Sharon (20 March 2024). "'Attention is All You Need' creators look beyond Transformers for AI at Nvidia GTC: 'The world needs something better'". VentureBeat. Retrieved 1 April 2024.
  3. Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Łukasz; Polosukhin, Illia (2017). "Attention is All you Need" (PDF). Advances in Neural Information Processing Systems. 30. Curran Associates, Inc.
  4. Toews, Rob (3 September 2023). "Transformers Revolutionized AI. What Will Replace Them?". Forbes. Archived from the original on 26 September 2023. Retrieved 3 December 2023.
  5. Murgia, Madhumita (23 July 2023). "Transformers: the Google scientists who pioneered an AI revolution". Financial Times. Archived from the original on 28 December 2023. Retrieved 22 March 2024.
  6. Levy, Steven. "8 Google Employees Invented Modern AI. Here's the Inside Story". Wired. ISSN 1059-1028. Retrieved 20 March 2024.
  7. "Meet the $4 Billion AI Superstars That Google Lost". Bloomberg. 13 July 2023 – via www.bloomberg.com.