PaLM

Developer(s): Google AI
Predecessor: LaMDA
Successor: Gemini
Available in: English
Type: Large language model
Website: ai.google/discover/palm2/

PaLM (Pathways Language Model) is a 540 billion parameter transformer-based large language model developed by Google AI. [1] Researchers also trained smaller versions of PaLM, 8 and 62 billion parameter models, to test the effects of model scale. [2]


PaLM is capable of a wide range of tasks, including commonsense reasoning, arithmetic reasoning, joke explanation, code generation, and translation. [2] [3] [4] [5] When combined with chain-of-thought prompting, PaLM achieved significantly better performance on datasets that require multi-step reasoning, such as word problems and logic-based questions. [1] [2]
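Chain-of-thought prompting works by including worked-out intermediate reasoning in the prompt's exemplars rather than only final answers, which encourages the model to produce its own reasoning before answering. The snippet below is a minimal, model-agnostic sketch of how such a prompt might be assembled; the exemplar problem and the idea of sending the string to "any completion endpoint" are illustrative, not Google's published code.

```python
# Minimal sketch of a chain-of-thought (CoT) prompt. The exemplar shows
# intermediate reasoning steps, so the model imitates step-by-step reasoning
# before giving its final answer. The resulting string can be sent to any
# text-completion model (e.g. PaLM); no specific API is assumed here.

COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend a worked exemplar so the model imitates step-by-step reasoning."""
    return COT_EXEMPLAR + f"Q: {question}\nA:"

prompt = build_cot_prompt(
    "The cafeteria had 23 apples. It used 20 and bought 6 more. How many are left?"
)
print(prompt)  # send this string to the model instead of the bare question
```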

The model was first announced in April 2022 and remained private until March 2023, when Google launched an API for PaLM and several other technologies. [6] The API was initially available to a limited number of developers who joined a waitlist before it was released to the public. [7]
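At launch, the PaLM API was exposed through Google's google-generativeai Python SDK. The sketch below shows roughly how a text-completion request looked in that 2023-era SDK; the model name, parameters, and defaults are illustrative and may have changed since.

```python
# Rough sketch of a 2023-era PaLM API call via the google-generativeai SDK.
# Requires `pip install google-generativeai` and an API key from MakerSuite.
# The model name and parameter values shown here are illustrative, not authoritative.
import google.generativeai as palm

palm.configure(api_key="YOUR_API_KEY")

completion = palm.generate_text(
    model="models/text-bison-001",   # PaLM-family text model exposed by the API at the time
    prompt="Explain chain-of-thought prompting in one sentence.",
    temperature=0.3,
    max_output_tokens=128,
)
print(completion.result)
```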

Google and DeepMind developed a version of PaLM 540B called Med-PaLM that is fine-tuned on medical data and outperforms previous models on medical question answering benchmarks. [8] [9] Med-PaLM was the first to obtain a passing score on U.S. medical licensing questions; in addition to answering both multiple-choice and open-ended questions accurately, it also provides reasoning and can evaluate its own responses. [10]

Google also extended PaLM using a vision transformer to create PaLM-E, a state-of-the-art vision-language model that can be used for robotic manipulation. [11] [12] The model can perform tasks in robotics competitively without the need for retraining or fine-tuning. [13]
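PaLM-E works by encoding images (and other sensor data) with a vision transformer and projecting the resulting embeddings into the same embedding space as the language model's text tokens, so the language model can consume them as ordinary "tokens". The sketch below illustrates only this interleaving idea with hypothetical module names and dimensions; it is not the published implementation.

```python
# Conceptual sketch of the PaLM-E idea: image features are projected into the
# language model's token-embedding space and interleaved with text embeddings.
# All module names and sizes are hypothetical placeholders.
import torch
import torch.nn as nn

D_MODEL = 1024  # token embedding width of the (hypothetical) language model

vision_encoder = nn.Linear(768, 768)   # stand-in for a ViT producing 768-d patch features
projector = nn.Linear(768, D_MODEL)    # maps vision features into the LM embedding space
text_embedder = nn.Embedding(32000, D_MODEL)

def build_multimodal_sequence(image_feats: torch.Tensor, text_ids: torch.Tensor) -> torch.Tensor:
    """Return one sequence of embeddings: projected image 'tokens' followed by text tokens."""
    image_tokens = projector(vision_encoder(image_feats))  # (num_patches, D_MODEL)
    text_tokens = text_embedder(text_ids)                   # (seq_len, D_MODEL)
    return torch.cat([image_tokens, text_tokens], dim=0)    # fed to the transformer as usual

seq = build_multimodal_sequence(torch.randn(16, 768), torch.randint(0, 32000, (8,)))
print(seq.shape)  # torch.Size([24, 1024])
```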

In May 2023, Google announced PaLM 2 at the annual Google I/O keynote. [14] PaLM 2 is reported to be a 340 billion parameter model trained on 3.6 trillion tokens. [15]

In June 2023, Google announced AudioPaLM for speech-to-speech translation, which uses the PaLM 2 architecture and initialization. [16]

Training

PaLM is pre-trained on a high-quality corpus of 780 billion tokens that comprise various natural language tasks and use cases. This dataset includes filtered webpages, books, Wikipedia articles, news articles, source code obtained from open source repositories on GitHub, and social media conversations. [1] [2] It is based on the dataset used to train Google's LaMDA model. [2] The social media conversation portion of the dataset makes up 50% of the corpus, which aids the model in its conversational capabilities. [2]
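The training corpus is a weighted mixture of sources rather than a single undifferentiated pool of text; the article notes, for example, that social media conversations account for 50% of tokens. The sketch below shows a generic way such a mixture can be sampled during pre-training. Only the 50% figure comes from the text; the other weights are placeholders, not PaLM's actual proportions.

```python
# Generic weighted-mixture sampling over data sources, as commonly used when
# pre-training on a blended corpus. Only the 50% social-media share is from the
# article; the remaining weights are illustrative placeholders.
import random

MIXTURE_WEIGHTS = {
    "social_media_conversations": 0.50,  # stated share of the PaLM corpus
    "filtered_webpages": 0.25,           # placeholder
    "books": 0.15,                       # placeholder
    "github_code": 0.05,                 # placeholder
    "wikipedia_and_news": 0.05,          # placeholder
}

def sample_source(rng: random.Random) -> str:
    """Pick the data source for the next training example according to the mixture."""
    sources, weights = zip(*MIXTURE_WEIGHTS.items())
    return rng.choices(sources, weights=weights, k=1)[0]

rng = random.Random(0)
print([sample_source(rng) for _ in range(5)])
```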

PaLM 540B was trained over two TPU v4 Pods, each containing 3,072 TPU v4 chips attached to 768 hosts, connected using a combination of model and data parallelism; this was the largest TPU configuration described to date. [2] [17] This setup allowed efficient training at scale across all 6,144 chips and set a record for training efficiency among LLMs of this size, with a hardware FLOPs utilization of 57.8%. [3]
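Hardware FLOPs utilization (HFU) compares the floating-point operations the system actually performs per second against the chips' combined theoretical peak. The sketch below shows the arithmetic: the 6,144-chip count is from the text, while the per-chip peak throughput and the achieved throughput are assumptions chosen only to demonstrate the calculation.

```python
# Back-of-the-envelope hardware FLOPs utilization (HFU) calculation.
# HFU = achieved FLOP/s across the system / (num_chips * per-chip peak FLOP/s).
# The chip count comes from the article; the peak and achieved FLOP/s below are
# illustrative assumptions, not measured PaLM numbers.

NUM_CHIPS = 6144                      # two TPU v4 Pods, per the article
PEAK_FLOPS_PER_CHIP = 275e12          # assumed bf16 peak for a TPU v4 chip (FLOP/s)
ACHIEVED_FLOPS_TOTAL = 9.8e17         # assumed measured throughput across the system (FLOP/s)

peak_total = NUM_CHIPS * PEAK_FLOPS_PER_CHIP
hfu = ACHIEVED_FLOPS_TOTAL / peak_total
print(f"HFU = {hfu:.1%}")             # ~58% with these illustrative numbers
```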


Related Research Articles

Google Cloud Platform (GCP), offered by Google, is a suite of cloud computing services that provides a series of modular cloud services including computing, data storage, data analytics, and machine learning, alongside a set of management tools. According to Verma et al., it runs on the same infrastructure that Google uses internally for its end-user products, such as Google Search, Gmail, and Google Docs. Registration requires a credit card or bank account details.

TensorFlow: Machine learning software library

TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks.

OpenAI: Artificial intelligence research organization

OpenAI is a U.S.-based artificial intelligence (AI) research organization founded in December 2015, researching artificial intelligence with the goal of developing "safe and beneficial" artificial general intelligence, which it defines as "highly autonomous systems that outperform humans at most economically valuable work". As one of the leading organizations of the AI spring, it has developed several large language models and advanced image generation models, and has previously released open-source models. Its release of ChatGPT has been credited with starting the AI spring.

Google AI is a division of Google dedicated to artificial intelligence. It was announced at Google I/O 2017 by CEO Sundar Pichai.

Bidirectional Encoder Representations from Transformers (BERT) is a language model based on the transformer architecture, notable for its dramatic improvement over previous state of the art models. It was introduced in October 2018 by researchers at Google. A 2020 literature survey concluded that "in a little over a year, BERT has become a ubiquitous baseline in Natural Language Processing (NLP) experiments counting over 150 research publications analyzing and improving the model."

Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020. Like its predecessor, GPT-2, it is a decoder-only transformer deep neural network, which supersedes recurrence- and convolution-based architectures with a technique known as "attention". This attention mechanism allows the model to selectively focus on the segments of input text it predicts to be most relevant. GPT-3 uses a 2,048-token-long context, float16 (16-bit) precision, and a then-unprecedented 175 billion parameters, requiring 350 GB of storage space since each parameter occupies 2 bytes, and has demonstrated strong "zero-shot" and "few-shot" learning abilities on many tasks.

GPT-2: 2019 text-generating language model

Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. GPT-2 was pre-trained on a dataset of 8 million web pages. It was partially released in February 2019, followed by the full release of the 1.5-billion-parameter model on November 5, 2019.

DALL-E: Image-generating deep-learning model

DALL·E, DALL·E 2, and DALL·E 3 are text-to-image models developed by OpenAI using deep learning methodologies to generate digital images from natural language descriptions, called "prompts."

Wu Dao is a multimodal artificial intelligence developed by the Beijing Academy of Artificial Intelligence (BAAI). Wu Dao 1.0 was first announced on January 11, 2021; an improved version, Wu Dao 2.0, was announced on May 31. It has been compared to GPT-3 and is built on a similar architecture; in comparison, GPT-3 has 175 billion parameters (variables and inputs within the machine learning model), while Wu Dao has 1.75 trillion parameters. Wu Dao was trained on 4.9 terabytes of images and text, while GPT-3 was trained on 45 terabytes of text data. Yet, a growing body of work highlights the importance of increasing both data and parameters. The chairman of BAAI said that Wu Dao was an attempt to "create the biggest, most powerful AI model possible", although comparisons based on parameter count alone do not directly correlate with quality. Wu Dao 2.0 was called "the biggest language A.I. system yet" and was interpreted by commenters as an attempt to "compete with the United States". Notably, Wu Dao 2.0 uses a mixture-of-experts (MoE) architecture, unlike GPT-3, which is a "dense" model: while MoE models require much less computational power to train than dense models with the same number of parameters, trillion-parameter MoE models have shown performance comparable to that of models hundreds of times smaller.

Prompt engineering is the process of structuring text that can be interpreted and understood by a generative AI model. A prompt is natural language text describing the task that an AI should perform.

A foundation model is a machine learning model that is trained on broad data such that it can be applied across a wide range of use cases. Foundation models have transformed artificial intelligence (AI), powering prominent generative AI applications like ChatGPT. The Stanford Institute for Human-Centered Artificial Intelligence's Center for Research on Foundation Models created and popularized the term.

Stable Diffusion: Image-generating machine learning model

Stable Diffusion is a deep learning, text-to-image model released in 2022 based on diffusion techniques. It is considered to be a part of the ongoing AI boom.

Generative pre-trained transformer: Type of large language model

Generative pre-trained transformers (GPT) are a type of large language model (LLM) and a prominent framework for generative artificial intelligence. They are artificial neural networks that are used in natural language processing tasks. GPTs are based on the transformer architecture, pre-trained on large data sets of unlabelled text, and able to generate novel human-like content. As of 2023, most LLMs have these characteristics and are sometimes referred to broadly as GPTs.

EleutherAI: Artificial intelligence research collective

EleutherAI is a grass-roots non-profit artificial intelligence (AI) research group. The group, considered an open-source version of OpenAI, was formed in a Discord server in July 2020 to organize a replication of GPT-3. In early 2023, it formally incorporated as the EleutherAI Foundation, a non-profit research institute.

A large language model (LLM) is a language model notable for its ability to achieve general-purpose language generation and other natural language processing tasks such as classification. LLMs acquire these abilities by learning statistical relationships from text documents during a computationally intensive self-supervised and semi-supervised training process. LLMs can be used for text generation, a form of generative AI, by taking an input text and repeatedly predicting the next token or word.

In deep learning, fine-tuning is an approach to transfer learning in which the weights of a pre-trained model are trained on new data. Fine-tuning can be done on the entire neural network, or on only a subset of its layers, in which case the layers that are not being fine-tuned are "frozen". A model may also be augmented with "adapters" that consist of far fewer parameters than the original model, and fine-tuned in a parameter-efficient way by tuning the weights of the adapters and leaving the rest of the model's weights frozen.
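As a concrete illustration of freezing layers during fine-tuning, the sketch below updates only a newly added head of a small PyTorch model while leaving the "pre-trained" backbone frozen. The model structure and layer sizes are hypothetical; adapter methods follow the same pattern but insert small trainable modules instead of a new head.

```python
# Minimal illustration of partial fine-tuning in PyTorch: freeze the backbone,
# train only the newly added head. The model structure is a hypothetical example.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 64))  # "pre-trained" part
head = nn.Linear(64, 2)                                                      # new task-specific layer

for param in backbone.parameters():
    param.requires_grad = False        # frozen: excluded from gradient updates

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)  # only the head is optimized

x, y = torch.randn(8, 128), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(head(backbone(x)), y)
loss.backward()
optimizer.step()
print(loss.item())
```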

The Pile is an 886.03 GB diverse, open-source dataset of English text created as a training dataset for large language models (LLMs). It was constructed by EleutherAI in 2020 and publicly released on December 31 of that year. It is composed of 22 smaller datasets, including 14 new ones.

Generative artificial intelligence: AI system capable of generating content in response to prompts

Generative artificial intelligence is artificial intelligence capable of generating text, images or other data using generative models, often in response to prompts. Generative AI models learn the patterns and structure of their input training data and then generate new data that has similar characteristics.

LLaMA is a family of autoregressive large language models (LLMs), released by Meta AI starting in February 2023.

Gemini (language model): Large language model developed by Google

Gemini is a family of multimodal large language models developed by Google DeepMind, serving as the successor to LaMDA and PaLM 2. Comprising Gemini Ultra, Gemini Pro, and Gemini Nano, it was announced on December 6, 2023, positioned as a competitor to OpenAI's GPT-4. It powers the generative artificial intelligence chatbot of the same name.

References

  1. Narang, Sharan; Chowdhery, Aakanksha. "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance". ai.googleblog.com. Retrieved 17 March 2023.
  2. Chowdhery, Aakanksha; Narang, Sharan; Devlin, Jacob; et al. (2022). "PaLM: Scaling Language Modeling with Pathways". arXiv:2204.02311 [cs.CL].
  3. Anadiotis, George (12 April 2022). "Google sets the bar for AI language models with PaLM". VentureBeat. Retrieved 17 March 2023.
  4. Bastian, Matthias (5 April 2022). "Google PaLM: Giant language AI can explain jokes". THE DECODER. Retrieved 17 March 2023.
  5. "Google: Why Is No One Talking About PaLM (NASDAQ:GOOG)". seekingalpha.com. 12 December 2022. Retrieved 17 March 2023.
  6. Vincent, James (14 March 2023). "Google opens up its AI language model PaLM to challenge OpenAI and GPT-3". The Verge. Retrieved 17 March 2023.
  7. Huffman, Scott; Woodward, Josh. "PaLM API & MakerSuite: an approachable way to start prototyping and building generative AI applications". Retrieved 17 March 2023.
  8. Singhal, Karan; Azizi, Shekoofeh; Tu, Tao; et al. (2022). "Large Language Models Encode Clinical Knowledge". arXiv:2212.13138 [cs.CL].
  9. "MedPaLM: New Chatbots Will Soon Be Better Than Waiting For A Doctor". The Medical Futurist. 17 January 2023. Retrieved 17 March 2023.
  10. Matias, Yossi; Corrado, Greg (14 March 2023). "Our latest health AI research updates". Google. Retrieved 17 March 2023.
  11. Driess, Danny; Xia, Fei; Sajjadi, Mehdi S. M.; et al. (2023). "PaLM-E: An Embodied Multimodal Language Model". arXiv:2303.03378 [cs.LG].
  12. Driess, Danny; Florence, Pete. "PaLM-E: An embodied multimodal language model". ai.googleblog.com. Retrieved 17 March 2023.
  13. Edwards, Benj (7 March 2023). "Google's PaLM-E is a generalist robot brain that takes commands". Ars Technica. Retrieved 17 March 2023.
  14. Lardinois, Frederic (10 May 2023). "Google launches PaLM 2, its next-gen large language model". TechCrunch. Archived from the original on 10 May 2023. Retrieved 10 May 2023.
  15. Elias, Jennifer (16 May 2023). "Google's newest A.I. model uses nearly five times more text data for training than its predecessor". CNBC. Retrieved 18 May 2023.
  16. "AudioPaLM". google-research.github.io. Retrieved 30 June 2023.
  17. "An empirical analysis of compute-optimal large language model training". www.deepmind.com. Retrieved 17 March 2023.